Closed yeesian closed 2 years ago
Nice work here! I wasn't aware of unset vs null in GDAL before. I believe null is much more common in practice?
I made this GeoJSONSeq file with both unset and null values:
{ "type": "Feature", "properties": { "a": 1, "b": null }, "geometry": { "type": "Point", "coordinates": [ 1.0, 1.0 ] } }
{ "type": "Feature", "properties": {}, "geometry": { "type": "Point", "coordinates": [ 2.0, 2.0 ] } }
If I run this through ogr with ogr2ogr -f GeoJSONSeq unset-gdal-translate.geojson unset.geojson
, the result is the same, e.g. unset and null are both kept. If I import the file in QGIS however, it shows this:
And indeed if I export it from QGIS I get nulls back in field that were unset before:
{ "type": "Feature", "properties": { "a": 1, "b": null }, "geometry": { "type": "Point", "coordinates": [ 1.0, 1.0 ] } }
{ "type": "Feature", "properties": { "a": null, "b": null }, "geometry": { "type": "Point", "coordinates": [ 2.0, 2.0 ] } }
Probably it makes more sense to keep the distinction here just like ogr2ogr
does, this being a GDAL wrapper. Though I do understand @evetion's point about making everything missing possibly being more user friendly.
@visr This is all about displaying a non-tabular (sparse) dataset like your geojson in a tabular interface. In this case, QGIS doesn't have a representation of unset, just NULL. Julia can be more expressive, although I wonder whether our Tables interface likes the mixing of missing
and nothing
.
Yeah exactly. I can imagine it being less ideal in the Tables interface. Though these functions are also used outside of that, in the sparse setting, where it does make sense to keep the distinction.
I think the interface can technically handle both just fine, it is just not very conventional. Though I don't know how common unset is in practice, and it's relatively easy to convert nothing
to missing
, if it causes problems.
@evetion and @visr, in the draft PR #243 , I'm about to try to handle it for a conversion from a table source to an IFeatureLayer
.
I was just waiting for this PR #238 to be merged.
In the other way, since we rely on Tables.jl/src/fallbacks.jl buildcolumns(::Nothing, rowitr::T) where {T}
which iterates through all features building column types by "widening" them with promote_type
each time it encounters a new type in field values, it should work as follow if we have an FieldDefn
of OGRFieldType
OFTInteger64
and features with UNSET or NULL fields for this FieldDefn
:
julia> T = Int64; @show T
T = promote_type(T, typeof(nothing)); @show T
T = promote_type(T, typeof(missing)); @show T
T = Int64
T = Union{Nothing, Int64}
T = Union{Missing, Nothing, Int64}
@mathieu17g thanks for showing that indeed technically it works (#243 is a great addition by the way).
I think for the tables interface specifically, the concerns about nothing
are:
missing
and not nothing
, potentially leading to confusion or errors if our tables are used in functions that only expect missing
or data.Union{Missing, Nothing, Int64}
may not enable small union optimizations that Union{Missing, Int64}
will have (would need to be checked).Hence for tables I'm less enthousiastic about nothing
than for just getfield
. Though I'm not sure what's best either. If we go with nothing
at worst we force people to explicitly convert to missing
, which is not too bad of a price to pay to make the data perfectly round trippable and consistent with GDAL itself. Though if you know a way to optionally do the nothing
to missing
conversion during table construction instead of afterwards, potentially avoiding an expensive copy, it would be interesting as well.
@evetion @visr is this PR good to go from your perspective? I'm okay with iterating further on feedback or for you to have more time to review if you'd like -- else I'll merge it to keep things moving for Mathieu per https://github.com/yeesian/ArchGDAL.jl/pull/238#issuecomment-927720530
Hence for tables I'm less enthousiastic about
nothing
than for justgetfield
. Though I'm not sure what's best either. If we go withnothing
at worst we force people to explicitly convert tomissing
, which is not too bad of a price to pay to make the data perfectly round trippable and consistent with GDAL itself.
Yes, and if we need to convert all nothing
values to missing
later, it won't be difficult
Though if you know a way to optionally do the
nothing
tomissing
conversion during table construction instead of afterwards, potentially avoiding an expensive copy, it would be interesting as well.
This should be easily done in getcolumn
function which is called for each field and geometry value by Tables.jl/src/fallback.jl Tables.columns
function
For the Tables.rows
, it can be done by implementing a slightly modified version of Tables.rows
fallback function
function Tables.rows(x::AbstractFeatureLayer)
cols = Tables.columns(x)
return Tables.RowIterator(cols, Int(Tables.rowcount(cols)))
end
potentially avoiding an expensive copy
I didn't get what you have in mind. Tables.columns
fallback constructor always make a copy.
I had not considered the option of not making a copy. Maybe it would allow to modify a dataset from a table (e.g. DataFrame). Brilliant idea!
It would probably require to convert field values in the table to something like Field{OGRFieldType, OGRFieldSubtype}
and fully map GDAL objects interdependence in ArchGDAL composite types
@mathieu17g we can discuss copies and such in a different issue. I still wanted to create one, because I saw that converting a layer to DataFrame for a large vector file took minutes.
@mathieu17g we can discuss copies and such in a different issue. I still wanted to create one, because I saw that converting a layer to DataFrame for a large vector file took minutes.
I'm interested in this too! Feel free to open an issue for it or have the discussion in #243
I'll be merging this PR after #245 is merged if I don't see any further objections by then.
Seems like #245 is turning out to be more complicated than expected, and we're still figuring it out in https://github.com/JuliaGeo/GDAL.jl/pull/124.
In the meantime, there are no remaining issues to resolve here, so I'll merge this PR to unblock #243.
Fixes #177 .
I think this aligns the behavior a lot better across the different file formats (see the test sets in test/test_tables.jl)
This behavior is implemented through
getfield(feature, i)
which makes use ofgetfieldtype(feature, i)
. That way, we are aligned in behavior for field subtypes even for e.g. displaying the fields.Because a lot hinges on distinguishing between whether a field is set versus whether it is null (see https://github.com/Toblerity/Fiona/issues/460#issue-240013079), I have also added support for
isfieldnull()
,isfieldsetandnotnull()
, andsetfieldnull!()
.