Closed barryrowlingson closed 7 years ago
I see
$ ogrinfo -al emptymultipoly.shp
FAILURE:
Unable to open datasource `emptymultipoly.shp' with the following drivers.
and
> s = st_read("emptymultipoly.shp", "emptymultipoly")
Cannot open data source /tmp/emptymultipoly.shp
Error in CPL_read_ogr(dsn, layer, as.character(options), quiet, iGeomField - :
Open failed.
which is, at least, consistent with eachother.
> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.0, proj.4 4.9.2
and sf
from github. @mdsumner didn't you try something similar?
My GDAL that reads emptymultipoly.shp
via ogrinfo okay is:
$ ogrinfo --version
GDAL 2.1.0, released 2016/04/25
and the sf version is:
> require(sf)
Loading required package: sf
Linking to GEOS 3.5.0, GDAL 2.1.0, proj.4 4.8.0
> packageVersion("sf")
[1] ‘0.2.9’
With R 3.3.0 on Linux Mint 17 Qiana - Ubuntu Trusty, kernel 3.13.0-24.
Curiouser and curiouser...
On empty geometries, from gdal autotests:
# Test empty multipoint, multiline, multipolygon.
# From GDAL 1.6.0, the expected behaviour is to return a feature with a NULL geometry
I previously set this up but didn't go far with it:
https://github.com/mdsumner/sf.autotest
It was triggered here: https://github.com/edzer/sfr/issues/105
sf.autotest could be useful for getting early indications of possible problems, but also could be used to make a nice summary of what is in the autotest suite.
It would be fun to scan all CRAN packages too to make a map of the "spatialized ecosystem" out there.
I had also started on code to download all CRAN packages, inspired by crandatapkgs and search for Spatial classes but got stuck at a few points. I'd love to build a big online report to browse the available data sets in CRAN, and have basic reports with maps and tables via leaflet and ogr/gdalinfo summary outputs . . . this is obviously where R is going in the "do everything well" department.
@edzer are you sure you are in the right working directory? Those errors look like "file not found" errors...
This is what I see
b1e3465b
library(sf)
#Linking to GEOS 3.5.1, GDAL 2.1.2, proj.4 4.9.3
st_read(f)
Reading layer `emptymultipoly' from data source `/perm_storage/home/mdsumner/gdal_coverage/autotest/ogr/data/emptymultipoly.shp' using driver `ESRI Shapefile'
Error in CPL_read_ogr(dsn, layer, as.character(options), quiet, iGeomField - :
NULL pointer in to_multi_what: invalid input?
system(sprintf("ogrinfo -al %s", f))
INFO: Open of `/perm_storage/home/mdsumner/gdal_coverage/autotest/ogr/data/emptymultipoly.shp'
using driver `ESRI Shapefile' successful.
Layer name: emptymultipoly
Geometry: Polygon
Feature Count: 1
Extent: (100.000000, 0.000000) - (103.000000, 3.000000)
Layer SRS WKT:
(unknown)
FID: Integer64 (11.0)
NAME: String (60.0)
OGRFeature(emptymultipoly):0
FID (Integer64) = 2
NAME (String) = MultiPoly 1
devtools::session_info()
Session info ------------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, linux-gnu
ui RStudio (1.0.136)
language (EN)
collate en_AU.UTF-8
tz Australia/Hobart
date 2017-01-10
Packages ----------------------------------------------------------------------------
package * version date source
DBI 0.5-1 2016-09-10 cran (@0.5-1)
devtools 1.12.0.9000 2016-12-09 Github (hadley/devtools@1ce84b0)
digest 0.6.11 2017-01-03 cran (@0.6.11)
memoise 1.0.0.9001 2016-12-29 Github (hadley/memoise@884d565)
pkgbuild 0.0.0.9000 2016-11-18 Github (r-pkgs/pkgbuild@65eace0)
pkgload 0.0.0.9000 2016-11-11 Github (r-pkgs/pkgload@def2b10)
Rcpp 0.12.8.4 2017-01-08 Github (RcppCore/Rcpp@8ac38d7)
sf * 0.2-9 2017-01-10 local
udunits2 0.13 2016-11-17 cran (@0.13)
units 0.4-1 2016-12-09 cran (@0.4-1)
withr 1.0.2 2016-06-20 CRAN (R 3.3.0)
@barryrowlingson
> file.exists("emptymultipoly.shp", "emptymultipoly.shx", "emptymultipoly.dbf")
[1] TRUE TRUE TRUE
@edzer Edzer, sorry I accused you of making such a trivial error! But weird... I assume you can read the non-pathological shapefiles in the test folder?
Yesterday I created a docker container based on rocker/ropensci and built gdal 2.2.0dev out of github, it could read the empty*.shp files no problem with ogrinfo. Then I compiled sf in that container and got the same behaviour as I initially described. If my docker-fu was stronger I'd probably be able to push the container back to docker hub for everyone to try, but it was fairly easy to do.
Edzer's errors remain a mystery. Its not a GDAL without shapefile drivers is it?
OK, I get this now too - downloaded the wrong files (pretty convoluted pages, at osgeo).
Things look much worse (segfault) if you
st_read("emptymultipoly.shp", promote_to_multi=FALSE)
I now (https://github.com/edzer/sfr/commit/78f7e8d6d773420eccd20b0e887d0c9bd32a55e0) catch the NULL pointer earlier, but have no idea how to deal with it. On a null pointer p, you can't ask whether it is p->IsEmpty().
git clone on the GDAL autotest repo is straight forward, I will try to resurrect sf.autotest, constant regression testing would be great and Even has shown how to cover the variety of GDAL builds.
This is probably a r-hub worthy task, doing it properly is out of my league
> readOGR("emptymultipoly.shp", "emptymultipoly")
Error in readOGR("emptymultipoly.shp", "emptymultipoly") :
no features found
In addition: Warning message:
In ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
ogrInfo: all features NULL
See SEXP ogrFIDs
from line 276 in rgdal/src/ogrsource.cpp. It steps through all the features if it does not trust poLayer->GetFeatureCount()
. It looks as though the test doesn't tackle > 2G numbers of features.
In sf
, I'm trying to stick with the simple feature standard. Except for points, all geometries can be empty, because they start with a number home many things are coming, and this number can be zero. sf
supports those, e.g. by
> st_polygon(dim = "XY")
POLYGON()
and gdal lets you test whether a geometry is empty. What we see now is that a geometry, when read in gdal, returns not a (testable) empty geometry, but a NULL
pointer. Although one could argue that this must be an empty geometry, I'm inclined to put that burden to gdal and just reject it in sf
, as we do now. After all, there's no way to ask a NULL pointer what its dimensions are, or its geometry type.
Would it be possible to still return the non-spatial attributes and mock an empty geometry column? I have no experience with GDAL bindings, but I see they test feat
and feat.GetGeometryRef()
:
if feat is None:
return 'fail'
if feat.GetGeometryRef() is not None:
return 'fail'
Which (I'm guessing) means other attributes are available?
I'd like to see this too, it's been discussed before.
@edzer clarified that sf is about simple features rather than being about GDAL, so to me that means the GDAL part belongs in another package. I think this is out of sf scope, though ideally a common GDAL package would be good. Currently sf is carrying the burden for GDAL, where really it should be outside - but we can't expect Edzer to do that as well :) I've actually been thinking of putting this up on the RConsortium wishlist.
@etiennebr I didn't run this (did you?) but guess that feat is None, so the second statement is not reached. You can always try to read the .dbf with attributes using foreign::read.dbf. @mdsumner you surprise me here, since you were one of the supporting authors of the sfr proposal, and I'm curious what your arguments are to make a convincing case. Do you also want to make a geos package, and a proj package?
I don't see it as contrary to the goals of the sfr proposal to want to have things modularized and more general. (Though, I realize it wouldn't look good to go to that wishlist, I wasn't thinking clearly there. )
A wonderful outcome of this project is seeing how concise the GDAL bindings are in modern form, and so replicating that for more general use of GDAL doesn't seem terribly difficult, for me it's just a matter of other priorities to get to it. (And it's a really long list that keeps growing, I wish I could focus only on sf much more!).
I just see that as out of scope of sfr given its goals, but it "would be nice" if sf
imported a more general external package for GDAL itself. There are many extras that GDAL has, generic table read and SQL-injection for many providers, network models, and the utilities are now available in the library etc. Does that clarify?
Presumably we can use the LinkingTo capabilities anyway, and perhaps it's not a good idea to separate them.
Personally for GEOS, no I can't see wanting to use that outside of sf. PROJ.4 yes, but proj4
is already perfectly fine.
@edzer, I didn't run it either, but I was supposing the test was passing; feat
was not None
(valid pointer), so feat.GetGeometry
returned None
. I haven't looked further, so I don't know where exactly the GDAL error got raised in sf
. Just thought I'd mention that GetFeature()
probably returns a valid pointer if it's useful. read.dbf
could work on shapefiles, but I'm guessing other formats could provide empty geometries.
I repeat: if GDAL returns a NULL pointer, we have no geometry. If it returns an empty geometry, it returns a geometry, which is not NULL, and can be queried for a type. I don't see any point in continuing this discussion, but if you do, please do so after a working PR that resolves this so I can see what you mean (or you can see what I mean).
Please see the retrieve_ABS_null=
argument in maptools::.Map2PolyDF() in maptools/R/SpatialPolys-methods.R. The Australian Bureau of Statistics stuffed non-locatable data into shapefile dbfs, hence the work-around there.
> readShapePoly("emptymultipoly.shp", retrieve_ABS_null=TRUE, delete_null_obj=TRUE)
FID NAME
1 2 MultiPoly 1
> foreign::read.dbf("emptymultipoly.dbf")
FID NAME
1 2 MultiPoly 1
Someone on StackOverflow was looking for shapefiles to test something (not R) and was pointed to a set of test shapefiles and other spatial data in the GDAL repository:
https://trac.osgeo.org/gdal/browser/trunk/autotest/ogr/data
So I downloaded that repository and tried
st_read
on them all. Here's my simple test code, run it from the folder with the shapefiles in:Note this is just a test for reading, it doesn't test if it reads correctly.
The ones that fail are:
The "buggy" ones are (I think) invalid shapefiles and produce a "GDAL Error 1 Corrupted shape file" on
st_read
and viaogrinfo
on the command line, so I think that's correct behaviour.st_read
copes with the "empty.shp" shapefile, but not the other empty ones, returning:but my ogrinfo seems happy, running with no errors:
So that might be a bug.
The wider question is whether it is worth implementing a fuller test suite against these and all the other spatial data files in the ogr test set? There's kml, gml, sqlite....
I'm not sure there's an argument for implementing all the tests in the GDAL OGR test suite, since you will probably be testing your GDAL more than your
sf
, but at least a set of shapefiles covering all geometries, dimensions (2D/3D), measurements (M and no-M), emptiness and pathologies might be useful.