r-spatial / sf

Simple Features for R
https://r-spatial.github.io/sf/
Other
1.31k stars 291 forks source link

Limits to load database to PostGIS through st_write() #2087

Closed bwcastillo closed 1 year ago

bwcastillo commented 1 year ago

Hello,

I am trying to load a database from NY Open Data -5.8M rows - to PostGIS, using st_write() method. I am having different responses when I define the limit -number of rows- parameter in the url/api. If I define =<3M or >3, in the first case (=3M) it works, but when I set up 4M it doesn't work and return Error: Cannot open "https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=4000000&%24%24app_token=MyToken"; The file doesn't seem to exist. GDAL Error 1: Failure writing output to destination, but if I copy and open the link in the browser it works.

sf::st_write(st_read("https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=5308876&%24%24app_token=MyToken),
             dsn= conn,
             layer="nypd_arrests_historic",
             delete_layer=T,
             append=F,
             driver="PostgreSQL/PostGIS")

sessionInfo()

R Under development (unstable) (2023-01-20 r83646 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.utf8    

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.2     stringr_1.5.0     dplyr_1.0.10      purrr_1.0.1       readr_2.1.3      
 [6] tidyr_1.2.1       tibble_3.1.8      ggplot2_3.4.0     tidyverse_1.3.2   rpostgis_1.4.3   
[11] sf_1.0-9          RPostgreSQL_0.7-4 DBI_1.1.3         RPostgres_1.4.5  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0    blob_1.2.3          lazyeval_0.2.2      reprex_2.0.2        timechange_0.2.0   
 [6] lifecycle_1.0.3     geojsonio_0.10.0    ellipsis_0.3.2      terra_1.6-53        magrittr_2.0.3     
[11] compiler_4.3.0      rlang_1.0.6         tools_4.3.0         utf8_1.2.2          bit_4.0.5          
[16] sp_1.6-0            classInt_0.4-8      curl_5.0.0          xml2_1.3.3          KernSmooth_2.23-20 
[21] httpcode_0.3.0      foreign_0.8-84      withr_2.5.0         grid_4.3.0          googledrive_2.0.0  
[26] fansi_1.0.3         e1071_1.7-12        colorspace_2.0-3    scales_1.2.1        crul_1.3           
[31] cli_3.6.0           crayon_1.5.2        generics_0.1.3      rstudioapi_0.14     geojson_0.3.4      
[36] httr_1.4.4          modelr_0.1.10       tzdb_0.3.0          readxl_1.4.1        proxy_0.4-27       
[41] rvest_1.0.3         assertthat_0.2.1    s2_1.1.2            cellranger_1.1.0    vctrs_0.5.1        
[46] V8_4.2.2            jsonlite_1.8.4      geojsonsf_2.0.3     hms_1.1.2           bit64_4.0.5        
[51] maptools_1.1-6      units_0.8-1         glue_1.6.2          codetools_0.2-18    stringi_1.7.12     
[56] lubridate_1.9.0     gtable_0.3.1        raster_3.6-14       munsell_0.5.0       pillar_1.8.1       
[61] R6_2.5.1            jqr_1.2.3           dbplyr_2.3.0        wk_0.7.1            lattice_0.20-45    
[66] haven_2.5.1         backports_1.4.1     googlesheets4_1.0.1 broom_1.0.2         gargle_1.2.1       
[71] class_7.3-20.1      Rcpp_1.0.9          fs_1.5.2            rgeos_0.6-1         pkgconfig_2.0.3

sf::sf_extSoftVersion()

sf::sf_extSoftVersion()
GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
"3.9.4"        "3.6.0"        "9.1.1"         "true"         "true"        "9.1.1" 

What can be the reason?

Thanks in advance.

edzer commented 1 year ago

R package sf uses the GDAL library for this; to sort out whether the problem is with sf or with GDAL I would try to obtain the data with a call to the (command line tool) ogr2ogr, trying to save the dataset to a local file, like a GPKG or so.

bwcastillo commented 1 year ago

Thank you @edzer! I will try to reach out where the problem is coming from.

bwcastillo commented 1 year ago

@edzer,

Finally I verified that is a GDAL problem, I did my test in local and it worked:

ogr2ogr -f GPKG output.gpkg nypd-arrest-historic.geojson

But when I used the url where is the geojson file it didn't recognize it:

ogr2ogr -f PostgreSQL PG:"host=localhost dbname=censos user=postgres password=adminpass port=5432 schemas=censos ACTIVE_SCHEMA=censos" -lco SCHEMA=censos https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=5308876^&%24%24app_token=TOKENCODE -lco GEOMETRY_NAME=geometry
ERROR 1: Failed writing body (0 != 16384)
FAILURE:
Unable to open datasource `https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=5308876&%24%24app_token=TOKENCODE' with the following drivers.
  -> `JP2ECW'
  -> `OCI'
  -> `SOSI'
  -> `PCIDSK'
  -> `netCDF'
  -> `PDS4'
  -> `VICAR'
  -> `JP2OpenJPEG'
  -> `PDF'
  -> `MBTiles'
  -> `EEDA'
  -> `DB2ODBC'
  -> `ESRI Shapefile'
  -> `MapInfo File'
  -> `UK .NTF'
  -> `OGR_SDTS'
  -> `S57'
  -> `DGN'
  -> `OGR_VRT'
  -> `REC'
  -> `Memory'
  -> `BNA'
  -> `CSV'
  -> `NAS'
  -> `GML'
  -> `GPX'
  -> `LIBKML'
  -> `KML'
  -> `GeoJSON'
  -> `GeoJSONSeq'
  -> `ESRIJSON'
  -> `TopoJSON'
  -> `Interlis 1'
  -> `Interlis 2'
  -> `OGR_GMT'
  -> `GPKG'
  -> `SQLite'
  -> `ODBC'
  -> `WAsP'
  -> `PGeo'
  -> `MSSQLSpatial'
  -> `OGR_OGDI'
  -> `PostgreSQL'
  -> `MySQL'
  -> `OpenFileGDB'
  -> `XPlane'
  -> `DXF'
  -> `CAD'
  -> `FlatGeobuf'
  -> `Geoconcept'
  -> `GeoRSS'
  -> `GPSTrackMaker'
  -> `VFK'
  -> `PGDUMP'
  -> `OSM'
  -> `GPSBabel'
  -> `SUA'
  -> `OpenAir'
  -> `OGR_PDS'
  -> `WFS'
  -> `OAPIF'
  -> `HTF'
  -> `AeronavFAA'
  -> `Geomedia'
  -> `EDIGEO'
  -> `SVG'
  -> `CouchDB'
  -> `Cloudant'
  -> `Idrisi'
  -> `ARCGEN'
  -> `SEGUKOOA'
  -> `SEGY'
  -> `XLS'
  -> `ODS'
  -> `XLSX'
  -> `Elasticsearch'
  -> `Walk'
  -> `Carto'
  -> `AmigoCloud'
  -> `SXF'
  -> `Selafin'
  -> `JML'
  -> `PLSCENES'
  -> `CSW'
  -> `VDV'
  -> `GMLAS'
  -> `MVT'
  -> `NGW'
  -> `MapML'
  -> `TIGER'
  -> `AVCBin'
  -> `AVCE00'
  -> `HTTP'

If I copy the Unable to open datasource that the output refers, in a browser it works. So, it makes me to conclude that is GDAL problem.

I don't know if I can post this in any GDAL forum.

Thanks in advance.

PS: the subtle difference '^' between the url input and output is because treat '&' as special character in the input url.

edzer commented 1 year ago

Have you tried to put /vsicurl/ in front of the dataset starting with https://..., like /vsicurl/https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=5308876^&%24%24app_token=TOKENCODE ?

bwcastillo commented 1 year ago

Now, I tried but it didn't work, also I tried with ogrinfo and neither.

Same last query with vsicurl:

C:\>ogr2ogr -f PostgreSQL PG:"host=localhost dbname=censos user=postgres password=adminpass port=5432 schemas=censos ACTIVE_SCHEMA=censos" -lco SCHEMA=censos /vsicurl/https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=530887^&%24%24app_token=TOKENCODE -lco GEOMETRY_NAME=geometry
Warning 1: HTTP response code on https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=530887&%24%24app_token=TOKENCODE: 403
FAILURE:
Unable to open datasource `/vsicurl/https://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=530887&%24%24app_token=TOKENCODE' with the following drivers.
  -> `JP2ECW'
  -> `OCI'
  -> `SOSI'
  -> `PCIDSK'
  -> `netCDF'
  -> `PDS4'
  -> `VICAR'
  -> `JP2OpenJPEG'
  -> `PDF'
  -> `MBTiles'
  -> `EEDA'
  -> `DB2ODBC'
  -> `ESRI Shapefile'
  -> `MapInfo File'
  -> `UK .NTF'
  -> `OGR_SDTS'
  -> `S57'
  -> `DGN'
  -> `OGR_VRT'
  -> `REC'
  -> `Memory'
  -> `BNA'
  -> `CSV'
  -> `NAS'
  -> `GML'
  -> `GPX'
  -> `LIBKML'
  -> `KML'
  -> `GeoJSON'
  -> `GeoJSONSeq'
  -> `ESRIJSON'
  -> `TopoJSON'
  -> `Interlis 1'
  -> `Interlis 2'
  -> `OGR_GMT'
  -> `GPKG'
  -> `SQLite'
  -> `ODBC'
  -> `WAsP'
  -> `PGeo'
  -> `MSSQLSpatial'
  -> `OGR_OGDI'
  -> `PostgreSQL'
  -> `MySQL'
  -> `OpenFileGDB'
  -> `XPlane'
  -> `DXF'
  -> `CAD'
  -> `FlatGeobuf'
  -> `Geoconcept'
  -> `GeoRSS'
  -> `GPSTrackMaker'
  -> `VFK'
  -> `PGDUMP'
  -> `OSM'
  -> `GPSBabel'
  -> `SUA'
  -> `OpenAir'
  -> `OGR_PDS'
  -> `WFS'
  -> `OAPIF'
  -> `HTF'
  -> `AeronavFAA'
  -> `Geomedia'
  -> `EDIGEO'
  -> `SVG'
  -> `CouchDB'
  -> `Cloudant'
  -> `Idrisi'
  -> `ARCGEN'
  -> `SEGUKOOA'
  -> `SEGY'
  -> `XLS'
  -> `ODS'
  -> `XLSX'
  -> `Elasticsearch'
  -> `Walk'
  -> `Carto'
  -> `AmigoCloud'
  -> `SXF'
  -> `Selafin'
  -> `JML'
  -> `PLSCENES'
  -> `CSW'
  -> `VDV'
  -> `GMLAS'
  -> `MVT'
  -> `NGW'
  -> `MapML'
  -> `TIGER'
  -> `AVCBin'
  -> `AVCE00'
  -> `HTTP'

With ogrinfo

C:\>ogrinfo -ro -al -so /vsicurl/http://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=530887^&%24%24app_token=TOKENCODE
Warning 1: HTTP response code on http://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=530887&%24%24app_token=TOKENCODE: 403
FAILURE:
Unable to open datasource `/vsicurl/http://data.cityofnewyork.us/resource/8h9b-rp9u.geojson?%24limit=530887&%24%24app_token=TOKENCODE' with the following drivers.
  -> JP2ECW
  -> OCI
  -> SOSI
  -> PCIDSK
  -> netCDF
  -> PDS4
  -> VICAR
  -> JP2OpenJPEG
  -> PDF
  -> MBTiles
  -> EEDA
  -> DB2ODBC
  -> ESRI Shapefile
  -> MapInfo File
  -> UK .NTF
  -> OGR_SDTS
  -> S57
  -> DGN
  -> OGR_VRT
  -> REC
  -> Memory
  -> BNA
  -> CSV
  -> NAS
  -> GML
  -> GPX
  -> LIBKML
  -> KML
  -> GeoJSON
  -> GeoJSONSeq
  -> ESRIJSON
  -> TopoJSON
  -> Interlis 1
  -> Interlis 2
  -> OGR_GMT
  -> GPKG
  -> SQLite
  -> ODBC
  -> WAsP
  -> PGeo
  -> MSSQLSpatial
  -> OGR_OGDI
  -> PostgreSQL
  -> MySQL
  -> OpenFileGDB
  -> XPlane
  -> DXF
  -> CAD
  -> FlatGeobuf
  -> Geoconcept
  -> GeoRSS
  -> GPSTrackMaker
  -> VFK
  -> PGDUMP
  -> OSM
  -> GPSBabel
  -> SUA
  -> OpenAir
  -> OGR_PDS
  -> WFS
  -> OAPIF
  -> HTF
  -> AeronavFAA
  -> Geomedia
  -> EDIGEO
  -> SVG
  -> CouchDB
  -> Cloudant
  -> Idrisi
  -> ARCGEN
  -> SEGUKOOA
  -> SEGY
  -> XLS
  -> ODS
  -> XLSX
  -> Elasticsearch
  -> Walk
  -> Carto
  -> AmigoCloud
  -> SXF
  -> Selafin
  -> JML
  -> PLSCENES
  -> CSW
  -> VDV
  -> GMLAS
  -> MVT
  -> NGW
  -> MapML
  -> TIGER
  -> AVCBin
  -> AVCE00
  -> HTTP
edzer commented 1 year ago

OK, now your task is to sort out whether this is supposed to work the way you do it, or whether this is a feature request, really, and find the appropriate place to post it. I think we can close here.

bwcastillo commented 1 year ago

@edzer I was wondering how sf::st_read works, because it allows me to read 3M from an api/url but not 4M, also with ogr2ogr I couldn't get any data.

Thank you !