Open · elijaflores6 opened this issue 6 years ago
@Elija How big is your team? Nobody with a desktop PC (→no overheating)?
I did the subsetting on the Texas dataset using a MacBook Pro with no further problems. Took me about 5 minutes; I just had to learn a little more about the sf package. Using rgeos/rgdal was so computationally intensive that it hung the computer for about 6 hours on Maine, and ran endlessly for Texas. The readme file has a nice walkthrough of how to do it with the R/sf package. It is pretty simple. Let me know if you need further help. I appreciate that MS is using an open-source format (geoJSON), but shapefiles would be a welcome alternative for most folks.
Hey there @antifa-ev
Right now we are a team of 3 people (myself included); one of them has a developer computer and he'll be working on that. But we would like to split up the work amongst ourselves.
We all also work remotely in different states, so we were only provided laptops for work. We're hoping to find a better solution with the resources we have at the moment.
Look at using Safe Software's FME. It took me 1 hour 26 minutes to convert California from GeoJSON to File GDB. FME runs on Mac, too.
The following script worked for me. It took ~10 min on a 2017 MacBook Pro. After that, any intersection runs pretty fast.
library(sf)
setwd("./USBuildingFootprints/")
# read the state GeoJSON, then write it back out as a shapefile
Texas_buildings <- st_read("./Texas.geojson")
st_write(Texas_buildings, "Texas_buildings.shp")
@elijaflores6 Have either of the two suggestions worked for you?
30 minutes to complete Texas.
I used ogr2ogr -nlt POLYGON dest.shp source.geojson on my Mac and it worked for me. Thanks!
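If you need to script that same conversion over many files, a minimal Python wrapper around the identical ogr2ogr call might look like this (assumes GDAL's command-line tools are installed and on PATH; the file names are placeholders):

# sketch: run the same ogr2ogr conversion from Python
# (assumes ogr2ogr is on PATH; file names are placeholders)
import subprocess

subprocess.run(
    ["ogr2ogr", "-nlt", "POLYGON", "dest.shp", "source.geojson"],
    check=True,  # raise if ogr2ogr exits with an error
)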
Re large geojson files: I'm able to download and open the others, but not California (e.g., I opened NY and DC, and NY is one of the largest). Is the California file corrupted? If not, can it be broken into N and S California, or some other partition of 3 or 4 parts, to make processing easier? Thanks
Hey @abuabara, thanks for sharing the R script. Trying to replicate it gives an error on the Texas dataset:

Error in CPL_read_ogr(dsn, layer, as.character(options), quiet, type, : Open failed.
In addition: Warning messages:
1: In CPL_read_ogr(dsn, layer, as.character(options), quiet, type, : GDAL Error 1: JSON parsing error: buffer size overflow (at offset 0)
2: In CPL_read_ogr(dsn, layer, as.character(options), quiet, type, : GDAL Error 4: Failed to read GeoJSON data

Running on a Dell XPS13 (8 GB RAM, 5th-gen i5 processor). Your help would be appreciated.
I'm trying to replicate your error here. So far I have two guesses:
1) Memory available. I can test on a 16GB MacBook Pro and a 32GB iMac. On the iMac it's substantially faster and uses most of the memory. On the MBP it works, but reaches full memory pressure. 8GB is probably not enough.
2) The GDAL installation. After loading the sf package I have: "Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3"
It seems the problem with geojson is that software commonly tries to read the whole file at once. So I have written a small script to convert from json to an ASCII format that, say, Global Mapper reads easily.
"""
Prints out ascii format from the building footprint files.
usage:
python get_ascii_from_json.py json.geojson > out.file
output in form <feature number>,<x>,<y>:
0,-120.800643,46.963025
0,-120.800727,46.96307
0,-120.800825,46.962986
0,-120.800741,46.96294
0,-120.800643,46.963025
1,-120.686818,47.038457
...
"""
import sys

c = 0        # running feature number
maxc = -1    # set > 0 to limit the number of features (and print debug output)

for line in open(sys.argv[1]):
    # only lines with a Polygon geometry carry coordinates
    if "Poly" not in line:
        continue
    if '[[[' not in line or ']]]' not in line:
        raise Exception("assumptions on format are incorrect")
    if maxc > 0:
        print(line)
    # take the text between the outermost coordinate brackets
    arr = line.split('[[[')[1].split(']]]')[0]
    # drop any remaining brackets, leaving a flat comma-separated list
    arr = arr.replace('[', '').replace(']', '')
    pp = arr.split(",")
    if maxc > 0:
        print(pp)
    prefix = str(c)
    # coordinates alternate x,y
    for j in range(len(pp) // 2):
        print(prefix + "," + pp[j * 2] + "," + pp[j * 2 + 1])
    c += 1
    if c == maxc:
        break
@elijaflores6 I transformed the geojson into shapefiles and saved one indexed shapefile per county in an s3 bucket, here: s3://glr-ds-us-building-footprints. Files are named {county_fips}.{extension}, e.g. 19075.shx. A 'full' shapefile consists of five files, with extensions .dbf, .prj, .qix, .shp, and .shx.
I created a repo for the derived shapefiles, with an example of downloading a shapefile, querying it, and computing some building attributes of plots of land in the county. Check it out! https://github.com/granularag/USBuildingFootprints
I am having the same issue. I have tried to open the Michigan geojson with QGIS and ArcPro; it crashes both my laptop and my desktop each time. When I tried the R script, it froze the Windows desktop I was using and crashed several programs before erroring out 2 hours later with pretty much the same message:
> library(sf)
Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
> setwd("C:/UPX/DATA/MI/Data/Polygons/Michigan")
> Buildings <- st_read("Michigan.geojson")
Cannot open data source C:\UPX\DATA\MI\Data\Polygons\Michigan\Michigan.geojson
Error in CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
Open failed.
In addition: Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet, :
GDAL Error 4: Failed to read GeoJSON data
My laptop and desktop both have 8GB RAM and run 64-bit Windows 10. I work at a university research lab and this is all we have access to. @ledusledus Any ideas?
@jwhyb - did you try pushing the thing through the script above and then opening it up in QGIS?
@ledusledus It did work! I had to load the output as CSV points and then use the Points2One extension, grouping by the first field, to turn them into polygons. Thanks!
The FME software method works great for Texas and California, and I found that R can handle everything else. I downloaded all the rest of the states, unzipped them into an empty folder, then used the following code to convert all of them to shapefiles with a loop in R:
library(geojsonsf)
library(sf)

files <- list.files(path="Folder/Path/Building_jsons", pattern="*.geojson", full.names=TRUE, recursive=FALSE)

for(f in files){
  # pull the state name out of the file path (8th path component here),
  # dropping the 8-character ".geojson" extension
  split <- read.table(text = f, sep = "/")
  filename <- split[,8]
  state <- substr(filename, 1, nchar(as.character(filename)) - 8)
  outfile <- paste0("Folder/Path/Building_shps/", state, ".shp")
  sf <- geojsonsf::geojson_sf(f)
  st_write(sf, outfile)
  print(paste0("Completed ", state))
}
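For anyone who prefers Python, a rough equivalent of that loop using geopandas is sketched below (folder paths are placeholders; like the R version, it loads each state fully into memory, so the biggest states may still choke on 8 GB):

# rough Python equivalent of the R loop above, using geopandas
# (pip install geopandas); folder paths are placeholders
import glob
import os
import geopandas as gpd

for f in glob.glob("Folder/Path/Building_jsons/*.geojson"):
    state = os.path.splitext(os.path.basename(f))[0]
    gdf = gpd.read_file(f)                                 # loads the whole state into memory
    gdf.to_file(f"Folder/Path/Building_shps/{state}.shp")  # write as shapefile
    print(f"Completed {state}")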
I'm having the same problem with California, except I am limited by hardware. Loading the state of California's footprints crashes my computer every time, whether opened or read through FME, R, Python, cmd, or Notepad++. Conversion to other file types has been equally unsuccessful. At this point my goal is to subset the state into bite-size portions that my machine can handle.
Does anyone have a subset, or even converted format of the CA data they would be willing to share?
@aquaraider333 I've got county-level shapefiles on s3. https://github.com/granularag/USBuildingFootprints
@aquaraider333, I had the same problem, which I solved by building tilesets using tippecanoe (https://github.com/mapbox/tippecanoe):
tippecanoe -o out.mbtiles -zg --drop-densest-as-needed California.geojson -P
worked like a charm.
I can share the tileset if you are interested
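If you want to sanity-check the resulting tileset: .mbtiles is just a SQLite database, and its metadata table (name/value pairs) is part of the MBTiles spec, so a few lines of Python can inspect the output:

# quick inspection of a tippecanoe output file; .mbtiles is SQLite,
# and the metadata table is defined by the MBTiles spec
import sqlite3

con = sqlite3.connect("out.mbtiles")
for name, value in con.execute("SELECT name, value FROM metadata"):
    print(name, "=", value)
con.close()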
Matteo -
Thank you for your email. I haven't had a chance yet to study your GitHub site (it looks quite interesting, and I will), but the California geojson file is impossible. If you could split it into, say, three parts (N, Central, and South), or tilesets, whatever, and provide access, it would be fantastic. In any event, I would like access to the tilesets you mention.
With Best Regards, and thanks
Charles Scawthorn
@chipfranzen I'm having trouble getting Fiona working properly on my Windows build, and it looks like there's a paywall to download large amounts of data through S3. Is there an alternative way I might access the data?
@mattmar If you are willing to share, I'd be happy to use them.
@aquaraider333 yeah, you will have to pay for data transfer from s3. It's insanely cheap, like $0.0007 per GB. https://aws.amazon.com/s3/pricing/
Chip – please pardon my ignorance, but when I click on your link https://github.com... I am taken to your GitHub page.
I am not familiar with s3 – I guess it refers to AWS buckets. The s3 link you provide is not clickable – when I enter it as a URL, with or without https://, it just takes me to a Google search result. I have googled "accessing s3", which doesn't help.
How do I get to those county-level files? I really appreciate your help, and just need to go the last mile.
With best regards
Charles Scawthorn
I used PostGIS to transform the geojson files into county CSV files with the geometry as WKT. These files compress really nicely compared to geojson and shapefiles, and the zipfiles can be opened directly in QGIS. It took a few days to figure out the process and run it for the entire dataset (all 50 states plus DC). My workflow is documented here: https://github.com/dlab-geo/msfootprints_by_county. You can see the output of this processing for CA in this Google Drive folder: https://drive.google.com/open?id=1-XGvS25tQKKQ3HTqWjAfLJ4PaeXJ9yyY. Hope that helps. - Patty
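If you'd rather pull one of these WKT CSVs into Python than open it in QGIS, a minimal sketch with pandas and shapely follows (the file name and the geometry column name are assumptions about the output files, not their actual schema):

# minimal sketch: load a county CSV whose geometry column holds WKT
# (file name and column name are assumptions, not the actual schema)
import pandas as pd
from shapely import wkt

df = pd.read_csv("ca_06109.csv")
df["geometry"] = df["geometry"].apply(wkt.loads)  # parse WKT into shapely polygons
print(len(df), "footprints; first area:", df["geometry"].iloc[0].area)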
Patty -
Helps ENORMOUSLY. Thank you. WKTs easily open in QGIS.
What is the naming nomenclature? E.g., ca_06109 appears to be Tuolumne, but California has only 58 counties (and you created 58 files), so how does 109 correspond to Tuolumne?
With best regards, much thanks, and best wishes for the holidays and the New Year
Charles
Hi Charles, I'm glad you found the work helpful. The first two digits of the number are the state FIPS code (06 for CA) and the next three are the county FIPS code.
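In code, splitting a file name like ca_06109 into its parts is just string slicing:

# the five-digit code is state FIPS (2 digits) + county FIPS (3 digits)
fips = "06109"
state_fips, county_fips = fips[:2], fips[2:]  # "06" = California, "109" = Tuolumne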
Hi, I am no developer or programmer, but I do need California building shapefiles by county. I don't know what an s3 bucket is. How can I get them (if that's okay with you)?
I actually signed up for an AWS s3 account, but somehow I cannot connect to your link, s3://glr-ds-us-building-footprints.
Is there a chance you could put the files somewhere else?
@MimiS2008 Can you run the example script at https://github.com/granularag/USBuildingFootprints/blob/master/example.py?
Thanks so much for your response. My Python experience is very minimal, but I will try. Also, since I need California counties only (I know that is not little!), is there any other way I could get the shapefiles? The AWS link did not work even though I created an account for myself.
@MimiS2008 That link, s3://glr-ds-us-building-footprints, is an s3 path, not a URL for a webpage. The easiest way to interact with an s3 bucket is probably aws-cli: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
If you install that and run aws configure, you should be able to input your AWS credentials, then use aws s3 <command> to interact with the bucket. aws s3 ls s3://glr-ds-us-building-footprints will list the bucket contents, and you can use aws s3 cp to download the files you want.
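If the CLI is awkward, the same two operations can be done in a few lines of Python with boto3 (pip install boto3); the bucket name is from this thread, and 19075.shx is one of the example keys above:

# sketch of `aws s3 ls` and `aws s3 cp` using boto3; assumes your
# AWS credentials are already configured (e.g. via `aws configure`)
import boto3

s3 = boto3.client("s3")
BUCKET = "glr-ds-us-building-footprints"

# list the bucket contents (equivalent of `aws s3 ls`)
for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
    print(obj["Key"])

# download one county's file (equivalent of `aws s3 cp`)
s3.download_file(BUCKET, "19075.shx", "19075.shx")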
Thank you.
Hi, I did all of that. My access is denied. I put in my access keys and then set us-east-1 as the region (I only kept that one active).
https://github.com/woodb/geojsplit works for me.
node --max-old-space-size=20480 /usr/lib/node_modules/geojsplit/bin/geojsplit -a 1 -l 3000000 -v -o ~/data ~/data/California.geojson
I divided California.geojson into four files with a maximum of 3,000,000 features each.
By default, node has only a 512 MB memory limit, so --max-old-space-size=20480 increases it to 20 GB.
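A rough Python alternative for the same kind of split, using the third-party streaming parser ijson so the whole file never sits in memory at once (chunk size and file names are placeholders mirroring the geojsplit run above):

# sketch: stream features out of a huge GeoJSON and rewrite them in chunks;
# needs `pip install ijson`; chunk size and file names are placeholders
import json
import ijson

CHUNK = 3_000_000  # features per output file
features, part = [], 0

def flush(features, part):
    with open("California_part%d.geojson" % part, "w") as out:
        # default=float handles the Decimal numbers ijson produces
        json.dump({"type": "FeatureCollection", "features": features},
                  out, default=float)

with open("California.geojson", "rb") as f:
    for feat in ijson.items(f, "features.item"):
        features.append(feat)
        if len(features) == CHUNK:
            flush(features, part)
            features, part = [], part + 1

if features:  # write the final partial chunk
    flush(features, part)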
Hello,
I see that the datasets are available at the state level – however, these datasets for each state are very large, and I did not have any luck trying to convert the geoJSON files to shapefiles (the process took way longer than usual and overheated my computer).
Besides suggesting a better computer to handle heavier processing: is it possible to get building footprint datasets for individual cities, so that the files are smaller and easier on my computer to convert?
I would prefer that the data all come from one source (for consistency), as opposed to going to multiple sources. My team is searching for building footprints for the following cities:
• Minneapolis, MN
• Atlanta, GA
• Raleigh, NC
• St Paul, MN
• Charlotte, NC
• Winston-Salem, NC
• Chicago, IL
• Dallas, TX
Any other suggestions/solutions would be very much appreciated!
Thanks, Elija