wmgeolab / geoBoundaries

geoBoundaries : A Political Administrative Boundaries Dataset (www.geoboundaries.org)

[SUBMISSION BUG] Heroku backend failing for uploads in many cases #2336

Open DanRunfola opened 2 years ago

DanRunfola commented 2 years ago

The Heroku app that handles submissions is breaking in some cases. An example provided by a user (Jim C. @ IMB) is copied here:

I’ve tried twice to submit this polygon using your Contribute a Boundary File tool, both times receiving an Application error:

image

I’m guessing the application doesn’t like the file size (32 MB). The polygon has over 3 million vertices.

I’ve shared UKR.zip here: (REMOVED)

The polygon is digitized from 2021 Sentinel-2 10-Meter LandUse/LandCover raster data from https://env1.arcgis.com/arcgis/rest/services/Sentinel2_10m_LandCover/ImageServer. That data is part of the ESRI Living Atlas (https://livingatlas.arcgis.com/landcover/) data licensed as Creative Commons Attribution 4.0 (CC BY 4.0).

Jim

DanRunfola commented 2 years ago

I put the suspect file up at: geolab.wm.edu/assets/static_files/UKR.zip

maxmalynowsky commented 2 years ago

I'm having a submission error as well, but in my case it's due to a lack of proper SSL certificate.

[Screenshot: Screen Shot 2022-09-05 at 21 26 04]

MichaelBrandonFalk commented 2 years ago

I'm also getting an error with a 26.9 MB file from the World Bank (the one @justinelliotmeyers helped us find).

DanRunfola commented 2 years ago

Just flagging that we're looking into this.

Broadly, I would like to find some route to do this that isn't built on Heroku (or any other third-party service). Today this is the only part of the entire gB stack that isn't static or based on GitHub Actions, and thus it breaks occasionally, as we're seeing. Very open to suggestions, but working to patch this in the interim.

karimbahgat commented 1 year ago

Just to add some comments about migrating off of Heroku. In theory it would be possible via GitHub Actions, since workflows can be triggered through web requests (called repository_dispatch events) and can then run arbitrary Python code. The problem I found is that repository_dispatch events work by sending JSON data in the body of the HTTP request, which can then be passed to Python by setting env variables in the workflow YAML. This means the file contents have to be encoded and sent with the body, then loaded into memory by the GitHub Action (which will probably crash things in many cases), and subsequently passed as strings to env variables for Python to read (which is likely to exceed the size limit on env variables). The reason for doing it through Django right now is that it supports receiving, and therefore streaming (not loading everything into memory), the data through HTTP file uploads (request.FILES). Maybe there's another way of doing it through GitHub Actions though.
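To make the encoding overhead concrete, here is a rough sketch of what the body of a repository_dispatch request would look like if the file were embedded in it. The `event_type` and `client_payload` fields are part of the GitHub dispatch API; the event name `boundary_submission`, the payload keys `filename` and `data_b64`, and the helper function itself are made up for illustration. The request is only built here, not sent.

```python
import base64
import json


def build_dispatch_body(file_bytes: bytes, filename: str,
                        event_type: str = "boundary_submission") -> str:
    """Build the JSON body for POST /repos/{owner}/{repo}/dispatches.

    The payload keys below are hypothetical; only "event_type" and
    "client_payload" come from the GitHub API itself.
    """
    # JSON cannot carry raw bytes, so the file has to be text-encoded first.
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({
        "event_type": event_type,
        "client_payload": {"filename": filename, "data_b64": encoded},
    })


if __name__ == "__main__":
    fake_upload = bytes(3_000_000)  # stand-in for a zipped boundary file
    body = build_dispatch_body(fake_upload, "UKR.zip")
    # base64 emits 4 characters for every 3 input bytes, so the body is
    # roughly a third larger than the file, and all of it sits in memory.
    print(f"file: {len(fake_upload):,} bytes, JSON body: {len(body):,} bytes")
```

Even before any size limit on the request or on env variables comes into play, the whole file exists in memory at least twice on the sending side (raw and encoded), and again on the workflow side when it is decoded.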

On that note, I looked at the Django code and it looks like I actually load the file data into memory anyway, which might be what's breaking on the larger files. I'll try adding one or two lines of code to stream the FILES data to disk and then read the file from disk (to avoid loading everything into memory), to see if that fixes things. It's a little odd to me that a 20-30 MB file would exhaust the memory, though; I think Heroku is supposed to have 500 MB of RAM. But I'm giving it a shot.

MichaelBrandonFalk commented 1 year ago

I just received the error again while trying to upload the village boundary file from https://en.data.k4d.la/datasets/c7887ac6fee04bc5af6f0ffe7fc6841a_0/explore?location=18.143846%2C103.999856%2C7.12. Is there something I can do differently to get these submitted? Thank you.

DanRunfola commented 1 year ago

If you email it to us directly (team@geoboundaries.org) I will get it added.

FYSA we have a new hire coming onboard Monday (!) who will hopefully be able to start working on these and many other issues.

MichaelBrandonFalk commented 1 year ago

Thank you Dan!

MichaelBrandonFalk commented 1 year ago

Tried again today with Rwanda Admin 5 from the World Bank (village polygons). I will send you an email, @DanRunfola, with the file and all the needed info. This time the error message was a bit different; pasting a screenshot below in case it helps: image

MichaelBrandonFalk commented 1 year ago

Hello, I emailed over Admin 1-4 from https://data.humdata.org/dataset/cod-ab-idn for #IDN #Indonesia to team@geoboundaries.org. I wanted to make sure you received it. There are some end users in Indonesia looking forward to using the updated files. Thank you. - Michael Brandon Falk

DanRunfola commented 1 year ago

Hi Michael - Yes, confirmed. Apologies for the radio silence; there's a big push right now to get 5.0's infrastructure running nightly.

My goal is to have this in by this coming Friday, though it may already be in the 5.0 I pushed right before the holidays (which isn't up on the website yet, but is on the API and in the git repository). Either way, though, hopefully Friday :)

After this big push, things should be closer to nightly (+ working on a new website / submission pipeline to improve the situation later this semester).

Dan

MichaelBrandonFalk commented 1 year ago

@DanRunfola Thank you!