hotosm / galaxy-api

Backend to fetch data from Underpass
https://galaxy-api.hotosm.org/latest/redoc
GNU Affero General Public License v3.0

Rawdata: Improve Performance of GeoJSON Binding #130

Closed kshitijrajsharma closed 2 years ago

kshitijrajsharma commented 2 years ago

Research is needed on how we can improve the performance of GeoJSON binding. Currently we build it from scratch, going over each row in Python. This works well for small data but takes much longer on larger datasets, especially when the query result has a large number of rows.

kshitijrajsharma commented 2 years ago

GeoJSON binding is now a lot faster than before. I am using the orjson library for the JSON dumps and loads steps, and I am getting each row as a GeoJSON feature from the query itself. Previously I tried to pull the whole GeoJSON feature collection from the query, but on big files PostgreSQL's JSON array aggregate function maxes out, so I now do it at the row level and bind the rows into a feature collection in the API. Processing 250-300 MB of GeoJSON features now takes about 10 seconds, compared to about two and a half minutes with the previous approach.

Since I am taking the query results, binding them into an in-memory GeoJSON file, binding that into an in-memory zip file, and then delivering it to the user, it uses approx. 2 GB of RAM to deliver a 400 MB file, which compresses to an approx. 50-65 MB zip file. Memory could become a serious problem, and more research is needed to overcome it.

cc: @ramyaragupathy
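For reference, here is a minimal sketch of one possible direction for the memory problem. It is not the code in this repo. It assumes the query already returns one GeoJSON feature string per row, and it writes those strings straight into the zip member as they arrive, so the full GeoJSON document is never held in memory. The function names `write_feature_collection` and `zip_feature_collection` and the member name `export.geojson` are made up for illustration. The rows are passed through unparsed, so no JSON library is needed on this path.

```python
import io
import json
import zipfile
from typing import BinaryIO, Iterable


def write_feature_collection(rows: Iterable[str], out: BinaryIO) -> int:
    """Write rows (one GeoJSON feature string each) as a FeatureCollection.

    The feature strings are not parsed; they are only joined with commas
    between a fixed header and footer. Returns the number of features written.
    """
    out.write(b'{"type":"FeatureCollection","features":[')
    count = 0
    for feature_json in rows:
        if count:
            out.write(b",")
        out.write(feature_json.encode("utf-8"))
        count += 1
    out.write(b"]}")
    return count


def zip_feature_collection(
    rows: Iterable[str],
    target: BinaryIO,
    member_name: str = "export.geojson",  # hypothetical file name
) -> int:
    """Stream the FeatureCollection into a deflate-compressed zip member."""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ZipFile.open(..., "w") returns a writable file-like object, so the
        # data is compressed chunk by chunk instead of all at once.
        with zf.open(member_name, "w") as member:
            return write_feature_collection(rows, member)


if __name__ == "__main__":
    # Stand-in for rows coming from a database cursor (assumption).
    sample_rows = (
        json.dumps(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [i, i]},
                "properties": {"osm_id": i},
            }
        )
        for i in range(3)
    )
    buffer = io.BytesIO()
    written = zip_feature_collection(sample_rows, buffer)
    print(written, "features,", buffer.tell(), "bytes of zip")
```

In this sketch `target` is a `BytesIO`, so only the compressed output stays in memory. Passing a file opened on disk would avoid holding even that. A member larger than 2 GiB uncompressed would additionally need `force_zip64=True` in the `zf.open` call.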