hotosm / galaxy-api

Backend to fetch data from Underpass
https://galaxy-api.hotosm.org/latest/redoc
GNU Affero General Public License v3.0

Enhancement on Rawdata Endpoint #183

Closed kshitijrajsharma closed 2 years ago

kshitijrajsharma commented 2 years ago

Three endpoints:

/raw-data/current-snapshot/ ~ Takes user request

It has the following parameters:

Example response:

{
  "download_url": "url",
  "response_time": "20 Seconds",
  "query_area": "5000 Sq Km ",
  "binded_file_size": "400 MB",
  "zip_file_size": "50 MB"
}

Remember: you can pass just the polygon to the API to get everything in that area.
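A minimal sketch of how a client might call this endpoint with only a polygon. The parameter list is not shown here, so the `geometry` field name, the use of POST with a JSON body, and the base URL are assumptions for illustration rather than the confirmed API contract.

```python
import json
import urllib.request

# Assumed base URL (derived from the docs link of this repository); adjust as needed.
BASE_URL = "https://galaxy-api.hotosm.org/latest"


def build_snapshot_payload(polygon_coords):
    """Wrap a ring of (lon, lat) pairs in a GeoJSON Polygon.

    The top-level "geometry" key is a hypothetical field name.
    """
    ring = [list(point) for point in polygon_coords]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # GeoJSON rings must be closed
    return {"geometry": {"type": "Polygon", "coordinates": [ring]}}


def request_snapshot(payload, base_url=BASE_URL):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/raw-data/current-snapshot/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If the call succeeds, the returned dictionary would carry the `download_url` shown in the example response.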

/raw-data/status/ ~ Gives db last_updated information

This endpoint reports how current the rawdata database is. It is a GET endpoint that returns the last_updated time, which is computed by reading the latest update time from the database and subtracting it from the current time.

Sample response:

{
  "last_updated": "Less than a Minute ago"
}
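The subtraction described above could be sketched as follows. Only the "Less than a Minute ago" wording comes from the sample response; the function names and the minute and hour wordings are assumptions.

```python
from datetime import datetime, timedelta, timezone


def describe_age(last_updated, now=None):
    """Turn the gap between now and last_updated into a short phrase."""
    now = now or datetime.now(timezone.utc)
    delta = now - last_updated
    if delta < timedelta(minutes=1):
        return "Less than a Minute ago"
    minutes = int(delta.total_seconds() // 60)
    if minutes < 60:
        unit = "Minute" if minutes == 1 else "Minutes"
        return f"{minutes} {unit} ago"  # assumed wording
    hours = minutes // 60
    unit = "Hour" if hours == 1 else "Hours"
    return f"{hours} {unit} ago"  # assumed wording


def status_response(last_updated, now=None):
    """Build the response body in the shape of the sample response."""
    return {"last_updated": describe_age(last_updated, now)}
```

Passing `now` explicitly keeps the helper easy to test; in a real handler it would default to the current UTC time.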

This will be used to display the last-updated information in the UI.

/raw-data/exports/{file_name}

This endpoint is used to download files from the server. If the file is not present, it returns null. This is the address provided in the /raw-data/current-snapshot/ response. You don't need to append .zip to the file name: exports are currently only zipped, so the API adds the extension automatically.
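The file lookup described above might look roughly like this. It is a sketch, not the PR's actual handler: `resolve_export` and `export_dir` are hypothetical names, and stripping directory components from the file name is an added safety assumption.

```python
from pathlib import Path


def resolve_export(file_name, export_dir):
    """Return the path of <file_name>.zip inside export_dir, or None if absent."""
    # Keep only the final path component so a request cannot escape export_dir.
    safe_name = Path(file_name).name
    # The caller does not supply the extension; add it here.
    if not safe_name.endswith(".zip"):
        safe_name += ".zip"
    candidate = Path(export_dir) / safe_name
    return candidate if candidate.is_file() else None
```

A `None` result corresponds to the null response for a missing file.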

Benchmarks :

This is just initial load testing with a small number of inputs on the current snapshot with GeoJSON output. It was tested on an expensive query: everything was extracted without using filters.

Data available in RDS: Asia and Africa, updating every minute from the planet server
Load testing tool used: Locust
Area used for testing: 5446 Sq Km
GeoJSON size: approx. 400 MB
Number of features processed in the area: approx. 1M rows
Tested on: dev server with 4 GB of RAM and 30 GB of space

We kept the average response time at approx. 48 sec with 4 users sending requests each second, tested with 69 requests. When the database server was free, the response came back in 31 sec, and even when it was maxed out the longest response took 59 sec. The PostgreSQL cache may have played a role here because the same polygon was passed each time, but the results should still generalize.

Full report: report.pdf

With our existing tools, the same area with the same features usually takes 15-25 min.
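As a rough worked comparison, the figures above (15-25 min with the existing tools versus an average of about 48 sec) can be turned into a speedup factor. The helper name is hypothetical, and the comparison is only indicative because the two measurements were not taken under identical conditions.

```python
def speedup(old_minutes, new_seconds):
    """Ratio of the old duration to the new one, both converted to seconds."""
    return old_minutes * 60 / new_seconds


AVG_SECONDS = 48  # average response time reported above

low = speedup(15, AVG_SECONDS)   # 900 / 48 = 18.75
high = speedup(25, AVG_SECONDS)  # 1500 / 48 = 31.25
print(f"Between {low}x and {high}x faster at the average response time")
```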

How to test: You can test this branch live on the dev server and dummy UI. More instructions are available in the UI itself.

Note: This PR includes OSM login authentication, but it is disabled for now for testing, so user information is not included in exports. The PR requires ogr2ogr to be installed locally on the machine that hosts the product. Once the PR is merged and running on the production server, we can re-enable authentication.
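Since the host needs ogr2ogr (part of GDAL), a deployment could check for it at startup. This is a small sketch with hypothetical function names, not something the PR is known to include.

```python
import shutil


def ogr2ogr_path():
    """Return the full path of the ogr2ogr executable, or None if it is not on PATH."""
    return shutil.which("ogr2ogr")


def require_ogr2ogr():
    """Fail early with a clear message when ogr2ogr is missing."""
    path = ogr2ogr_path()
    if path is None:
        raise RuntimeError("ogr2ogr not found on PATH; install GDAL on this host")
    return path
```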

cc : @LeenDhondt