sentinel-hub / eo-learn

Earth observation processing framework for machine learning in Python
https://eo-learn.readthedocs.io/en/latest/
MIT License

Some requests result in 429 errors #19

Closed by gmilcinski 4 years ago

gmilcinski commented 5 years ago

As Sentinel Hub has introduced rate limiting, some eo-learn (or sentinelhub-py?) requests result in 429 errors, as they trigger a large number of requests behind the scenes.

It would be good to be able to configure the rate limit and then take it into account when querying Sentinel Hub. And whenever a 429 error comes back, the system should retry along the following lines: https://en.wikipedia.org/wiki/Exponential_backoff
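
For illustration, the retry behaviour suggested above could look roughly like the following. This is only a minimal sketch of generic exponential backoff with jitter, not how eo-learn or sentinelhub-py handle it; `RateLimitError`, `retry_with_backoff` and all parameter names are hypothetical, and the caller is assumed to raise `RateLimitError` when the service answers with HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error the caller raises when the service answers HTTP 429."""


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call func() and retry on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay doubles on each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter so parallel workers do not all retry at the same moment.
            sleep(delay * random.uniform(0.5, 1.0))
```

The `sleep` argument is only there so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.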

AleksMat commented 5 years ago

eo-learn uses sentinelhub-py to retrieve data from the Sentinel Hub service. Today a new version of the sentinelhub-py package was released, and some handling of the 429 HTTP error was added. More will be added once the Sentinel Hub service is updated to return information about rate limiting in headers.

Koesters commented 5 years ago

I also ran into the rate limit while trying to implement a 3x3 patch in my area in SI_LULC_pipeline. The rate limit in the user interface is just a graphic, so I do not know what I am hitting here: 20 requests / second or 10000 a day. I only ran that section maybe 5 times before the 429s started. I also have no indication whether a paid account would help. https://www.sentinel-hub.com/develop/documentation/api/ogc_api/rate-limiting states 20 requests per SECOND. Since the pull is obfuscated by the eo-learn library, I find it hard to control this, especially as I have no information about where I am overshooting. https://apps.sentinel-hub.com/dashboard/#/account/billing states 20 requests per MINUTE for both individual non-commercial and commercial.

This makes it a somewhat frightening prospect to rely on this in a commercial workflow in the future if you have to debug stuff.

AleksMat commented 5 years ago

Hi @Koesters,

You have a valid point. At the moment sentinelhub-py still treats 429 as a connection problem or internal service problem. We plan to improve that, we just haven't had the time yet.

The plan is that a rate-limiting 429 response should never break the download process. Instead, it should be handled internally by sentinelhub-py, and the user will only be notified that the rate limit has been reached and that it will take more time to download the data.
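
As a rough sketch of that idea (not the actual or planned sentinelhub-py implementation), a download loop could wait on a 429 and only warn the user instead of failing. Here `send` is a hypothetical callable returning `(status_code, headers, body)`, and the use of the standard HTTP `Retry-After` header is an assumption, since the thread does not say which headers the service will return.

```python
import time
import warnings


def download_with_rate_limit(send, max_waits=10, default_wait=5.0, sleep=time.sleep):
    """Call send() until it returns a response that is not HTTP 429.

    send is a hypothetical callable returning (status_code, headers, body).
    """
    for _ in range(max_waits):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Retry-After is the standard HTTP header for this; whether the service
        # sends it is an assumption. Only the "number of seconds" form is parsed.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        # Notify the user, but keep the download going.
        warnings.warn(f"Rate limit reached, waiting {wait:.0f} s before retrying")
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_waits} waits")
```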

By the way, at the moment you can regulate the following from the command line:

sentinelhub.config --max_download_attempts <number of attempts> --download_sleep_time <time in seconds>

By specifying a higher number of download attempts and more sleep time between attempts, you should be able to avoid 429 errors in most cases. However, these parameters are intended for other kinds of HTTP errors and are only a temporary solution for rate limiting.

Koesters commented 5 years ago

Thanks AleksMat,

Is it now 20 requests per minute or per second?

If one wants to test machine learning AI not just in Slovenia, one would need a decent amount of pictures.

Using extra_example-split_AOI, my area (Scotland) would be split up into 1764 patches (EPSG:32630). The dimension of the area is 461288 x 731425 m2.

Multiplied by the timeline and over various indexes, eo-learn under 500 a month might not be a valid thing to do then?

Could you offer a calculator showing how much load on your servers one should expect for what one intends to do?
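
A back-of-the-envelope version of such a calculator might look like the sketch below. It assumes one request per patch, date and layer, which is a simplification and not a statement about how many requests eo-learn actually issues. The 1764 patches and the quoted limits come from this thread; the 50 dates and 5 layers are made-up example values, and all function names are hypothetical.

```python
import math


def estimate_requests(n_patches, n_dates, n_layers):
    """Total requests, assuming one request per patch, date and layer."""
    return n_patches * n_dates * n_layers


def days_at_daily_quota(total_requests, requests_per_day):
    """Whole days needed when capped at a daily quota."""
    return math.ceil(total_requests / requests_per_day)


def hours_at_rate(total_requests, requests_per_second):
    """Hours of continuous downloading at a sustained request rate."""
    return total_requests / requests_per_second / 3600


# 1764 patches is from the thread; 50 dates and 5 layers are made-up examples.
total = estimate_requests(n_patches=1764, n_dates=50, n_layers=5)
print(total, "requests")
print(days_at_daily_quota(total, 10_000), "days at 10000 requests/day")
print(round(hours_at_rate(total, 20), 1), "hours at 20 requests/second")
print(round(hours_at_rate(total, 20 / 60), 1), "hours at 20 requests/minute")
```

With these example numbers the total is 441000 requests, which would take 45 days under a quota of 10000 requests per day.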

gmilcinski commented 5 years ago

Hi Koesters, if you are using a Sentinel Hub trial account, it has a limit set to 10.000 requests per day. Trial service access is meant for testing the service. If you want to do mass-scale processing, you would need to subscribe to one of the packages. For your use case, you could choose 10req/second (2.000 EUR per month), 20req/second (3.000 EUR per month), 30req/second (4.000 EUR per month),...

Individual packages are meant for individual use (e.g. QGIS, ArcGIS), not for machine learning processes ("machine-to-machine services" option is not supported for these)

Best, Grega

Koesters commented 5 years ago

Well, AI on sat data kinda needs mass processing, and you seemed to want to encourage participation in the Medium posts on eo-learn.

"A key resource for the success of eo-learn is, of course, the community, both of remote sensing and machine learning experts. We therefore invite anyone with interests in developing large-scale remote sensing applications using spatio-temporal satellite imagery to try eo-learn out, give us feedback, and possibly contribute to it. We welcome code improvements, new EOTask classes, and new workflow examples." https://medium.com/sentinel-hub/introducing-eo-learn-ab37f2869f5c

gmilcinski commented 5 years ago

Hi @Koesters, you were referring above to the commercial exploitation of this service, and I hope you agree it is fair that we cover the costs on our side (processing these data does cost quite a bit on such scales) if users generate revenues on top of our services. In case you would like to explore these data (and services) from a research point of view, there are a couple of options available which do not include costs:

- You can write a very simple proposal to get an ESA-sponsored package. Go to this page: https://earth.esa.int/aos/OSEO and choose "Submit a Proposal for the OGC EO Interface Integration Service".
- You can send us an e-mail and explain what you would like to do.

Best, Grega

Koesters commented 5 years ago

Hi Grega,

it's a misunderstanding.

I saw you at dataspace 2019 in Jeff Brueger's session. This is a massive effort and is a lot of work, costs a lot of money etc.

I started out thinking I would try this out, played around for a day, and then came upon this error. That led me to think about the longer-term motivation for other ESA proposals with business plans etc. I am trying to figure things out.

I expected it to be cheaper, as it was open source and ML, and you said one should participate.

mlubej commented 4 years ago

Just wanted to wrap up this issue.

We are improving the package as best we can, considering the other projects we are working on (this is not our main concern). The software for working with the data is open source and free, but the data service is not. However, you are not limited to using the data provided by our services.

Cheers, M