home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ForecastSolar can't recover when API rate limit is hit #106771

Closed hmmbob closed 2 months ago

hmmbob commented 8 months ago

The problem

I've been rebooting my systems quite a few times, and apparently I've been rate-limited by ForecastSolar. Those errors are filling up my log now 😄 It hits about every 90 seconds, it appears.

What version of Home Assistant Core has the issue?

core-2024.1.0b2

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

Forecast Solar

Link to integration documentation on our website

https://www.home-assistant.io/integrations/forecast_solar/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-12-31 12:18:18.272 ERROR (MainThread) [homeassistant.components.forecast_solar] Unexpected error fetching forecast_solar data: Rate limit for API calls reached. (error 429)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 300, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/forecast_solar/coordinator.py", line 67, in _async_update_data
    return await self.forecast.estimate()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/forecast_solar/__init__.py", line 156, in estimate
    data = await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/forecast_solar/__init__.py", line 125, in _request
    raise ForecastSolarRatelimit(data["message"])
forecast_solar.exceptions.ForecastSolarRatelimit: Rate limit for API calls reached. (error 429)

Additional information

No response

home-assistant[bot] commented 8 months ago

Hey there @klaasnicolaas, @frenck, mind taking a look at this issue as it has been labeled with an integration (forecast_solar) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of `forecast_solar` can trigger bot actions by commenting:

  • `@home-assistant close` Closes the issue.
  • `@home-assistant rename Awesome new title` Renames the issue.
  • `@home-assistant reopen` Reopen the issue.
  • `@home-assistant unassign forecast_solar` Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


forecast_solar documentation forecast_solar source (message by IssueLinks)

klaasnicolaas commented 8 months ago

So what is the issue? 🤷🏻

hmmbob commented 8 months ago

As discussed over Discord, it would be great if this error were caught and printed just a single error line in the log.

havlejan commented 8 months ago

I have the same issue after updating Home Assistant to version 2024.1.5. I updated on Saturday evening, 20.1. Since then the Forecast.Solar integration has had an error.

jgibson02 commented 8 months ago

I have also been running into this error over the past week. I've twice tried disabling the integration for at least a day and re-enabling it, and it still encounters this error. I also tried signing up for a trial of the Personal tier license key and adding that to my integration, but that didn't work.

klaasnicolaas commented 8 months ago

I did some research to get more clarity about where things go wrong and especially how the problem can persist for days.

By default, if you make more than 12 requests to the API in an hour, you will get a rate limit error and this can be caused, for example, by many restarts of Home Assistant. But what I noticed is that with a rate limit exception, an error appears in the logs every minute, so it may try to execute a request to the API every minute.

As a result, the reset time is always pushed forward (rolling reset time), which puts you in a limbo and the problem does not solve itself after waiting 1 hour and can last for days.
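
The rolling reset described here can be illustrated with a small simulation. This is only a sketch under assumptions, not Forecast.Solar's actual server logic: it assumes a block lasts one hour and that every request made while blocked restarts that hour. The function name and parameters are made up for illustration.

```python
def first_success(retry_every_s: int, block_s: int = 3600,
                  horizon_s: int = 7 * 24 * 3600):
    """Return the time (in seconds) of the first accepted request, or None.

    Assumed model (not the real server logic): the client is blocked at t=0,
    and every request made while blocked moves the end of the block to
    `block_s` seconds after that request.
    """
    blocked_until = block_s
    t = retry_every_s
    while t <= horizon_s:
        if t >= blocked_until:
            return t  # request accepted
        blocked_until = t + block_s  # rejected request extends the block
        t += retry_every_s
    return None


print(first_success(90))    # None: retrying every 90 s never gets through
print(first_success(3600))  # 3600: an hourly retry is accepted right away
```

Under this model, any retry interval shorter than the block length keeps the client locked out indefinitely, which matches the "can last for days" observation.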

asciidisco commented 8 months ago

I'm running into the problem as well, with an anonymous account (so no API key). From the code it's quite clear that it should only update hourly when no API key is set. I believe the problem for me (and maybe for others as well) is that if the API call fails, Home Assistant tries to re-initialize the integration after 1 minute (or 90 seconds, which I believe is more correct), receives the same error, tries again after another minute, and again, and again... pushing the next possible working call further into the future.

It would probably be best, coming full circle on the issue, not to let the integration go into an erroneous state on this error, so that the integration itself can handle the interval without being re-initialized over and over.

chinezbrun commented 7 months ago

me too

klaasnicolaas commented 7 months ago

Just stating that "you also have the issue" does not help solve the problem and only pollutes the thread, so please don't do that. If you would like to stay informed, you will find a subscribe button on the right of the sidebar and you will receive notifications 😉

./Klaas

EinSchwerd commented 7 months ago

I am encountering the same issue. I tried disabling the integration for 12 hours (overnight) to ensure it did not exceed the API call limit. I re-enabled it, and the same issue occurred immediately on the first API call as the integration was starting.

iancg commented 7 months ago

> I am encountering the same issue. I tried disabling the integration for 12 hours (overnight) to ensure it did not exceed the API call limit. I re-enabled it, and the same issue occurred immediately on the first API call as the integration was starting.

I've also had the same. I wonder if failed accesses are counted against you as a tally (e.g. I had this happening for 24+ hours before I noticed, so I would have accumulated 3 * 24 * 40 = 2880 rejected requests, which at 12 calls per IP per hour is going to take 10 days to clear) ;-( Equally, it could just be a bug in the rate limiting at Forecast.Solar.
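
The arithmetic in this estimate can be checked quickly. The sketch assumes the "3" stands for three configured instances and the "40" for one retry every 90 seconds (3600 / 90); the comment does not spell either out, and whether the API really tallies rejected calls this way is speculation.

```python
retries_per_hour = 3600 // 90          # one retry every 90 s -> 40 per hour
instances = 3                          # assumption: three configured instances
rejected = instances * 24 * retries_per_hour
hours_to_clear = rejected / 12         # at 12 allowed calls per IP per hour
print(rejected, hours_to_clear / 24)   # 2880 10.0
```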

K-Ko commented 7 months ago

Knut here, possibly it would be a way to check responses with HTTP code 429 for retry at

image

or the headers

image

This should be stored somewhere and checked before next call. (This does not even have to be deleted, as every call in the far future will always be after this timestamp)

The zone holds with IP ... or API key ... the reason/scope, e.g. for logging.

iancg commented 7 months ago

https://github.com/home-assistant-libs/forecast_solar/blob/master/forecast_solar/exceptions.py shows that the ForecastSolarRatelimit exception being thrown includes reset_at

Looking at https://github.com/home-assistant/core/blob/dev/homeassistant/components/forecast_solar/coordinator.py around line 67, it needs to catch ForecastSolarRatelimit and adjust the time at which the next retry can be done.

Further looking at https://github.com/home-assistant/core/blob/dev/homeassistant/helpers/update_coordinator.py I can see that there is update_interval which controls the frequency of the polls, but next_refresh isn't available so I can't see how to make the update coordinator delay until the desired time - setting the update interval only affects the refresh after the next.

iancg commented 7 months ago

Maybe something like: https://github.com/home-assistant/core/commit/4364b174d83912f699c4ff94ce71250af3b0bd49 (totally untested) might work?

I've tried but failed to get my local ha to load the revised code as a custom component. Looks like I may need to set up a proper ha dev env to try this (I've only ever made very minor changes to HACS installed custom components before).

Dutchy-79 commented 7 months ago

Same here,

Logger: homeassistant.components.forecast_solar
Source: helpers/update_coordinator.py:313
Integration: Forecast.Solar (documentation, issues)
First occurred: February 10, 2024 at 10:55:29 (4544 occurrences)
Last logged: 12:12:20

Unexpected error fetching forecast_solar data: Rate limit for API calls reached. (error 429)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 313, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/forecast_solar/coordinator.py", line 67, in _async_update_data
    return await self.forecast.estimate()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/forecast_solar/__init__.py", line 156, in estimate
    data = await self._request(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/forecast_solar/__init__.py", line 125, in _request
    raise ForecastSolarRatelimit(data["message"])
forecast_solar.exceptions.ForecastSolarRatelimit: Rate limit for API calls reached. (error 429)

Not sure what you need from me to help you solve this.

ol3k commented 7 months ago

> I am encountering the same issue. I tried disabling the integration for 12 hours (overnight) to ensure it did not exceed the API call limit. I re-enabled it, and the same issue occurred immediately on the first API call as the integration was starting.
>
> I've also had the same. I wonder if failed accesses are counted against you as a tally (e.g. I had this happening for 24+ hours before I noticed, so I would have accumulated 3 * 24 * 40 = 2880 rejected requests, which at 12 calls per IP per hour is going to take 10 days to clear) ;-( Equally, it could just be a bug in the rate limiting at Forecast.Solar.

I hit the limiting too. Because of the internal integration reloads, it never recovered. I cannot confirm your calculation because my test calls showed that a rate-limiting would end in about 1–2 hours in the future.

I got it back to work:

  1. I disabled the integration to stop the API calls.
  2. Test calls: with an example API call, you can check when you are allowed to call again:

     curl -v -H 'Accept: text/csv' 'https://api.forecast.solar/estimate/watthours/day/52/12/37/0/5.67'

     As already mentioned above, the response tells you when you are allowed to try again, what the limits are, and how many calls are currently registered:

     - x-ratelimit-period
     - x-ratelimit-limit
     - x-retry-at
  3. Wait until the curl call returns data again, for example:

     2024-02-15;6707
     2024-02-16;12691

  4. Enable the integration: I enabled the entries one by one because I have several planes/strings. Re-enabling everything at once could possibly trigger the limit again.
  5. The integration is working again.
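
The manual check in these steps could also be done in code. The following is a minimal sketch that uses the header names as listed in this comment; the timestamp format of `x-retry-at` is an assumption (ISO 8601 with a UTC offset), so compare it against real `curl -v` output first. The function name and the example values are made up.

```python
from datetime import datetime, timezone


def seconds_until_retry(headers: dict, now: datetime) -> float:
    """Return how long to wait before calling the API again (0 if not limited).

    Assumption: `x-retry-at` holds an ISO 8601 timestamp with a UTC offset.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    retry_at = lowered.get("x-retry-at")
    if retry_at is None:
        return 0.0  # no retry timestamp in the response
    wait = (datetime.fromisoformat(retry_at) - now).total_seconds()
    return max(wait, 0.0)


# Made-up header values, for illustration only.
example = {"X-Ratelimit-Limit": "12", "X-Retry-At": "2024-02-15T13:00:00+00:00"}
now = datetime(2024, 2, 15, 12, 30, tzinfo=timezone.utc)
print(seconds_until_retry(example, now))  # 1800.0
```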

Anyway: The RatelimitException (HTTP Codes 429) should not be treated as an integration fault. Maybe just log an unsuccessful call and a hint for old data, but without reloading or restarting a new attempt.

For now, this behavior leads to never ending reloads and API limiting.

klaasnicolaas commented 7 months ago

An idea to solve this is to adjust the update_interval to a time delta when receiving this exception, which after the reset restores the update_interval to the old situation upon a successful API call.

We have already done some testing with this, but ran into some problems with the coordinator and have not yet been able to figure it out / solve it.
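
As a rough illustration of this idea, the delay calculation on its own might look like the sketch below. It does not touch Home Assistant's coordinator, so it does not address the scheduling problem mentioned earlier in the thread (changing update_interval only affects the refresh after the next one). `RateLimited` is a hypothetical stand-in for ForecastSolarRatelimit and its reset_at attribute; `next_interval` and the 30-second margin are also made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_INTERVAL = timedelta(hours=1)  # default polling interval per this thread


class RateLimited(Exception):
    """Hypothetical stand-in for ForecastSolarRatelimit, which carries reset_at."""

    def __init__(self, reset_at: datetime) -> None:
        super().__init__("Rate limit for API calls reached.")
        self.reset_at = reset_at


def next_interval(error: Optional[Exception], now: datetime) -> timedelta:
    """Pick the delay before the next poll.

    After a rate-limit error, wait until reset_at plus a small margin.
    After a success (error is None) or any other error, use the default.
    """
    if isinstance(error, RateLimited):
        return max(error.reset_at - now, timedelta(0)) + timedelta(seconds=30)
    return DEFAULT_INTERVAL


now = datetime(2024, 2, 15, 12, 0, tzinfo=timezone.utc)
limited = RateLimited(reset_at=now + timedelta(minutes=45))
print(next_interval(limited, now))  # 0:45:30
print(next_interval(None, now))     # 1:00:00
```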

menloperk commented 7 months ago

Not sure if it's same error but this integration is not working anymore for some time now! This is in the logs: image

Update: disabling the integration for at least an hour and then reenabling makes the integration work again. So indeed it seems it just can't recover when API limit is hit.

AIR-Force007 commented 7 months ago

I have run into the same scenario as above; in my case the error was quite quick to resolve.

  1. Disable the solar integration.
  2. Run the command: curl -v -H 'Accept: text/csv' 'https://api.forecast.solar/estimate/watthours/day/52/12/37/0/5.67'
  3. I noticed that the limit is tied to the public IP address on the WAN side.
  4. So I just reset the internet connection to get a new public address, enabled the integration, and all was well.
  5. With a static IP address it may be much more difficult to get a new IP; you can use a VPN to change the device's outgoing IP, or just block the device from the internet for an hour or so, but that is a bit annoying to do. I agree the integration should have a waiting period, or check past requests and not request more than required; that would help minimise the requests, as mentioned above.

thitcher commented 7 months ago

> Not sure if it's same error but this integration is not working anymore for some time now! This is in the logs: image
>
> Update: disabling the integration for at least an hour and then reenabling makes the integration work again. So indeed it seems it just can't recover when API limit is hit.

Same here, blocked HA for one hour from internet, reconnected again...voila!

bertybassett commented 7 months ago

brand new install was working for 25 minutes then I made a single change to the configuration and now I get 429 errors.

Point to note I made no reboots throughout so why should I hit the API limit?

will block from internet for an hour but that doesn't seem right.

thitcher commented 7 months ago

After I reconnected to the internet, I waited another 1-2 hours (for security reasons) and then changed a few values, that worked

jwdeboer commented 6 months ago

This is the 'formal' response from forecast.solar

In such cases you should

  • Deactivate/uninstall the integration
  • Wait at least 60 minutes
  • Check your parameters and preferably via direct API call (e.g. via curl)
  • If the direct API call works, reactivate/reinstall the integration

https://doc.forecast.solar/facing429

ciaocibai commented 6 months ago

> brand new install was working for 25 minutes then I made a single change to the configuration and now I get 429 errors.
>
> Point to note I made no reboots throughout so why should I hit the API limit?
>
> will block from internet for an hour but that doesn't seem right.

I just did a brand new install and was given the same error from the get go, no idea why that would be. I've reinstalled the plugin, waited one hour and still the same issue.

jwdeboer commented 6 months ago

> I just did a brand new install and was given the same error from the get go, no idea why that would be. I've reinstalled the plugin, waited one hour and still the same issue.

Try (from desktop or phone etc) this api: https://api.forecast.solar/estimate/watthours/day/52/12/37/0/5.67

It should give you a 429 response, and at the bottom you will see a timestamp that states when you should try again.

Ensure you do not make any api call until that moment.

After that moment, try again via the browser, when you get a response message including forecast numbers, you should be good to go again and can enable the integration again.

Every time you make an API call during your block window, the block will be extended.

bj00rn commented 6 months ago

Same issue here, maybe a fix can be implemented.

I think the problem is related to coordinator.async_config_entry_first_refresh() being called during integration setup. When the first refresh fails, async_config_entry_first_refresh() raises ConfigEntryNotReady, and HA then reschedules the setup, which makes another request and in turn extends the rate limit.

    async def async_config_entry_first_refresh(self) -> None:
        """Refresh data for the first time when a config entry is setup.

        Will automatically raise ConfigEntryNotReady if the refresh
        fails. Additionally logging is handled by config entry setup
        to ensure that multiple retries do not cause log spam.
        """
        await self._async_refresh(
            log_failures=False, raise_on_auth_failed=True, raise_on_entry_error=True
        )
        if self.last_update_success:
            return
        ex = ConfigEntryNotReady()
        ex.__cause__ = self.last_exception
        raise ex

During setup the integration should not retry setup (or delay retry, if possible) if status code 429 is received?

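Something along the lines of

    from forecast_solar.exceptions import ForecastSolarRatelimit
    from homeassistant.exceptions import ConfigEntryNotReady

    # Inside async_setup_entry:
    try:
        await coordinator.async_config_entry_first_refresh()
    except ConfigEntryNotReady as err:
        # The coordinator stores the original error as __cause__ (see above).
        if not isinstance(err.__cause__, ForecastSolarRatelimit):
            raise  # re-raise any other error so HA retries setup as usual
        # Rate limited: suppress the error so setup completes without a retry loop.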

@klaasnicolaas I did a quick test branch on my fork: https://github.com/home-assistant/core/compare/dev...bj00rn:core:fix-rate-limit-error-in-setup WorksOnMyMachine(TM). The downside is that model cannot safely be derived from datacoordinator anymore if we are rate limited during setup. I can do a PR if this seems like a viable solution.

Cheers

klaasnicolaas commented 6 months ago

This won't fix it, especially for users who have previously set up the integration but simply made too many requests. The problem lies in the updateCoordinator and the function that retrieves the data.

./Klaas

bj00rn commented 6 months ago

> This won't fix it, especially for users who have previously set up the integration but simply made too many requests. The problem lies in the updateCoordinator and the function that retrieves the data.
>
> ./Klaas

@klaasnicolaas Are you sure? When I say setup I mean when async_setup_entry is called, not configuration in the config flow.

I think this will actually fix the problem (or at least a critical part of it). For me the problem arises on frequent reboots of HA. I already have three instances of the integration configured, one for each PV string which amplifies the problem.

Maybe my understanding of how integration setup works is incorrect, but here it goes:

  1. When HA is booted (or the integration is reloaded manually), async_setup_entry is called, in which you call coordinator.async_config_entry_first_refresh.
  2. If an exception is raised in async_config_entry_first_refresh, the exception ConfigEntryNotReady is raised by the coordinator base class.
  3. HA will then automatically schedule a retry of async_setup_entry (the delay seems to be 80-ish seconds here, with some kind of random seed).
  4. When the retry happens, the rate limit will be extended, another ConfigEntryNotReady will be raised by the next async_setup_entry, and we are stuck in a loop where the rate limit is extended every 80 seconds.

image

But maybe I am missing something here? Is there more to this issue?

Suppressing the RateLimitError (all other errors will still be raised, authentication etc) in async_setup_entry will let the integration setup correctly on reload and not be stuck in an endless retry cycle. Even for public accounts, I think the default update_interval of 1h should eventually allow the rate limit to clear as 12 calls are allowed per hour, but only IF the integration has been allowed to setup correctly.

https://developers.home-assistant.io/docs/config_entries_index/#setting-up-an-entry

> During startup, Home Assistant first calls the normal component setup, and then calls the method async_setup_entry(hass, entry) for each entry. If a new Config Entry is created at runtime, Home Assistant will also call async_setup_entry(hass, entry) (example)

https://developers.home-assistant.io/docs/integration_setup_failures/#integrations-using-async_setup_entry

> Raise the ConfigEntryNotReady exception from async_setup_entry in the integration's __init__.py, and Home Assistant will automatically take care of retrying set up later. To avoid doubt, raising ConfigEntryNotReady in a platform's async_setup_entry is ineffective because it is too late to be caught by the config entry setup.

K-Ko commented 6 months ago

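As I said here, I am thinking (independent of any concrete implementation, because I am not familiar with HA) about an abstract logic that covers integration installation, boot-up, normal run mode, etc.

  • Integration installation comes e.g. with a default .retry-at flag file with 1970-01-01 00:00:00 in it
  • On API fetch, the actual system timestamp is checked against the "retry at" time in the flag file

    • if "now" is after "retry at", fetch
    • if not, just skip
  • If then at some point in time a 429 response comes up, the header "retry at" or response body "retry at" will be written to the flag file and is thus simply observed at the next "run".

Then the integration can work as now: if it runs fine for days/weeks, the (last) "retry at" is (far) before "now" and it runs smoothly :-)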
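
A minimal sketch of this "retry at" check, assuming the timestamp is persisted in a small flag file. The class and method names are hypothetical, the ISO 8601 storage format is an assumption, and nothing here is tied to Home Assistant's storage helpers.

```python
from datetime import datetime, timezone
from pathlib import Path
import tempfile

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class RetryGate:
    """Persist the last "retry at" timestamp and consult it before each fetch."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def retry_at(self) -> datetime:
        # A missing file behaves like the default 1970-01-01 flag.
        if not self.path.exists():
            return EPOCH
        return datetime.fromisoformat(self.path.read_text().strip())

    def may_fetch(self, now: datetime) -> bool:
        return now >= self.retry_at()

    def record_429(self, retry_at: datetime) -> None:
        # Called when a 429 response reports when to try again.
        self.path.write_text(retry_at.isoformat())


with tempfile.TemporaryDirectory() as tmp:
    gate = RetryGate(Path(tmp) / ".retry-at")
    now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
    print(gate.may_fetch(now))                   # True: nothing recorded yet
    gate.record_429(datetime(2024, 3, 1, 13, 0, tzinfo=timezone.utc))
    print(gate.may_fetch(now))                   # False: still before "retry at"
    print(gate.may_fetch(now.replace(hour=14)))  # True: the block has passed
```

Because the stored timestamp never needs to be deleted, a restart or reload simply reads the same file and skips the call until the block has passed.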

bj00rn commented 6 months ago

> As I said here, I am thinking (independent of any concrete implementation, because I am not familiar with HA) about an abstract logic that covers integration installation, boot-up, normal run mode, etc.
>
>   • Integration installation comes e.g. with a default .retry-at flag file with 1970-01-01 00:00:00 in it
>   • On API fetch, the actual system timestamp is checked against the "retry at" time in the flag file
>
>     • if "now" is after "retry at", fetch
>     • if not, just skip
>   • If then at some point in time a 429 response comes up, the header "retry at" or response body "retry at" will be written to the flag file and is thus simply observed at the next "run".
>
> Then the integration can work as now: if it runs fine for days/weeks, the (last) "retry at" is (far) before "now" and it runs smoothly :-)

The problem as I see it is purely related to the Home Assistant integration setup logic. The integration never gets a chance to set up if it is rate limited from startup (startup being when the integration is loaded, whether due to a reboot or the integration being added/reloaded), as there is no documented way of postponing or scheduling the retry of integration setup. Retries will be fired every 80 seconds indefinitely. On my server the integration has made ~100k requests over the last month due to this problem.

Once the integration has set up correctly, it should be possible to implement logic to observe the response and delay any further requests if the rate limit is reached. Any such logic probably won't be necessary though, since even the public API still supports 12 requests/hour per IP. The default delay in the integration is 1h, so there should be no major problems with the rate limit being hit.

The only exceptions I can think of would probably be when you are using a public account and

K-Ko commented 6 months ago

There is another finding on my side.

Independent of a 429 during setup, which is not recognized, return code 400 also leads to an endless loop!

Here is one example of calls with an invalid location: it results in a 400 but bombs the API with requests every 80 sec. :-( ...

image

image

Is it possible that not only a 429 is not recognized, but that the response code is not checked for 200 OK at all?

At the moment all calls are fully answered to give the requester the chance to analyse the response, but in future it could be that such "false" requests are intercepted more generically.

bj00rn commented 6 months ago

> There is another finding on my side.
>
> Independent of a 429 during setup, which is not recognized, return code 400 also leads to an endless loop!
>
> Here is one example of calls with an invalid location: it results in a 400 but bombs the API with requests every 80 sec. :-( ...
>
> image
>
> image
>
> Is it possible that not only a 429 is not recognized, but that the response code is not checked for 200 OK at all?
>
> At the moment all calls are fully answered to give the requester the chance to analyse the response, but in future it could be that such "false" requests are intercepted more generically.

Looks like a related but separate issue. I think that during the config flow an API request should ideally be made to confirm the options (location, API key, etc.) provided before submitting the form. Any rate limit exception raised during the config flow should probably prevent submitting, just to be on the safe side.

K-Ko commented 6 months ago

Depending on how sophisticated you want the checks to be:

bj00rn commented 6 months ago

> Depending on how sophisticated you want the checks to be:

So to re-cap I see three separate but related issues here with the integration:

  1. A rate limiting error during component setup (when the integration is loaded) will cause an endless request->rate limit loop.
  2. Options should be validated against the API during the config flow to avoid creating broken instances of the integration that will never set up correctly and cause an endless request loop.
  3. The general problem of requests being rate limited on data refresh after the integration has set up correctly. This issue can probably be handled by the integration by postponing the next refresh. Having multiple instances of the integration might make this one a bit tricky though, since requests are rate limited by IP.

K-Ko commented 6 months ago

>   1. A rate limiting error during component setup (when the integration is loaded), will cause an endless request->rate limit loop.

Not only rate limits: any response code not equal to 200 should trigger a kind of alert with the response error message, response.message.text (as here).

bj00rn commented 6 months ago

> trigger a kind of alert with the response error message. response.message.text (as here)

Yes, you are correct here, but under nominal circumstances (the integration has been configured correctly) the retry cycle is desired behaviour. Examples would be: servers are down, DNS resolution fails, etc. The integration should then try to reload. For rate limiting errors this makes no sense though, since a rate limit will never resolve by making another request.

Edit: I'm beginning to suspect it's probably better to do proper validation in config_flow/options flow and not call coordinator.async_config_entry_first_refresh at all during async_setup_entry.

That way the integration always gets created and requests will only occur at the set refresh interval of the integration. Any errors that arise can be handled by the integration from there on.
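
To make the validation idea concrete, here is a hedged sketch of how a single probe request could be classified before an entry is created. It is not the validation added in the library PR mentioned below; `Outcome`, `validate_options`, and the `probe` callable are all hypothetical names, and the status-code handling only reflects what was reported in this thread (200, 400, 429).

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    OK = "ok"
    INVALID_OPTIONS = "invalid_options"
    RATE_LIMITED = "rate_limited"
    TRY_LATER = "try_later"


def validate_options(probe: Callable[[], int]) -> Outcome:
    """Make one probe request and classify its HTTP status code.

    `probe` is a hypothetical callable that performs a single API request
    with the user's options and returns the status code.
    """
    status = probe()
    if status == 200:
        return Outcome.OK
    if status == 429:
        return Outcome.RATE_LIMITED     # do not create the entry or retry now
    if 400 <= status < 500:
        return Outcome.INVALID_OPTIONS  # e.g. an invalid location
    return Outcome.TRY_LATER            # server-side trouble; a later retry can help


print(validate_options(lambda: 400))  # Outcome.INVALID_OPTIONS
```

Passing the request in as a callable keeps the classification testable without touching the network.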

I made a PR to the lib to support validation.

issue-triage-workflows[bot] commented 3 months ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.