jacquesh / foo_openlyrics

An open-source lyric display panel for foobar2000
MIT License
403 stars 24 forks source link

Gracefully back off when throttled by a remote source #254

Open H2Swine opened 1 year ago

H2Swine commented 1 year ago

Console output. Usually AZLyrics.com , but once also Genius.com on the first file in a bulk search. Maybe this can be closed quickly, not even sure if this is a "bug" report - but "unrecognised type" error doesn't look like "expected" behaviour, so ... reporting it for what it is worth.

Furthermore, I get quite a few Musixmatch warnings like the following: WARN-OpenLyrics: Received musixmatch search result but body was malformed: {"message":{"header":{"status_code":401,"execute_time":0.0077629089355469,"hint":"captcha"},"body":[]}} No idea whether that is a known issue and triggers a workaround or anything

jacquesh commented 1 year ago

I just did a manual search that successfully returned results for AZLyrics and Musixmatch so they're at least not completely broken. It does happen every now and then that sites will change their format or what headers they require or something and sources break as a result but that appears to not be the case here.

Looks to me like that musixmatch error is just a rate-limiting effort on their part, so not really anything to be done about that other than slowing down/making fewer requests. It's hard to tell if the same is true for your azlyrics/etc failures (at least not without the actual errors in question) but since it works for me at the moment I would guess its some ephemeral issue (either a rate-limiting-type deal or just a network issue on either your or their end).

Occasional errors are, for the above reasons, normal and expected. Consistent errors are probably a bug...unless they're consistent because you keep sending too many requests I guess...but I'm going to assume you're not responding to a request to slow down by continuing to make more requests :)

I'm going to close this pending more information (with a different error). Feel free to reply with more info/errors if you still think its actually a bug (also if you do that, please remember to fill in the plugin version information as requested by the bug issue template).

H2Swine commented 1 year ago

unless they're consistent because you keep sending too many requests I guess...

This! And I did a web search, found https://yabb.jriver.com/interact/index.php?topic=119857.0 and lo and behold: when I navigated my usual web browser to azlyrics.com - which I should have done before opening the issue I admit - I got the following: Our systems have detected unusual activity from your IP address (computer network). This page checks to see if it's really you sending the requests, and not a robot.

After solving the CAPTCHA, the error disappears from the fb2k console, and instead I get the more expected "WARN-OpenLyrics: Failed to download azlyrics.com page https://www.azlyrics.com/lyrics/forn/dwelleronthethreshold.html: Object not found"

This "kinda" gives a suggestion to the component: upon this error, report to the user that AZlyrics refuses; reasons could be that the site finds the level of activity suspicious - try browse to AZlyrics.com and enter a CAPTCHA. If this doesn't work, there is nothing the component can do.

Maybe mention this generally as a "known issue" and state that you rather prefer not to fix it, as it would slow down the component. Users who want more zealous search, could filter on unsynced lyrics MISSING and lyrics MISSING and try again, possibly after trying to access the sites in question through their broswers

Also I am on version 1.6 on 2.0x64 - and I am quite sure it was 1.5 before I reproduced it all after upgrading.

H2Swine commented 1 year ago

This "kinda" gives a suggestion to the component: upon this error, report to the user that AZlyrics refuses

Hmmm ... no need to open a new issue here although the problem is identified? @jacquesh ?

jacquesh commented 1 year ago

I'm not sure what you mean. If websites are telling us that we're calling them too often then we should probably not add more ways to quickly allow us to call them again. Instead we should maybe try to slow down the requests we're making in the first place. For example bulk search rate-limits the requests that it makes but there isn't anything stopping you from just manually playing many tracks for a few seconds at a time (which, if they're new tracks that you don't yet have locally-stored lyrics for), or doing many manual searches, so I can easily see an argument for adding some minimum time between searches for those as well. Such limiting could include a bit of leeway for bursting so that (for example) you don't get throttled if you just want to make a single change to the manual search request (e.g allow 3 requests and then force a 30-second wait or whatever).

I've never run into this and the fact that it's only coming up now suggests this is not a particularly common occurrence. Do you have auto-save enabled? Is your save-source also the first load-source? Did this happen after you did a large bulk search? Were you doing lots of manual searches or playing lots of new tracks?

Maybe mention this generally as a "known issue" and state that you rather prefer not to fix it, as it would slow down the component. Users who want more zealous search, could filter on unsynced lyrics MISSING and lyrics MISSING and try again, possibly after trying to access the sites in question through their broswers

This shouldn't be necessary. You should have the save source as your first search source and then the search will always pick up whatever you have saved locally before going to the internet anyway. For ages I've been meaning to enforce this but haven't gotten round to it because it complicates the config UI.

H2Swine commented 1 year ago

First, it seems this unrecognized error is not "unknown" anymore. More significant: current behaviour will lead to thousands of calls after the site has responded it will not deliver lyrics - precisely what one should avoid:

If websites are telling us that we're calling them too often then we should probably not add more ways to quickly allow us to call them again.

No, instead the component could better suspend the source temporarily and tell the user to fix it manually, rather than calling it over and over again, once for every track.

Instead of phoning up thousands of times the next day, wait and let user sort it out. It will surely take the user much more time than the next few futile look-ups.

Oh, and yes it happened not only during a large bulk search, but also after quite a bit of testing back and forth. I guess that is what a curious user might do upon installing the component and trying to see how it fares against the old lyric search components. Since I did a bulk search, then as far as I understand it is irrelevant that my autosave setting is/was "Only synced lyrics"? I think the source order was local, Musixmatch, Genius, Darklyrics, AZLyrics, but I am not sure.

jacquesh commented 1 year ago

Yeah fair enough. In cases where we can tell from the result that the website is rate-limiting us (or prompting for a captcha, we should probably either 1) Hold off on automated searches to that source if we get a response saying "no more requests for the next 5 minutes" or whatever, or 2) More likely, and as you have found for azlyrics, if they require some user action: Suspend searches to that source and tell the user that they need to visit the site manually to resolve the captcha (or whatever the case may be).

I'm re-opening this ticket for that work and changing the title accordingly.

This leaves some question as to what that UX might look like. Initial thoughts suggest that maybe we should just remove that source from the list of active sources and tell the user "go fix the thing and then come back and re-enable the source". This would at least have the benefit of 1) being relatively easy to support and 2) re-using workflows that the user is already familiar with (we don't need a separate UI that the user needs to figure out for "re-activating temporarily-suspended sources").

Your comments prompted me to also create #263 and #264