Closed oknozor closed 3 years ago
@oknozor From the wording it seems that only anonymous user-agents are subject to throttling (?) Have you tried cycling different browser user-agents?
At the time I opened this issue I had not yet written the user agent feature, so I didn't thought about doing it this way. That's a great idea, I will try that when I have some spare time. Or maybe you would like to submit a PR ?
Should we simply include headers in every request by default now that we have the user agent feature? In the docs under #User-Agent, it mentions:
For "python-musicbrainz/0.7.3": we allow through (on average) 50 requests per second, and decline the rest (though recently this has not been hit).
I glanced at the source code to python-musicbrainz2 and it indeed sets the user agent by default to python-musicbrainz/$version
.
So, similarly perhaps we could default to something like musicbrainz_rs/$version
?
That said, but now in python-musicbrainz3 (which aims to be a successor to python-musicbrainz2
and also referred as python-musicbrainz-ngs
), it forces the client user to set the user agent themselves.
Also from the docs here, it mentions:
Our client libraries now support functions for setting the User-Agent string. If you are using one of our libraries, you will need to add a call to set the User-Agent string via the library and your application should work again.
Is this maybe a hint that all future official libraries are now supposed to force users to set their own user agent? In this case, we should probably go this route if being an official rust client is on our roadmap.
I guess it's worth trying to set the user agent in the integration tests and see if it does not hit rate limiting, but I am afraid it will still fails due to IP blocking/throttling rules.
Being an official rust client is definitely on our roadmap but I am not sure setting the user agent should be mandatory. I think we shall discuss this further with musicbrainz people.
For now we can simply try to fix the test suite, I will ask on the IRC if we should use a default musicbrainz_rs/$version
header or force the user to set is own user agent string. I'll get back yo you when I know more.
I will ask on the IRC if we should use a default
musicbrainz_rs/$version
header or force the user to set is own user agent string. I'll get back yo you when I know more.
Alright!
I guess it's worth trying to set the user agent in the integration tests and see if it does not hit rate limiting, but I am afraid it will still fails due to IP blocking/throttling rules.
I just tried this out locally and also on my fork in ritiek/musicbrainz_rs#2 (to see how GitHub actions behaves) by adding a user-agent and removing the 1s sleep from all tests. You are right about tests still failing due to IP throttling.
but implementing a retry on
HTTP 503
responses might also be a good idea.
This could be a way out maybe. However, when MusicBrainz hits the rate-limit, they seem to only return the HTTP 503 error code, but nothing else useful in the response content such as how much time to wait before making the next query that wouldn't be rate-limited. We probably may have to sleep for small random durations before making a retry query if we are to implement this I think, which I'm not sure MusicBrainz people will be happy with.
No response from musicbrainz people so far, I guess we will stick with the default header for now and let users set the user-agent if needed. Anyway I am not comfortable with making the user agent header mandatory.
As for the retry on 503 response, this is actually a good idea. I just checked, musicbrainz server send a Retry-After
header so this would be straight forward to implement.
As for the retry on 503 response, this is actually a good idea. I just checked, musicbrainz server send a
Retry-After
header so this would be straight forward to implement.
Wow, that's cool! I shouldn't have overlooked the response headers.
Rate limiting cause integration tests to fail randomly, for now a quick'n dirty fix has been made with thread sleeps on every test.
Adding a proper user agent header as described in MusicBrainz rate limiting documentation might solve the issue.
This still need to be figured out but implementing a retry on
HTTP 503
responses might also be a good idea.