johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
892 stars, 158 forks

Search raises 403 HTTP error asking for captcha on VPS #190

Closed: thepeshka closed this issue 3 years ago

thepeshka commented 3 years ago

The library should use api.genius.com/search if a token is provided.

Expected behavior: Lyrics returned.

To Reproduce

>>> from lyricsgenius import Genius
>>> genius = Genius(TOKEN)
>>> genius.search_song("Hello", "Adele")
Searching for "Hello" by Adele...
Traceback (most recent call last):
  File "/root/testvenv/lib/python3.6/site-packages/lyricsgenius/api/base.py", line 80, in _make_request
    response.raise_for_status()
  File "/root/testvenv/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://genius.com/api/search/multi?q=Hello+Adele

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/testvenv/lib/python3.6/site-packages/lyricsgenius/genius.py", line 401, in search_song
    search_response = self.search_all(search_term)
  File "/root/testvenv/lib/python3.6/site-packages/lyricsgenius/api/public_methods/search.py", line 210, in search_all
    return self.search(search_term, per_page, page, endpoint)
  File "/root/testvenv/lib/python3.6/site-packages/lyricsgenius/api/public_methods/search.py", line 45, in search
    return self._make_request(path, params_=params, public_api=True)
  File "/root/testvenv/lib/python3.6/site-packages/lyricsgenius/api/base.py", line 88, in _make_request
    raise HTTPError(response.status_code, error)
requests.exceptions.HTTPError: [Errno 403] 403 Client Error: Forbidden for url: https://genius.com/api/search/multi?q=Hello+Adele
# curl https://genius.com/api/search/multi?q=Hello+Adele
...
<div class="line2">Sorry, we have to make sure you're a human before we can show you this page</div>
...

Version info

Additional context: Reproduced on a VPS located in Germany. Locally, everything works fine.
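
The second traceback shows the library re-raising the failure as `HTTPError(response.status_code, error)`, so the status code ends up as the first exception argument. A minimal sketch of detecting that case in calling code, assuming this argument layout holds for the installed version; `is_forbidden` is a hypothetical helper, not part of lyricsgenius:

```python
def is_forbidden(exc):
    """Return True if exc looks like the 403 raised in the traceback above.

    Assumes the exception was built as HTTPError(status_code, message),
    which makes the status code the first element of exc.args.
    """
    return bool(exc.args) and exc.args[0] == 403


# requests.exceptions.HTTPError derives from IOError (OSError), so a plain
# OSError with the same two arguments stands in for it here.
try:
    raise OSError(403, "403 Client Error: Forbidden")
except OSError as err:
    blocked = is_forbidden(err)
```

In real code the `try` body would be the `genius.search_song(...)` call, and `blocked` could decide whether to retry through a different route.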

allerter commented 3 years ago

Genius.search_song is a convenience method and calls the following methods:

genius.search_all
genius.song
genius.lyrics

But as you have experienced, VPS and proxy users are very likely to get a 403 error because of Genius's captcha service. Even if we switched to using api.genius.com/search, you would still get a 403 error in genius.lyrics, which fetches the lyrics from the Genius website rather than from the API. I suggest either using a proxy that Genius doesn't block or limiting your calls to the developers' API methods. For example, if you wanted to search for a song and get its info, you could try this, which only uses developers' API methods:

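songs = genius.search_songs("Eminem Rap God")["songs"]
song_id = None
for song in songs:
    if song['title'] == "Rap God":
        song_id = song['id']
        break  # stop at the first exact title match
# Only fetch the song's info if a match was found
song = genius.song(song_id) if song_id is not None else None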

You can view the list of developers' API methods in the docs.
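
As a rough sketch of what staying on the developers' API looks like at the HTTP level: the request goes to api.genius.com with the token as a bearer header, and the song ID is read out of the JSON response. The helper names and the sample payload below are made up for illustration, and the `response -> hits -> result` layout is an assumption based on the public Genius API docs rather than anything shown in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(token, query):
    """Build (but do not send) a GET request for api.genius.com/search."""
    url = "https://api.genius.com/search?" + urlencode({"q": query})
    return Request(url, headers={"Authorization": "Bearer " + token})


def find_song_id(payload, title):
    """Return the ID of the first hit whose title matches, else None.

    Assumes the payload has the shape
    {"response": {"hits": [{"result": {...}}]}}.
    """
    for hit in payload.get("response", {}).get("hits", []):
        result = hit.get("result", {})
        if result.get("title") == title:
            return result.get("id")
    return None


# Illustrative payload only; the ID is a placeholder, not a real Genius ID.
sample = {"response": {"hits": [{"result": {"id": 1, "title": "Rap God"}}]}}
request = build_search_request("TOKEN", "Eminem Rap God")
song_id = find_song_id(sample, "Rap God")
```

Sending the request (for example with `urllib.request.urlopen`) needs a valid token and network access, so it is left out here. None of this helps with lyrics, which are not served by the developers' API.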

thepeshka commented 3 years ago

@allerter I hadn't thought about this. That's sad. Btw, thanks for the explanation!