johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
898 stars 159 forks

"search_song" return a fail #148

Closed MatthieuBonbon closed 4 years ago

MatthieuBonbon commented 4 years ago

Describe the bug: When I run these commands, they sometimes return the correct result and sometimes fail (see screenshot). Could someone please help me understand and fix this bug?

Bug API Genius

Version info

Thanks Everyone !

allerter commented 4 years ago

This looks to be the same issue as #139. If you'd like to know what happened, read my comment on that issue. Otherwise, here are two possible solutions.

Solution 1

It seems Genius has changed the div tag that holds the lyrics. As they did last time, they've given the class attribute a new value, so they may well change it again in the future. This solution may therefore be just as temporary as the one in #139. Replace the following line in api.py, from:

new_div = html.find("div", class_="SongPageGrid-sc-1vi6xda-0 DGVcp Lyrics__Root-sc-1ynbvzw-0 jvlKWy")

to:

new_div = html.find("div", class_="SongPageGrid-sc-1vi6xda-0 DGVcp Lyrics__Root-sc-1ynbvzw-0 kkHBOZ")

Solution 2

In this solution, there is no specific class to look for. Instead, the code searches for a div whose class value contains "Lyrics__Root". This solution could be as temporary as Solution 1, since Genius can change the whole value of the class attribute. But since Genius has so far only changed the last part, this might hold up longer. Replace the following lines in api.py, from:

old_div = html.find("div", class_="lyrics")
new_div = html.find("div", class_="SongPageGrid-sc-1vi6xda-0 DGVcp Lyrics__Root-sc-1ynbvzw-0 jvlKWy")
if old_div:
    lyrics = old_div.get_text()
elif new_div:
    # Clean the lyrics since get_text() fails to convert "<br/>"
    lyrics = str(new_div)
    lyrics = lyrics.replace('<br/>', '\n')
    lyrics = re.sub(r'(\<.*?\>)', '', lyrics)
else:
    return None # In case the lyrics section isn't found

to:

old_div = html.find("div", class_="lyrics")
if old_div:
    lyrics = old_div.get_text()
else:
    div = [tag for tag in html.find_all('div')
           for attribute, value in list(tag.attrs.items())
           if attribute == 'class' and 'Lyrics__Root' in str(value)]

    if div:
        # Clean the lyrics since get_text() fails to convert "<br/>"
        lyrics = str(div[0])
        lyrics = lyrics.replace('<br/>', '\n')
        lyrics = re.sub(r'(\<.*?\>)', '', lyrics)
    else:
        return None  # In case the lyrics section isn't found
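
As a rough illustration of what Solution 2 does, the sketch below reproduces its two steps on a made-up HTML snippet, using only the standard library: checking whether a class value contains "Lyrics__Root", then converting `<br/>` tags to newlines before stripping the remaining tags. The function names has_lyrics_root and clean_lyrics_html and the sample markup are assumptions for illustration; they are not part of lyricsgenius.

```python
import re


def has_lyrics_root(class_value):
    """Return True if a tag's class value mentions "Lyrics__Root".

    BeautifulSoup exposes the class attribute as a list of strings, so the
    value is converted with str() before the substring check, as in Solution 2.
    """
    return "Lyrics__Root" in str(class_value)


def clean_lyrics_html(div_html):
    """Turn the raw HTML of a lyrics div into plain text."""
    text = div_html.replace("<br/>", "\n")  # keep line breaks as newlines
    return re.sub(r"<.*?>", "", text)       # drop every remaining tag


# Hypothetical sample; real Genius markup is more complex.
sample_classes = ["SongPageGrid-sc-1vi6xda-0", "DGVcp",
                  "Lyrics__Root-sc-1ynbvzw-0", "kkHBOZ"]
sample_div = '<div class="Lyrics__Root-sc-1ynbvzw-0 kkHBOZ">Line one<br/>Line two</div>'

if has_lyrics_root(sample_classes):
    print(clean_lyrics_html(sample_div))
```

Because the check only looks for the "Lyrics__Root" substring, it keeps matching when just the trailing part of the class value (jvlKWy, kkHBOZ) changes.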

Anosema commented 4 years ago

Where is the "api.py" file ?

allerter commented 4 years ago

> Where is the "api.py" file ?

Enter the following line in the command line:

pip show lyricsgenius

This will show you the location where the package has been stored (If you're using a venv, activate it first). Look for the lyricsgenius folder there. api.py will be inside that folder.
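
If you'd rather find the folder from Python, something like the following should work. It's only a sketch: package_dir is a hypothetical helper, and it relies on importlib.util.find_spec from the standard library, so it has to run in the same environment (or venv) where lyricsgenius is installed.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the folder a package was imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # package is not installed in this environment
    # spec.origin points at the package's __init__.py; its parent is the folder
    return Path(spec.origin).parent


# Prints the lyricsgenius folder if the package is installed, otherwise None.
print(package_dir("lyricsgenius"))
```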

Keep in mind that this is a manual fix, and there will be an update fixing this soon. Until then, you're welcome to wait or edit api.py as I have explained. A workaround without manually editing the library would be looping over search_song() till the lyrics are returned.
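
The looping workaround could look roughly like this. It's a sketch under a few assumptions: search_until_lyrics is a hypothetical helper, it assumes search_song() returns None when the song is rejected, and the attempt limit and delay are arbitrary values chosen so the loop can't run forever.

```python
import time


def search_until_lyrics(search, title, artist, max_attempts=5, delay=1.0):
    """Call search(title, artist) until it returns something other than None.

    `search` is any callable with that shape, e.g. genius.search_song.
    Returns None if every attempt fails.
    """
    for attempt in range(max_attempts):
        song = search(title, artist)
        if song is not None:
            return song
        if attempt < max_attempts - 1:
            time.sleep(delay)  # brief pause before trying again
    return None


# Usage (needs lyricsgenius and a Genius API token, so it is left commented out):
# import lyricsgenius
# genius = lyricsgenius.Genius("YOUR_ACCESS_TOKEN")
# song = search_until_lyrics(genius.search_song, "Song title", "Artist name")
```

Passing the search function in as an argument keeps the helper independent of the library, so it can be tried out with a stand-in function first.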

tahsinac commented 4 years ago

Neither of the solutions works for me, unfortunately. I used to face this issue every now and then but since last week, it's been failing 10/10 times. Not sure how to go about this. I'll be waiting for the update. Is there an ETA for the update?

allerter commented 4 years ago

> Neither of the solutions works for me, unfortunately. I used to face this issue every now and then but since last week, it's been failing 10/10 times. Not sure how to go about this. I'll be waiting for the update. Is there an ETA for the update?

I just tried both solutions, and both worked for me. If you don't mind, please add a print(html) to _scrape_song_lyrics_from_url() right after the html variable is assigned, then post the code of your _scrape_song_lyrics_from_url() function and the output of that print statement here or on a code-sharing site so I can check what happens with your client. Just make sure the output is from a failed attempt (when search_song() reports "Specified song does not have a valid URL with lyrics. Rejecting.").

As for the update, I'm not sure and it really depends on @johnwmillr. Besides, if the solutions I provided above don't actually work for everyone else, someone will have to come up with a solution first.

NIkitabala commented 4 years ago

I tried both solutions and neither worked for me either. Do you have any other ideas?

MatthieuBonbon commented 4 years ago

I'm coming back to say that I just tested the first solution and it works for me. I have no other problems related to it. Hopefully it will last as long as possible. Thanks all.