johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
898 stars, 159 forks

"Sometimes the lyrics section isn't found" #139

Closed ScientiaEtVeritas closed 4 years ago

ScientiaEtVeritas commented 4 years ago

This seems to be a known bug; at least the comment "Sometimes the lyrics section isn't found" suggests so. It's quite an annoying one, though, since it happens about half the time for me. It may depend on the session that gets created: once it works, it works for every request, and once it fails, it keeps failing.

        div = html.find("div", class_="lyrics")
        if not div:
            return None  # Sometimes the lyrics section isn't found

allerter commented 4 years ago

Read the last paragraph for the solution. I think it happens because Genius sometimes serves the "new song page", and in Genius's new theme the div class for the lyrics is SongPageGrid-sc-1vi6xda-0 DGVcp Lyrics__Root-sc-1ynbvzw-0 jvlKWy instead of lyrics. If the problem is the div class, changing the code you mentioned to the one below makes it work:

        if not div:
            div = html.find("div", class_="SongPageGrid-sc-1vi6xda-0 DGVcp Lyrics__Root-sc-1ynbvzw-0 jvlKWy")
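
As a side note, the new-theme class names look generated (the -sc-1ynbvzw-0 and jvlKWy parts), so the exact string may stop matching if the page is rebuilt. Below is a minimal sketch of a looser match. It assumes the Lyrics__Root prefix stays stable, which nothing in this thread confirms, and looks_like_lyrics_class is a hypothetical helper, not part of LyricsGenius:

```python
def looks_like_lyrics_class(css_class):
    """Return True for the old "lyrics" class or a new-theme "Lyrics__Root..." class.

    Accepts a single class name or a whole space-separated class attribute.
    Assumption: only the "Lyrics__Root" prefix is stable in the new theme.
    """
    if css_class is None:
        # Tags without a class attribute can never be the lyrics div.
        return False
    return any(
        token == "lyrics" or token.startswith("Lyrics__Root")
        for token in css_class.split()
    )
```

BeautifulSoup accepts a function for class_, so this could be tried as html.find("div", class_=looks_like_lyrics_class) in place of the two exact-string lookups.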

But this workaround has an issue of its own: the get_text() method fails to add the newlines, probably because the new theme writes them as <br/> tags in the HTML. So you have to clean the tags yourself. All you have to do is replace lines 169 to 174 with the code below:

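# Requires "import re" at the top of the module.
old_div = html.find("div", class_="lyrics")
new_div = html.find("div", class_="SongPageGrid-sc-1vi6xda-0 DGVcp Lyrics__Root-sc-1ynbvzw-0 jvlKWy")
if old_div:
    lyrics = old_div.get_text()
elif new_div:
    # get_text() loses the line breaks here, so convert <br/> and strip the tags by hand
    lyrics = str(new_div)
    lyrics = lyrics.replace('<br/>', '\n')
    lyrics = re.sub(r'<.*?>', '', lyrics)
else:
    return None  # Neither layout's lyrics section was found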
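
The tag cleanup in the elif branch can also be pulled out into a small standalone helper. This is only a sketch using the standard library. clean_lyrics_markup is a hypothetical name, and handling the <br> and <br /> spellings and HTML entities is an addition that goes beyond the snippet above:

```python
import html
import re


def clean_lyrics_markup(fragment: str) -> str:
    """Turn an HTML fragment into plain text, keeping <br> line breaks."""
    # Treat <br>, <br/> and <br /> (in any letter case) as newlines.
    text = re.sub(r"<br\s*/?>", "\n", fragment, flags=re.IGNORECASE)
    # Drop every remaining tag, keeping only the text between tags.
    text = re.sub(r"<[^>]*>", "", text)
    # Decode entities such as &amp; into the characters they stand for.
    text = html.unescape(text)
    # Trim leading and trailing whitespace left over from the markup.
    return text.strip()


sample = '<div class="Lyrics__Root"><a href="#">Hello</a><br/>world &amp; all<br />bye</div>'
print(clean_lyrics_markup(sample))
# Hello
# world & all
# bye
```

With something like this, the elif branch would reduce to lyrics = clean_lyrics_markup(str(new_div)).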

allerter commented 4 years ago

This issue has been resolved in version 2.0.0.