johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License

Use regular expression for new_div's class name #154

Closed eeishaan closed 3 years ago

eeishaan commented 4 years ago

The class name of the lyrics div (new_div) that I get often differs from the one hard-coded in the file. Searching for the lyrics div with a regular expression instead of a fixed string solves the problem.
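To illustrate the mismatch, here is a minimal sketch using only the standard library. The class values and their suffixes are made up for illustration; only the `Lyrics__Root` prefix comes from this thread.

```python
import re

# Hypothetical class attribute values as they might differ between responses.
# The suffixes are invented for this sketch, not taken from Genius.
observed_classes = [
    "Lyrics__Root-abc123",
    "Lyrics__Root-xyz789",
]

# A fixed string only matches the one variant it was copied from.
HARD_CODED = "Lyrics__Root-abc123"

# A regular expression on the stable prefix matches every variant.
PATTERN = re.compile(r"Lyrics__Root")

exact_hits = [c for c in observed_classes if c == HARD_CODED]
regex_hits = [c for c in observed_classes if PATTERN.search(c)]

print(len(exact_hits), len(regex_hits))
```

The exact comparison finds only one of the two values, while the pattern finds both, which is why a regex is more robust when part of the class name changes.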

I have attached the HTML source that I get for the query song = genius.search_song('Would You Be So Kind', 'dodie'). I uploaded it as .txt because GitHub doesn't allow .html uploads, so please change the extension to .html: source.txt

eeishaan commented 4 years ago

Whoops! I just saw merge request #153, which handles this efficiently. Closing this.

allerter commented 4 years ago

I didn't know you could pass a regex as the class_ parameter. With that in mind, I'd suggest a small modification:

new_div = html.find("div", class_=re.compile("Lyrics__Root"))

This is not only more concise but also more efficient (checked with timeit) than the approach in #153. So if you could please make that modification, I think this is the way to go for getting the div.
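For anyone who wants to try the one-liner in isolation, here is a rough, self-contained sketch. It assumes Beautiful Soup (`bs4`) is installed. The sample HTML and its class suffixes are invented for illustration and are not the actual Genius markup.

```python
import re
from bs4 import BeautifulSoup

# Invented sample page; the class suffixes are hypothetical.
SAMPLE_HTML = """
<html><body>
  <div class="Header__Root-aaa111">Site header</div>
  <div class="Lyrics__Root-bbb222 extra">First line<br/>Second line</div>
</body></html>
"""

html = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A compiled pattern passed as class_ is searched against each class of the
# tag, so any class containing "Lyrics__Root" matches, whatever its suffix.
new_div = html.find("div", class_=re.compile("Lyrics__Root"))

# find() returns None when nothing matches, so guard before reading the text.
lyrics = new_div.get_text("\n") if new_div is not None else None
print(lyrics)
```

Because the pattern is searched rather than matched against the whole class name, it would also match a class that merely contains `Lyrics__Root` elsewhere in its name. Anchoring it, for example with `^Lyrics__Root`, is a stricter option if that turns out to matter.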