johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
904 stars 159 forks

Lyrics appear to contain a bit of garbage data? #237

Open Gazoo101 opened 2 years ago

Gazoo101 commented 2 years ago

Returned lyrics contain some garbage data, presumably due to a formatting change on the https://genius.com webpage.

All lyrics (or at least the 7 I tested) appear to lead with the following:

"<song title> Lyrics", e.g. in the case of FreeBird, it'd be "FreeBird Lyrics"

and end with "<number>Embed" or just "Embed".

I'd say these pieces aren't supposed to be part of the lyrics, yes?
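
Until the scraping itself is fixed, one possible workaround is to post-process the returned string. The following is only a minimal sketch under stated assumptions: the header is exactly the song title followed by " Lyrics" at the very start, and the trailer is an optional run of digits followed by "Embed" at the very end. `clean_lyrics` is a hypothetical helper written for this thread, not part of LyricsGenius.

```python
import re


def clean_lyrics(lyrics: str, title: str) -> str:
    """Strip the leading '<title> Lyrics' header and trailing '<number>Embed'.

    Hypothetical helper: the exact shape of the garbage is an assumption
    based on the examples reported in this issue.
    """
    header = f"{title} Lyrics"
    # Drop the header only if the text really starts with it.
    if lyrics.startswith(header):
        lyrics = lyrics[len(header):]
    # Drop an optional digit run plus 'Embed' at the very end of the text.
    lyrics = re.sub(r"\d*Embed\s*$", "", lyrics)
    return lyrics.strip()


raw = "FreeBird LyricsIf I leave here tomorrow\nWould you still remember me?42Embed"
print(clean_lyrics(raw, "FreeBird"))
```

One caveat with this approach: if the last lyric line itself ends in digits, those digits are indistinguishable from the counter and get removed too, so removing the footer before the text is extracted would be more robust.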

Version info

Acervans commented 2 years ago

I've found the cause: in the lyrics() method in genius.py, the div that is searched for is the one with class_=re.compile("^lyrics$|Lyrics__Root"). However, this also returns the number of "Pyongs" and the Embed button from the Lyrics_Footer div, whose text content is included in Lyrics__Root.

EDIT: I've seen this solved in https://github.com/johnwmillr/LyricsGenius/pull/215#issuecomment-1083670536
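
To illustrate the cause described above, here is a rough, standard-library-only sketch (so it runs without installing anything). It collects the text of the div matched by the same regular expression, once with and once without the nested footer div. The sample markup, the class name `Lyrics__Footer` (double underscore, by analogy with `Lyrics__Root`), and the names `LyricsText` and `extract_lyrics_text` are all assumptions made for the example. This is not the library's code and not necessarily how the linked PR solves it.

```python
import re
from html.parser import HTMLParser

# Invented markup that mimics the reported structure (an assumption).
SAMPLE = (
    '<div class="Lyrics__Root">'
    '<div class="Lyrics__Container">If I leave here tomorrow</div>'
    '<div class="Lyrics__Footer">5<button>Embed</button></div>'
    '</div>'
)


class LyricsText(HTMLParser):
    """Collect text inside the root div, optionally skipping the footer div."""

    ROOT = re.compile(r"^lyrics$|Lyrics__Root")  # pattern quoted in the comment
    FOOTER = re.compile(r"Lyrics__Footer")       # assumed footer class name

    def __init__(self, skip_footer: bool):
        super().__init__()
        self.skip_footer = skip_footer
        self.depth = 0          # div nesting depth; 0 means outside the root
        self.skip_depth = None  # depth of the footer div currently skipped
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        cls = dict(attrs).get("class") or ""
        if self.depth == 0:
            # Start collecting once the root div is found.
            if self.ROOT.search(cls):
                self.depth = 1
            return
        self.depth += 1
        if self.skip_footer and self.skip_depth is None and self.FOOTER.search(cls):
            self.skip_depth = self.depth

    def handle_endtag(self, tag):
        if tag != "div" or self.depth == 0:
            return
        if self.skip_depth == self.depth:
            self.skip_depth = None  # leaving the footer div
        self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and self.skip_depth is None:
            self.parts.append(data)


def extract_lyrics_text(html: str, skip_footer: bool) -> str:
    parser = LyricsText(skip_footer)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(extract_lyrics_text(SAMPLE, skip_footer=False))  # footer text leaks in
print(extract_lyrics_text(SAMPLE, skip_footer=True))   # footer text excluded
```

With the footer included, the counter and the button label end up glued to the last lyric line, which matches the trailing "Embed" reported at the top of this issue.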

roaldandresen commented 1 year ago

Hi. Your link seems to point to nowhere. Do you have the fix for this bug? I am a bit too new to Python to start fiddling with the code myself.

allerter commented 1 year ago

The PR is available at https://github.com/johnwmillr/LyricsGenius/pull/215. Until that PR is merged and the library updated, you could fork the repository and merge this PR into your own fork.

roaldandresen commented 1 year ago

Thank you!