johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License
899 stars 159 forks

Check for valid URL in api.py #52

Closed NickReiher closed 6 years ago

NickReiher commented 6 years ago

This change fixes the error that arises when a song's URL is no longer valid (Issue #43).

I added a method _page_exists() that checks whether the URL that is found is actually a valid page and not a 404 page. It does this by looking for lyrics on the page; if none can be found, it returns False.

This method is run in both search_song() and search_artist().
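For readers following along, here is a minimal sketch of the kind of check described above. It is not the code from this PR: the name page_exists, the `<div class="lyrics">` container, and the use of the standard-library HTML parser instead of Beautiful Soup are all assumptions made so that the example runs without a network connection or third-party packages.

```python
from html.parser import HTMLParser


class _LyricsFinder(HTMLParser):
    """Records whether a <div class="lyrics"> tag appears (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "lyrics" in classes:
            self.found = True


def page_exists(html):
    """Return True if the HTML contains a lyrics container, else False.

    Hypothetical stand-in for the _page_exists() method described above.
    """
    finder = _LyricsFinder()
    finder.feed(html)
    return finder.found
```

A caller would fetch the page first and skip the song when page_exists(html) returns False. If the same page is then fetched again to scrape the lyrics, it is parsed twice, which is the cost mentioned in the PR description.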

Running it this way does cause a BeautifulSoup object to be created twice for each valid song, and I'm not sure how much that affects the runtime of the program. The check could instead be done in the _scrape_song_lyrics_from_url method, but in my opinion that would make the code harder to understand and follow (we'd need some sort of if statement, and it seemed sloppy / hard to read).

I also added a new error message for this case to show that the song exists but a valid URL can't be found.

Closes #43.

johnwmillr commented 6 years ago

Thanks for this PR, Nick. I agree, it'd be nice to only have to call the URL once, but putting the 404 check inside _scrape_song_lyrics_from_url() may be a little sloppy. I'll try to think of a better solution, otherwise I'll merge the PR as is.

johnwmillr commented 6 years ago

Hi Nick,

Sorry for such a long delay on this.

I tried moving the check for a 404 response into the _scrape_song_lyrics_from_url method. Do you mind checking if this updated code works for you? Instead of an error on the songs you identified, you should just get a verbose message saying the song is skipped.
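As a rough illustration of that approach (not the actual updated code), the sketch below does the 404 check inside the scraping function, so each URL is requested only once and a missing page yields None rather than an exception. The names scrape_song_lyrics, fetch, and fake_fetch, the `<div class="lyrics">` markup, and the regular expression standing in for the Beautiful Soup parsing are all assumptions made for the example.

```python
import re


def scrape_song_lyrics(url, fetch, verbose=True):
    """Return the lyrics text for url, or None if the page is missing.

    fetch is a hypothetical callable returning (status_code, html); in real
    code this would be a single HTTP GET request.
    """
    status, html = fetch(url)
    if status == 404:
        if verbose:
            print(f"Skipping song, URL returned 404: {url}")
        return None

    # Stand-in for the real HTML parsing: grab the assumed lyrics container.
    match = re.search(r'<div class="lyrics">(.*?)</div>', html, re.DOTALL)
    if match is None:
        if verbose:
            print(f"Skipping song, no lyrics found at: {url}")
        return None
    return match.group(1).strip()


def fake_fetch(url):
    """Offline replacement for an HTTP request, used only for this example."""
    pages = {"https://example.com/ok": (200, '<div class="lyrics">Hello</div>')}
    return pages.get(url, (404, ""))


print(scrape_song_lyrics("https://example.com/ok", fake_fetch))
print(scrape_song_lyrics("https://example.com/gone", fake_fetch))
```

With this shape, the calling search code only needs to test for None and move on to the next song.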