Open adamlporter opened 1 year ago
The procedures clean_up() and get_all_songs_from_album() work. I rewrote Melanie Walsh's download_album_lyrics() procedure to work without accessing LyricsGenius.
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def download_album_lyrics(artist, album_name):
    # get_all_songs_from_album() must already be defined
    clean_songs = get_all_songs_from_album(artist, album_name)
    artist = artist.replace(" ", "-")
    album_name = album_name.replace(" ", "-")
    # use one folder name for both mkdir() and open() so they cannot disagree
    folder = Path(f"{artist}_{album_name}")
    for song in clean_songs:
        song_title = re.sub(r"[^\w\s]", "", song)  # get rid of punctuation
        song_title = song_title.replace(" ", "-")
        try:
            url = f"https://genius.com/{artist}-{song_title}-lyrics"
            response = requests.get(url)
            if response.status_code == 200:
                folder.mkdir(parents=True, exist_ok=True)
                html = response.text
                document = BeautifulSoup(html, "html.parser")
                div = document.find("div", class_=re.compile("^lyrics$|Lyrics__Root"))
                try:
                    lyrics = div.get_text("\n")
                    filen = folder / f"{song_title}.txt"
                    with open(filen, "w", encoding="utf-8") as file:
                        file.write(lyrics)
                    print(f"saving {filen}")
                except AttributeError:
                    # div is None when no matching element was found
                    print(f"No lyrics found for {song_title}")
            else:
                print(f"problem getting lyrics for {artist} - {song_title}")
                print(f"error code was {response.status_code}")
        except FileNotFoundError:
            print(f"{url} is not found")
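To make the URL construction in the function above concrete, here is a small standalone sketch of just that step. The helper name genius_lyrics_url is my own (hypothetical) and the artist and song are only sample input. It shows which URL the function requests, not whether genius.com actually serves a page there.

```python
import re


def genius_lyrics_url(artist, song):
    """Build the lyrics URL the same way download_album_lyrics() does."""
    artist_slug = artist.replace(" ", "-")
    # Drop punctuation (anything that is not a word character or whitespace),
    # then turn spaces into hyphens.
    song_slug = re.sub(r"[^\w\s]", "", song).replace(" ", "-")
    return f"https://genius.com/{artist_slug}-{song_slug}-lyrics"


print(genius_lyrics_url("Taylor Swift", "Don't Blame Me"))
# https://genius.com/Taylor-Swift-Dont-Blame-Me-lyrics
```

Note that only the song title is stripped of punctuation. An artist name containing punctuation would pass through unchanged, which may or may not match the site's URLs.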
I have tested this and it works, sort of. I was able to download the lyrics for three albums, then the requests.get(url) call started throwing FileNotFoundErrors.
I suspect genius.com is tracking IP addresses and blacklists those that make too many requests (either in total or within a specific period of time). Interestingly, even after download_album_lyrics() stops working, get_all_songs_from_album() continues to work.
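If the failures really are caused by rate limiting (which is only my guess), pausing between attempts might help. Below is a minimal sketch of a retry wrapper with exponentially growing pauses. The name fetch_with_backoff, the retry count, and the delays are all arbitrary choices of mine, not anything genius.com documents, and this will not help if the IP address is blocked outright.

```python
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying with growing pauses on non-200 responses.

    fetch is any callable that returns an object with a .status_code
    attribute, for example requests.get.
    """
    response = fetch(url)
    for attempt in range(retries):
        if response.status_code == 200:
            break
        # With the default base_delay, wait 2s, 4s, 8s before asking again.
        sleep(base_delay * 2 ** attempt)
        response = fetch(url)
    # Either the first 200 response or the last failed one.
    return response
```

Inside download_album_lyrics(), the call requests.get(url) could then be swapped for fetch_with_backoff(requests.get, url). The existing status_code check still handles the case where every attempt fails.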
It might be possible to replace genius.com with lyrics.com. The latter site has a simpler HTML structure that makes it possible to extract the lyrics text without using a regular expression. (This may be similar to what genius.com used when Melanie first wrote the textbook.)
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.lyrics.com/lyric/8237688")
html = response.text
document = BeautifulSoup(html, "html.parser")
print(document.find('pre').text)
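One caveat with the snippet above: document.find('pre') returns None when the page has no pre element, so .text would raise an AttributeError. Here is a small sketch of a guarded version. The function name extract_pre_text is my own (hypothetical), and the sample HTML is made up for illustration; it is not lyrics.com's actual markup.

```python
from bs4 import BeautifulSoup


def extract_pre_text(html):
    """Return the text of the first <pre> element, or None if there is none."""
    document = BeautifulSoup(html, "html.parser")
    pre = document.find("pre")
    if pre is None:
        return None
    return pre.get_text()


# Made-up sample page standing in for a real response body.
sample = "<html><body><pre>First line\nSecond line</pre></body></html>"
print(extract_pre_text(sample))
```

With a real page, response.text from requests.get() would be passed in place of sample.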
When I tried to work through this page, I got an error when trying to execute
The error is
Apparently, genius.com has changed one (or more) of their settings, so that LyricsGenius no longer works. See:
https://stackoverflow.com/questions/72078610/getting-lyrics-from-genius-api-gives-error
https://github.com/johnwmillr/LyricsGenius/issues/190
https://github.com/johnwmillr/LyricsGenius/issues/220
The conclusion from these is (unhappily) not to use LyricsGenius.