johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License

Giving metadata instead of a lyrics #151

Closed NIkitabala closed 4 years ago

NIkitabala commented 4 years ago

Describe the bug: the script saves HTML data in the JSON file instead of the actual lyrics.

Expected behavior: lyrics, not HTML data, should be saved in the JSON file.

Here's code that I use:


import lyricsgenius
genius = lyricsgenius.Genius("#my_token")
artist = genius.search_artist("Пошлая Молли", max_songs=5, sort="title")
genius.remove_section_headers = True
print(artist.songs)
artist.save_lyrics()

When I tried to print the lyrics of a single song like this:

song = genius.search_song("Show Must Go On", artist.name)
print(song.lyrics)

It raises this error: AttributeError: 'NoneType' object has no attribute 'lyrics'.

allerter commented 4 years ago

How save_lyrics() works

When you call save_lyrics() with no arguments, the Artist object is saved to a JSON file that contains the artist's information alongside the songs. To access the lyrics, you would have to do this:

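import json

# Load the JSON file written by artist.save_lyrics()
with open('file.json', 'r') as f:
    data = json.load(f)

# After json.load each song is a plain dict, so index it by key
for song in data['songs']:
    print(song['lyrics'])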

Or you could do artist.save_lyrics(extension='txt') to only have the lyrics saved.

song.lyrics fails

Now on to why song.lyrics raises an error. Sometimes the library can't find the song, and search_song() returns None. A NoneType object has no lyrics attribute, hence the error. To see why the search failed, set genius.verbose = True. If you do that and get this:

Specified song does not contain lyrics. Rejecting.

In that case, the library needs to be updated (see #148 for more info). I suggest repeating the search until it returns a song:

while True:
    song = genius.search_song("Show Must Go On", artist.name)
    if song:
        break
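
The loop above never stops if the song really cannot be found. A minimal sketch of a bounded variant follows; the helper name search_with_retries and its max_attempts and delay parameters are hypothetical and not part of lyricsgenius. It only assumes a callable that, like genius.search_song, returns None on failure.

```python
import time


def search_with_retries(search, title, artist_name, max_attempts=5, delay=1.0):
    """Call search(title, artist_name) until it returns a non-None result.

    `search` is any callable shaped like genius.search_song. The helper
    name, max_attempts and delay are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        song = search(title, artist_name)
        if song is not None:
            return song
        if attempt < max_attempts:
            time.sleep(delay)  # brief pause before the next attempt
    return None  # every attempt failed; the caller must handle this
```

It would be called as search_with_retries(genius.search_song, "Show Must Go On", artist.name), and the result still needs a None check before accessing .lyrics.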

Keep in mind that because of the bug I explained above, artist.save_lyrics() might save songs with empty lyrics for you. You could either wait for an update or fix the bug manually using the solution in #148.
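
To check a saved file for songs that came back empty, something like the sketch below might help. It assumes each entry of data['songs'] is a dict with 'title' and 'lyrics' keys; the function name and the sample records are made up for illustration.

```python
def split_by_lyrics(songs):
    """Separate song records that have lyrics from those that do not.

    Assumes each record is a dict with 'title' and 'lyrics' keys;
    adjust the key names if your JSON file differs.
    """
    with_lyrics, without_lyrics = [], []
    for song in songs:
        lyrics = song.get("lyrics") or ""  # treat a missing key or None as empty
        if lyrics.strip():
            with_lyrics.append(song)
        else:
            without_lyrics.append(song)
    return with_lyrics, without_lyrics


# Hypothetical sample data standing in for data['songs']
sample = [
    {"title": "Song A", "lyrics": "La la la"},
    {"title": "Song B", "lyrics": ""},
    {"title": "Song C", "lyrics": None},
]
ok, missing = split_by_lyrics(sample)
print([song["title"] for song in missing])
```

The titles in the second list are the ones worth searching for again.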

NIkitabala commented 4 years ago

I tried your solution. It works with English songs, but not with Russian ones. It seems like an encoding problem, but the text is in UTF-8, which supports Cyrillic characters.

My error looks like this:

Traceback (most recent call last):
  File "B:\Python\LyricsGenius-master\lyrics4.py", line 5, in <module>
    artist.save_lyrics(extension='txt')
  File "B:\Python\LyricsGenius-master\lyricsgenius\artist.py", line 169, in save_lyrics
    self.to_text(filename, binary_encoding=binary_encoding)
  File "B:\Python\LyricsGenius-master\lyricsgenius\artist.py", line 132, in to_text
    ff.write(data)
  File "C:\Users\kostr\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>

Here's the code around line 132:

        # Save song lyrics to a text file
        filename = sanitize_filename(filename) if sanitize else filename
        with open(filename, 'wb' if binary_encoding else 'w') as ff:
            if binary_encoding:
                data = data.encode('utf8')
            ff.write(data)
        return None

And the code around line 169:

        # Save the lyrics to a file
        if extension == 'json':
            self.to_json(filename)
        else:
            self.to_text(filename, binary_encoding=binary_encoding)

allerter commented 4 years ago

In this case, yes, the problem is the encoding. Your traceback shows Python writing the file with the cp1252 codec, which can't represent Cyrillic characters. If you pass binary_encoding=True to artist.save_lyrics(), the lyrics are encoded as UTF-8 before being written, so the save succeeds. When you want to read the lyrics back in Python later, open the text file with open('lyrics.txt', 'rb') and .decode() the result.
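
Here is a small standalone sketch of the same effect that doesn't involve lyricsgenius at all. It uses a temporary file as a stand-in for lyrics.txt and shows that cp1252 rejects Cyrillic text, while UTF-8 bytes round-trip either by decoding manually or by opening the file in text mode with an explicit encoding.

```python
import os
import tempfile

text = "Пошлая Молли"  # Cyrillic sample text

# cp1252, the codec named in the traceback, has no Cyrillic letters.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Temporary file standing in for lyrics.txt
path = os.path.join(tempfile.mkdtemp(), "lyrics.txt")

# Write UTF-8 bytes in binary mode, as the to_text code does
# when binary_encoding is true.
with open(path, "wb") as f:
    f.write(text.encode("utf8"))

# Option 1: read the raw bytes and decode them.
with open(path, "rb") as f:
    from_bytes = f.read().decode("utf8")

# Option 2: open in text mode with an explicit encoding.
with open(path, "r", encoding="utf-8") as f:
    from_text = f.read()

print(cp1252_ok, from_bytes == text, from_text == text)
```

Option 2 avoids the manual .decode() step and doesn't depend on the platform's default encoding.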

Another thing: if you want the headers removed, put genius.remove_section_headers = True before search_artist(), because that attribute is only used when the lyrics are retrieved and has no effect after that.

NIkitabala commented 4 years ago

It works now with your solution, but it still has the same problem as in #148. I guess we should wait for the fix.