johnwmillr / LyricsGenius

Download song lyrics and metadata from Genius.com 🎶🎤
http://www.johnwmillr.com/scraping-genius-lyrics/
MIT License

add ensure_ascii=False as an option for json.dump #112

Closed: floese closed this issue 3 years ago

floese commented 4 years ago

I had to add this parameter in the songs.py file to save my lyrics without Unicode escape sequences in place of certain characters (e.g. French letters with accents).
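For anyone unfamiliar with the parameter, here is a minimal sketch of what ensure_ascii changes in the standard json module. The song dictionary is made-up example data, not something produced by LyricsGenius, and the in-memory buffer stands in for the real output file.

```python
import io
import json

# Hypothetical example data, not taken from the library.
song = {"title": "Ça plane pour moi", "artist": "Plastic Bertrand"}

# Default behaviour: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(song)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(song, ensure_ascii=False)

# json.dump takes the same parameter. When writing to a real file, open it
# with encoding="utf-8" so the unescaped characters can be stored.
buffer = io.StringIO()
json.dump(song, buffer, ensure_ascii=False)

# Both forms decode back to the same data; only the text on disk differs.
assert json.loads(escaped) == json.loads(readable)
```

Both outputs are valid JSON, so the option only affects how readable the saved file is.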

johnwmillr commented 4 years ago

Hi @floese,

Sorry for taking so long to get back to you. Can you check whether you still have the same issue with the latest version of the package? Or can you give me a list of the French songs that caused problems when saving to JSON?

Thanks, John

DarrelDonald commented 4 years ago

I'm not sure whether this is the same issue, but I'm getting a Unicode error with "'Till I Collapse" by Eminem.

python3 -m lyricsgenius song "'Till I Collapse" "Eminem" --save
Searching for "'Till I Collapse" by Eminem...
Done.
Traceback (most recent call last):
  File "\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\lib\site-packages\lyricsgenius\__main__.py", line 56, in <module>
    main()
  File "\lib\site-packages\lyricsgenius\__main__.py", line 43, in main
    print("Saving lyrics to '{s}'...".format(s=song.title))
  File "\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 18: character maps to <undefined>

I'm not sure how to fix this properly, but I got it to work by adding a couple of try/except blocks to __main__.py and api.py.

In __main__.py, I wrapped print("Saving lyrics to '{s}'...".format(s=song.title)) (line 43) and main() (line 56) in try blocks.

In api.py, I did the same for print('Song {n}: "{t}"'.format(n=artist.num_songs, t=song.title)) (line 361) and print('"{s}" is not valid. Skipping.'.format(s=s)) (line 346).

It looks like the problem is with printing a Unicode character to the console: the traceback shows that the console's cp437 codec cannot encode '\u2019' (a right single quotation mark).
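One way to express that workaround in a single place is sketched below. It is not the code from the pull request, and safe_print is a hypothetical helper name that does not exist in LyricsGenius. It assumes the failure is a UnicodeEncodeError raised by print, as in the traceback above.

```python
import sys


def safe_print(text, stream=None):
    """Print text, falling back to '?' for characters the stream cannot encode.

    Hypothetical helper for illustration only.
    """
    stream = stream if stream is not None else sys.stdout
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        # Round-trip through the stream's own encoding so that unencodable
        # characters (such as '\u2019' on a cp437 console) are replaced.
        encoding = getattr(stream, "encoding", None) or "ascii"
        fallback = text.encode(encoding, errors="replace").decode(encoding)
        print(fallback, file=stream)


safe_print("Saving lyrics to '\u2019Till I Collapse'...")
```

Replacing each affected print call with a helper like this would avoid repeating the try block at every call site. The trade-off is that the title shown on the console may differ slightly from the real one, although the saved data is unaffected.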

UPDATE: I ran into the same issue while I was working on my project. I found a solution and added it into your code. I sent you a pull request with the fix.