Fixed issue with printing unicode

DarrelDonald commented 4 years ago

Whenever there were unicode characters that needed to be printed, an error would be produced. I encoded the print statements and it resolved the issue.

johnwmillr commented 4 years ago

Hi Darrel,

Can you post some examples of songs that were giving you errors before this change?

Thanks, John

DarrelDonald commented 4 years ago

"'Till I Collapse" by Eminem was the only one I encountered before modifying the code. I was trying to download all of Eminem's songs. I think there were a lot because I had it print a message in the console every time it would happen at first, but I couldn't see exactly which songs were doing it.

johnwmillr commented 4 years ago

I can't recreate this issue with the latest version of the package (1.8.2). Can you test your search with the latest version of the package? Or provide example code that produces the error?

John

DarrelDonald commented 4 years ago

python3 -m lyricsgenius song "'Till I Collapse" "Eminem" --save
Searching for "'Till I Collapse" by Eminem...
Done.
Traceback (most recent call last):
  File "\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\lib\site-packages\lyricsgenius\__main__.py", line 56, in <module>
    main()
  File "\lib\site-packages\lyricsgenius\__main__.py", line 43, in main
    print("Saving lyrics to '{s}'...".format(s=song.title))
  File "\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 18: character maps to <undefined>
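
The failure can be reproduced without a Windows console by encoding the same message with cp437, the code page named in the traceback. This is a minimal sketch: the title string below is an assumption inferred from the traceback (a typographic apostrophe, U+2019, as the first character of the title), not output copied from the Genius API.

```python
# Assumed title: starts with U+2019 (right single quotation mark),
# which is the character the traceback reports at position 18.
title = "\u2019Till I Collapse"
message = "Saving lyrics to '{s}'...".format(s=title)

error_position = None
try:
    # cp437 has no mapping for U+2019, so this raises UnicodeEncodeError.
    message.encode("cp437")
except UnicodeEncodeError as exc:
    error_position = exc.start
    print("cp437 cannot encode {!r} at position {}".format(
        exc.object[exc.start], exc.start))
```

The reported position matches the traceback because the prefix "Saving lyrics to '" is 18 characters long.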

johnwmillr commented 4 years ago

Hi @DarrelDonald, sorry for the delay. Are you still running into this issue? What version of Python and OS are you using?

DarrelDonald commented 4 years ago

I haven't used it since then. I was using Python 3.5 and my operating system was Windows 10.

allerter commented 4 years ago

@johnwmillr, this happened to me too recently. Try this:

song = genius.search_song('60 days sober and cool', artist='Noah Cyrus')
song.to_text('song.txt')

The problem is with the \u2005 character which is one of the space characters. The difference with Darrel's issue is that there was a Unicode character in the song's title, unlike mine which was in the lyrics. I guess we should encode all text to 'utf8' when printing/saving if this issue doesn't have to do with my Python environment. There is already an issue with this problem at #138
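
To see which characters a given encoding cannot represent, something like the following could be used. It is only a sketch: `unencodable_chars` is a hypothetical helper, the sample string is made up for illustration, and cp1252 is just an example of a legacy Windows encoding (the encoding in my environment may differ).

```python
import unicodedata


def unencodable_chars(text, encoding):
    """Return the distinct characters in `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Illustrative string containing U+2005; not the actual lyrics.
sample = "60\u2005days sober"
for ch in unencodable_chars(sample, "cp1252"):
    print("U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>")))
```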

johnwmillr commented 4 years ago

@Allerter, does the snippet you shared produce an error for you? When running in my environment, the song saves without issue.

allerter commented 4 years ago

@johnwmillr, it does. I guess it might have to do with my environment. But either way, the problem in this issue is probably an actual problem. The Windows console uses a legacy code page (cp437 in Darrel's traceback), while Python 3 strings are Unicode. So in Python 3.5 and lower, printing a string that contains a character the code page can't represent results in an error. This was solved in 3.6, where Python writes to the Windows console through the Unicode API instead of the code page. If someone with 3.5 or lower wanted to be able to print Unicode, they would have to do this (from a solution on SO):

chcp 65001
set PYTHONIOENCODING=utf-8
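
The effect of `PYTHONIOENCODING` can be checked from Python itself by starting a child interpreter with the variable set and asking it for `sys.stdout.encoding`. This is a small sketch for verifying the setting, not something the package does.

```python
import os
import subprocess
import sys

# Copy the current environment and force UTF-8 for the child's standard streams.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
child_encoding = result.stdout.strip()
print(child_encoding)
```

Note that the variable is read when the interpreter starts, so it has to be set before Python is launched.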

johnwmillr commented 4 years ago

Thanks for the explanation, @Allerter. Do you know of a package-wide approach that would address the <= 3.5 issue and be more robust than adding .encode('utf8') wherever we print or save text?

allerter commented 4 years ago

@johnwmillr, unfortunately, I don't know of a package-wide way to achieve this. We could probably set the PYTHONIOENCODING environment variable to utf-8. That would solve the issue of printing Unicode characters. If we only set it once when the Genius class is instantiated, we would have to rely on the user not changing this later on. So I don't think that's a good idea. Looking at this question on StackOverflow, I think this might be the way to go:

Saving: Use encoding='utf8' whenever we save a text file. The reason saving the song worked in your environment is that Python uses locale.getpreferredencoding() to pick the file's encoding when the encoding parameter isn't given, and yours is probably utf8, but that's not the case for me. So I think it's best to explicitly set the encoding ourselves.
Printing: Encode with utf8 and decode with the encoding of the user's stdout, which we could turn into a utility function to call whenever needed:
```
import sys

def print_unicode(s):
    print(s.encode('utf-8').decode(sys.stdout.encoding, errors='replace'))
```
The good thing about this function is that if the user's console can print Unicode characters, it will print everything, and when it can't, it will display the text with the Unicode character escaped, like \u2005. For example:
String >>> There is a \u2005 here
If sys.stdout.encoding is utf8 or another that can handle the character >>> There is a here
If sys.stdout.encoding can't handle the character >>> There is a \u2005 here
Although this looks good to me, there might be a better way to handle this that I'm not aware of.
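
One way to get the escaped output described above is Python's `backslashreplace` error handler. Note that this is a different technique from the encode-then-decode function quoted above, which reinterprets the UTF-8 bytes in the console's encoding rather than escaping them. The sketch below is not the code that was merged, and `escape_unprintable` is a hypothetical name.

```python
import sys


def escape_unprintable(text, encoding=None):
    """Replace characters `encoding` cannot represent with \\uXXXX escapes.

    Hypothetical helper: defaults to the console's encoding, falling back
    to UTF-8 when stdout has no encoding (e.g. when it is replaced).
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Unencodable characters become ASCII escapes, which every codec can decode back.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


text = "There is a \u2005 here"
escaped = escape_unprintable(text, "cp437")    # console that lacks U+2005
unchanged = escape_unprintable(text, "utf-8")  # console that supports it
print(escaped)
```

With a UTF-8 console the string passes through untouched; with cp437 the four-per-em space is shown as a literal `\u2005`.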

allerter commented 3 years ago

Seems like unicodedata.normalize can solve the issue of printing Unicode to output in Python <=3.5.
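
As a quick check of what normalization does to the two characters from this thread, here is a small sketch using NFKC (the choice of NFKC is an assumption; the comment above doesn't say which form was tried). The space character is folded to a plain space, but the typographic apostrophe has no decomposition and is left as-is, so normalization alone would not cover every case.

```python
import unicodedata

four_per_em_space = "\u2005"  # the space character from the lyrics example
right_quote = "\u2019"        # the apostrophe from Darrel's traceback

normalized_space = unicodedata.normalize("NFKC", four_per_em_space)
normalized_quote = unicodedata.normalize("NFKC", right_quote)

print(repr(normalized_space))
print(repr(normalized_quote))
```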

allerter commented 3 years ago

:sweat_smile: I think I forgot to squash the merge. As unicodedata.normalize turned out not to work, I added the safe_unicode function in utils.py and used it wherever the package prints something that might lead to the UnicodeEncodeError. What do you think about this solution, @johnwmillr? Also, all the open() calls that save lyrics will now have encoding='utf8', which will solve saving lyrics that contain Unicode characters (#138). This kind of removes the need for the binary_encoding parameter if it was only meant for the Unicode issue.
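
For reference, the saving side amounts to passing an explicit encoding to `open()` so the result no longer depends on `locale.getpreferredencoding()`. A minimal sketch with a made-up string and a temporary file (the package's actual file handling may differ):

```python
import locale
import os
import tempfile

# Illustrative text containing U+2005; not real lyrics.
lyrics = "There is a \u2005 here"
path = os.path.join(tempfile.mkdtemp(), "song.txt")

# Without encoding=..., open() would fall back to this locale-dependent default.
print("locale default:", locale.getpreferredencoding())

# Explicit UTF-8 makes writing and reading independent of the locale.
with open(path, "w", encoding="utf8") as f:
    f.write(lyrics)
with open(path, encoding="utf8") as f:
    restored = f.read()
```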

allerter commented 3 years ago

Another solution would be to use the logging module from Python's standard library.

allerter commented 3 years ago

I resolved the conflicts, but there was a green "Commit merge" button, and since I wasn't sure whether it would update the package or the PR, I didn't submit it.

johnwmillr commented 3 years ago

The PR looks good! Thank you @DarrelDonald and @Allerter for your work on this. Merging now.

johnwmillr / LyricsGenius

Fixed issue with printing unicode #126