DarrelDonald closed this 3 years ago.
Hi Darrel,
Can you post some examples of songs that were giving you errors before this change?
Thanks, John
"'Till I Collapse" by Eminem was the only one I encountered before modifying the code. I was trying to download all of Eminem's songs. I think there were many more, because at first I had the code print a message to the console every time the error happened, but I couldn't see exactly which songs were causing it.
I can't recreate this issue with the latest version of the package (1.8.2). Can you test your search with the latest version of the package? Or provide example code that produces the error?
John
python3 -m lyricsgenius song "'Till I Collapse" "Eminem" --save
Searching for "'Till I Collapse" by Eminem...
Done.
Traceback (most recent call last):
  File "\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\lib\site-packages\lyricsgenius\__main__.py", line 56, in <module>
    main()
  File "\lib\site-packages\lyricsgenius\__main__.py", line 43, in main
    print("Saving lyrics to '{s}'...".format(s=song.title))
  File "\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 18: character maps to <undefined>
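A minimal sketch that reproduces this error without a console. It assumes the console codec is cp437 (the codec named in the traceback) and that the Genius title starts with a curly apostrophe, U+2019 (the character the error reports). The helper find_unencodable is hypothetical, not part of lyricsgenius.

```python
def find_unencodable(text, encoding):
    """Return (index, code point) of the first character `encoding` can't represent, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        # err.start is the index of the offending character ("position 18" in the traceback)
        return err.start, "U+%04X" % ord(text[err.start])
    return None


# Assumption: the title uses U+2019 (right single quotation mark), not a plain apostrophe
title = "\u2019Till I Collapse"
message = "Saving lyrics to '{s}'...".format(s=title)

print(find_unencodable(message, "cp437"))  # (18, 'U+2019')
print(find_unencodable(message, "utf-8"))  # None
```

Position 18 is the first character after the 18-character prefix `Saving lyrics to '`, which lines up with the traceback: the title itself, not the lyrics, is what fails to encode here.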
Hi @DarrelDonald, sorry for the delay. Are you still running into this issue? What version of Python and OS are you using?
I haven't used it since then. I was using Python 3.5 and my operating system was Windows 10.
@johnwmillr, this happened to me too recently. Try this:
song = genius.search_song('60 days sober and cool', artist='Noah Cyrus')
song.to_text('song.txt')
The problem is with the \u2005 character (FOUR-PER-EM SPACE), which is one of Unicode's space characters. The difference from Darrel's issue is that his problematic character was in the song's title, whereas mine was in the lyrics. If this issue isn't specific to my Python environment, I guess we should encode all text as UTF-8 when printing or saving.
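Which characters fail depends on the console's code page, so a quick check per codec can help narrow this down. This is a sketch: the helper name unencodable_chars is hypothetical, and cp1252 is only an example of a common Windows code page, not necessarily the one in either environment. The sample strings are the thread's own "There is a \u2005 here" and the title from the traceback, not real lyrics.

```python
import unicodedata


def unencodable_chars(text, encoding):
    """List (code point, Unicode name) for each distinct character `encoding` can't represent."""
    missing = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.add(ch)
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in sorted(missing)]


lyrics_line = "There is a \u2005 here"  # sample string, not actual lyrics
title = "\u2019Till I Collapse"

print(unencodable_chars(lyrics_line, "cp1252"))  # [('U+2005', 'FOUR-PER-EM SPACE')]
print(unencodable_chars(title, "cp437"))         # [('U+2019', 'RIGHT SINGLE QUOTATION MARK')]
print(unencodable_chars(title, "cp1252"))        # []
```

Note that U+2019 exists in cp1252 but not in cp437, so the same title can work on one Windows setup and fail on another.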
There is already an issue about this problem: #138.
@Allerter, does the snippet you shared produce an error for you? When running in my environment, the song saves without issue.
@johnwmillr, it does. I guess it might have to do with my environment. But either way, the problem in this issue is probably a real one. The Windows console uses a legacy code page (cp437 in the traceback above) rather than UTF-8, while Python 3 strings are Unicode, so printing a string that contains characters the code page can't represent raises a UnicodeEncodeError in Python 3.5 and earlier. This was fixed in 3.6, where Python writes to the console through the Windows Unicode console API instead of the code page (PEP 528). Anyone on 3.5 or earlier who wants to print Unicode has to run this before starting Python (from a solution on SO):
chcp 65001
set PYTHONIOENCODING=utf-8
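chcp 65001 is a cmd.exe setting and can't be shown portably, but the PYTHONIOENCODING half can. A sketch, assuming it is enough to start a fresh interpreter with the variable set (it is read at interpreter startup); run_child_print is a hypothetical helper, and cp437 stands in for the failing console codec.

```python
import os
import subprocess
import sys


def run_child_print(io_encoding):
    """Run a fresh interpreter that prints a title containing U+2019 with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The \u escape is resolved inside the child, so the command line itself stays ASCII
    code = "print('\\u2019Till I Collapse')"
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ok = run_child_print("utf-8")
print(ok.returncode, ascii(ok.stdout.decode("utf-8").strip()))  # 0 '\u2019Till I Collapse'

bad = run_child_print("cp437")
print(bad.returncode, b"UnicodeEncodeError" in bad.stderr)  # 1 True
```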
Thanks for the explanation, @Allerter. Do you know of a package-wide approach to the <= 3.5 issue that would be more robust than adding .encode('utf8') wherever we print or save text?
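One reason a per-call .encode('utf8') is fragile, assuming the encoded bytes are handed straight to print(): the call no longer raises, but in Python 3 print() shows the bytes' repr rather than the text. The helper encoded_for_print is hypothetical and only illustrates that effect.

```python
def encoded_for_print(s):
    """What print() shows when it is handed s.encode('utf8') instead of s."""
    # print() calls str() on its argument; for bytes that is the repr, b'...'
    return str(s.encode("utf8"))


print(encoded_for_print("\u2019Till I Collapse"))  # b'\xe2\x80\x99Till I Collapse'
```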
@johnwmillr, unfortunately, I don't know of a package-wide way to achieve this. We could probably set the PYTHONIOENCODING environment variable to utf-8, which would solve the problem of printing Unicode characters. But if we set it only once, when the Genius class is instantiated, we would have to rely on the user not changing it later on, so I don't think that's a good idea.
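There is a further wrinkle with setting the variable from inside the package: PYTHONIOENCODING is read when the interpreter starts, so assigning it in a running process (for example, in a hypothetical Genius.__init__) doesn't change the encoding of the sys.stdout that already exists. A sketch that checks this in a child process; the child code is illustrative, not taken from lyricsgenius.

```python
import codecs
import os
import subprocess
import sys

# The child starts with PYTHONIOENCODING=ascii, then tries to switch it to utf-8 at runtime
CHILD = (
    "import os, sys\n"
    "os.environ['PYTHONIOENCODING'] = 'utf-8'\n"
    "print(sys.stdout.encoding)\n"
)


def stdout_encoding_after_runtime_change():
    env = dict(os.environ, PYTHONIOENCODING="ascii")
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # Normalize the reported name through the codec registry before comparing
    return codecs.lookup(result.stdout.strip()).name


print(stdout_encoding_after_runtime_change())  # ascii
```

The assignment would only affect child processes started afterwards, not the current interpreter's output.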
Looking at this question on StackOverflow, I think this might be the way to go. For saving, pass encoding='utf8' whenever we save a text file. The reason saving the song worked in your environment is that, when no encoding is given, Python uses locale.getpreferredencoding() to choose the file's encoding, and yours is probably UTF-8, but that's not the case for me. So I think it's best to set the encoding explicitly ourselves. For printing, encode the string to utf8 and decode it with the user's stdout encoding, which we could wrap in a utility function to call whenever needed:
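import sys
def print_unicode(s):
    # Encode to UTF-8, then decode with the console encoding (undecodable bytes become U+FFFD)
    print(s.encode('utf-8').decode(sys.stdout.encoding, errors='replace'))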
The good thing about this function is that if the user's console can print Unicode characters, it prints everything as-is, and when it can't, it displays the text with the character escaped, like \u2005. For example:
String >>> There is a \u2005 here
If sys.stdout.encoding is utf8 or another encoding that can handle the character >>> There is a   here (printed with the actual four-per-em space)
If sys.stdout.encoding can't handle the character >>> There is a \u2005 here
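A sketch for checking these outputs without a Windows console. For testability it passes the encoding in rather than reading sys.stdout.encoding (an assumption), and both function names are hypothetical. With a single-byte code page such as cp437, decoding UTF-8 bytes never fails, so errors='replace' never triggers and U+2005 comes out as mojibake. Encoding with errors='backslashreplace' (a swapped-in alternative, not what the thread proposes) is what yields the escaped \u2005 form.

```python
def print_unicode_text(s, encoding):
    """The approach proposed above, with the console encoding passed in explicitly."""
    return s.encode("utf-8").decode(encoding, errors="replace")


def escaped_text(s, encoding):
    """Hypothetical variant: characters `encoding` can't represent become \\uXXXX escapes."""
    return s.encode(encoding, errors="backslashreplace").decode(encoding)


line = "There is a \u2005 here"
print(ascii(print_unicode_text(line, "cp437")))  # 'There is a \u0393\xc7\xe0 here'
print(escaped_text(line, "cp437"))               # There is a \u2005 here (literal backslash)
```

On a UTF-8 console both functions return the string unchanged, so the difference only shows up on legacy code pages.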
Although this looks good to me, there might be a better way to handle this that I'm not aware of.
Seems like unicodedata.normalize can solve the issue of printing Unicode to output in Python <=3.5.
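A sketch of why normalization alone is unlikely to cover every case, assuming the NFKC form (the form tried isn't stated). NFKC maps U+2005 to an ordinary space, but U+2019 has no compatibility mapping and survives unchanged, so a title like the one in the traceback would still fail on a cp437 console. The helper nfkc is hypothetical.

```python
import unicodedata


def nfkc(s):
    """Apply Unicode compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", s)


print(ascii(nfkc("a\u2005b")))     # 'a b'  (U+2005 folded to U+0020)
print(ascii(nfkc("\u2019Till")))   # '\u2019Till'  (no compatibility mapping)
```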
:sweat_smile: I think I forgot to squash the merge. As unicodedata.normalize turned out not to work, I added the safe_unicode function in utils.py and used it wherever the package prints something that might raise a UnicodeEncodeError. What do you think about this solution, @johnwmillr?
Also, all the open() calls that save lyrics now pass encoding='utf8', which solves saving lyrics that contain Unicode characters (#138). This more or less removes the need for the binary_encoding parameter, if it was only meant for the Unicode issue.
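A minimal sketch of the saving side, assuming plain text files; save_lyrics and load_lyrics are hypothetical helpers, not the package's to_text(). Passing encoding= explicitly means the bytes on disk no longer depend on the locale fallback that explained the difference between environments.

```python
import locale
import os
import tempfile


def save_lyrics(path, lyrics):
    # Explicit encoding: the bytes on disk no longer depend on locale.getpreferredencoding()
    with open(path, "w", encoding="utf8") as f:
        f.write(lyrics)


def load_lyrics(path):
    with open(path, encoding="utf8") as f:
        return f.read()


# What open() would fall back to on this machine if encoding= were omitted
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "song.txt")
    save_lyrics(path, "There is a \u2005 here")
    print(ascii(load_lyrics(path)))  # 'There is a \u2005 here'
```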
Another solution would be to use the logging module from Python's standard library.
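One possible shape for a logging-based version, as a sketch under assumptions: messages go through a StreamHandler, and the handler's stream decides how unencodable characters are treated. The logger name and make_logger are hypothetical, and the cp437 console is simulated with io.TextIOWrapper(..., errors='backslashreplace') so the example runs anywhere.

```python
import io
import logging


def make_logger(stream, name="lyrics_demo"):
    """Hypothetical setup: log bare messages to `stream` through a StreamHandler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


# Simulated cp437 console that escapes characters it can't encode instead of raising
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp437", errors="backslashreplace")
log = make_logger(console)
log.info("Saving lyrics to '{s}'...".format(s="\u2019Till I Collapse"))
console.flush()
print(raw.getvalue())  # the U+2019 arrives as the six ASCII bytes \u2019
```

The design point is that the encoding policy lives in one place (the handler's stream) instead of at every print call.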
I resolved the conflicts, but there was a green "Commit merge" button, and since I wasn't sure whether it would update the package or the PR, I didn't submit it.
The PR looks good! Thank you @DarrelDonald and @Allerter for your work on this. Merging now.
Whenever there were Unicode characters that needed to be printed, an error was raised. I encoded the print statements and that resolved the issue.