stevenewbs / gmpydl

Google Music Python Downloader
MIT License
22 stars, 8 forks

German special letters aren't supported #12

Closed silelmot closed 5 years ago

silelmot commented 6 years ago

Hey there. In Germany, as in many other European countries, we have special letters like Ä, Ö, Ü and ß. They are not saved correctly in the song titles; instead I get weird symbols. The folder names, on the other hand, are fine.

stevenewbs commented 6 years ago

It's been a while but I will fire this up and take a look. I know gmusicapi likes to pass things around in Unicode so in theory it should work, especially if the artist/album path works.

stevenewbs commented 6 years ago

As far as I can see from my testing, these characters should be handled properly. According to the API docs, a unicode-type string is passed back as the filename when download_song is called: http://unofficial-google-music-api.readthedocs.io/en/latest/reference/musicmanager.html#gmusicapi.clients.Musicmanager.download_song This unicode value is then passed to os.path.join to build the full file path, which also returns a unicode type.

Could this be due to the tool you are using to view the directory or some other external tool?

You could always try the following in a terminal:

import os
unicode = u'Ä Ö Ü and ß'
path = "/home//testalbum"
joined = os.path.join(path, unicode)
print(type(joined))
print(joined)=
with open(joined, 'wb') as f:
    f.write(b'0x01')

See what file name it makes ...
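For reference, a cleaned-up version of the test above might look like the sketch below. The helper name write_test_file and the use of a temporary directory are assumptions made here so the sketch runs anywhere; the comment above writes into a testalbum folder under /home instead. The coding line matters on Python 2 when the script is saved to a file.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def write_test_file(directory, name=u"Ä Ö Ü and ß"):
    """Create a small file whose name contains non-ASCII letters."""
    joined = os.path.join(directory, name)
    # The write must be indented so that it sits inside the with block.
    with open(joined, "wb") as f:
        f.write(b"0x01")
    return joined


if __name__ == "__main__":
    # Assumption: a throwaway directory stands in for the real album folder.
    album_dir = tempfile.mkdtemp(prefix="testalbum-")
    created = write_test_file(album_dir)
    print(type(created))
    print(created)
    print(os.listdir(album_dir))
```

If the file shows up with the right letters in the file manager, writing Unicode file names works in principle on that machine.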

silelmot commented 6 years ago

When I try your test script I get

SyntaxError: Non-ASCII character '\xc3' in file test.py on line 2,

I ran gmpydl on another PC, and the same problem occurs there.

[image: error]

stevenewbs commented 6 years ago

Stick these two lines at the top of the test script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Or run the commands from an interactive Python prompt.

Also, you could try manually naming the file and check that the file manager can display the character it's struggling with.

silelmot commented 6 years ago

No problem here with manual renaming. The script still doesn't work, and I have no idea about Python. :(

  File "test.py", line 9
    print(joined)=
                 ^
SyntaxError: invalid syntax

stevenewbs commented 6 years ago

Take the = off the end

silelmot commented 6 years ago

  File "test.py", line 11
    f.write(b'0x01')
    ^
IndentationError: expected an indented block

stevenewbs commented 6 years ago

Yep. If you're not running it interactively, you will need to indent the f.write line with a tab or 4 spaces.

silelmot commented 6 years ago

OK. I don't know what this is supposed to show me. The last command still doesn't work, even interactively, but Python itself printed Ä, Ö and Ü correctly.

stevenewbs commented 6 years ago

does it create a file in home/testalbum with the correct characters?

silelmot commented 6 years ago

No, I think because f.write(b'0x01') is still causing an error.

stevenewbs commented 6 years ago

what is the error? Indentation?

silelmot commented 6 years ago

Yes, still the same. I tried it in the Python console now, but it gave me the same error as before.

stevenewbs commented 6 years ago

If it's in the console, then you don't need to add indentation; it should do it for you.

silelmot commented 6 years ago

i just typed in

import os
unicode = u'Ä Ö Ü and ß'
path = "/home/pc1/testalbum"
joined = os.path.join(path, unicode)
print(type(joined))
print(joined)
with open(joined, 'wb') as f:
f.write(b'0x01')

and got the error

    f.write(b'0x01')
    ^
IndentationError: expected an indented block

stevenewbs commented 6 years ago

OK, if you are putting this into a file (i.e. not interactively), then it should look like this: https://gist.github.com/stevenewbs/7532b02107d54e5613e02d1422590395

stevenewbs commented 6 years ago

For interactive use, just open a terminal, type "python", and then enter each line one by one.

silelmot commented 6 years ago

OK, thanks for this simple script. Now it writes the file and the result is fine; every letter is right.

stevenewbs commented 6 years ago

OK, so now we know in principle that we can write Unicode characters to filenames. Do you know which character is missing in the image above? I can't quite see it in the image, but the code looks like 009F (or 9F00), which is either https://www.fileformat.info/info/unicode/char/009f/index.htm or https://www.compart.com/en/unicode/U+9F00

The next thing to try is to take that character and add it to the unicode string in the script to test writing it.

stevenewbs commented 6 years ago

After a bit of googling it looks like the correct character would be "ß", which is in the example code. You could try adding

#!/usr/bin/env python
# -*- coding: utf-8 -*-

to the top of gmpydl.py to see if that helps. If not, it might be that the name of the song is encoded slightly differently on GPM, so that Python can't decode it nicely.... I'm hugely clutching at straws here, as you might guess.
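One way the 009F code and "ß" could be related, offered only as a guess: "ß" is stored in UTF-8 as the two bytes C3 9F, and if those bytes are read as Latin-1 the second one becomes the invisible control character U+009F. The sketch below reproduces that mix-up; that Latin-1 was the wrong encoding involved here is an assumption, not something the thread confirms.

```python
# -*- coding: utf-8 -*-

text = u"ß"

# Encode correctly as UTF-8, then decode with the wrong codec (assumed: Latin-1).
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

print(["0x%02X" % b for b in bytearray(utf8_bytes)])
print(["U+%04X" % ord(c) for c in garbled])

# Reversing the wrong step recovers the original letter.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored == text)
```

The second code point of the garbled string is U+009F, which file managers typically draw as a box with the hex code in it.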

silelmot commented 6 years ago

OK, no, that didn't help. I also need to enter "#!/usr/bin/env python2"; with just python it isn't working at all. But with python2 I also get wrong characters for Ä, Ü and ß :(

Maybe Google and the way it transcodes this is the problem? I will try the original Google Music Manager to see how it is handled there.

Edit: OK, Google Music Manager does well and also saves the ä (just a quick test)...

stevenewbs commented 6 years ago

OK, I have added a few more unicode steps to see if this might help: https://gist.github.com/stevenewbs/7532b02107d54e5613e02d1422590395

Give this a try and let's see what happens. Beyond this, moving to python3 might solve all these issues, but that's another story.

silelmot commented 6 years ago

the testfile works great, but after adding

# -*- coding: utf-8 -*-
enc = sys.getfilesystemencoding()
print(enc)

to gmpydl, the output is UTF8, but the files are wrong again. Do you have any titles with these characters yourself, or does this fail only here on my PCs?
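A slightly fuller version of that check, as a sketch: the snippet above needs import sys to run on its own, and it can help to print the other encodings Python reports next to the filesystem one. The variable names are made up for illustration.

```python
import locale
import sys

# Encoding Python uses to convert file names to and from bytes for the OS.
fs_encoding = sys.getfilesystemencoding()
# Encoding used for implicit str/bytes conversions (ASCII on Python 2).
default_encoding = sys.getdefaultencoding()
# Encoding suggested by the user's locale settings.
locale_encoding = locale.getpreferredencoding(False)

print("filesystem:", fs_encoding)
print("default:   ", default_encoding)
print("locale:    ", locale_encoding)
```

A UTF-8 result here only says how file names are encoded when they are written. It says nothing about whether the title string was decoded correctly before it got that far, which may be why the files were still wrong.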

stevenewbs commented 6 years ago

When you say the test file works great - you mean it names the file correctly. But adding those lines above breaks it again?

silelmot commented 6 years ago

I just started test.py and it writes the file "Ä Ö Ü and ß" in the testalbum. This works. But I then added those lines to gmpydl.py and hoped it would now save the music with the right characters too, but it didn't.

stevenewbs commented 6 years ago

ah right so the test file now works correctly. That is great news! Unfortunately there might be a bit more work than just adding those lines to gmpydl itself, but in theory it should be fixable. Glad we managed to find a solution.

Once I have made some updates, I will let you know and you can be the canary for the next version

silelmot commented 6 years ago

I don't know if there is a bit of a misunderstanding, but the first test.py from you also worked.

stevenewbs commented 6 years ago

Right so both versions of the test script worked just fine and produced the correct characters?

silelmot commented 6 years ago

Right!

stevenewbs commented 6 years ago

Ok give this branch a go - I made a few changes to specify unicode a bit more for some strings https://github.com/stevenewbs/gmpydl/blob/Unicode-fix/gmpydl.py

silelmot commented 6 years ago

Thanks for your time, but it still doesn't work :(

[image: error]

silelmot commented 6 years ago

I've just seen that the filenames in the logfile are correct.

Edit: it seems something goes wrong where the filename is taken, in line 202:

filename, audio = api.download_song(song['id'])

If I put a "print title" inside the download_song definition, it is shown correctly in the terminal, but a "print title" or "print filename" after line 202 prints the wrong characters.

silelmot commented 5 years ago

After adding

filename = "%s/%02d - %s.mp3" % (path, song['track_number'], song['title'])

after the mentioned part, I get my characters, even special ones from Edvard Grieg's songs like Ânne.
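The idea in that line, building the file name from the song's own metadata instead of using the name returned by download_song, can be sketched on its own as follows. The function name build_filename and the sample song are hypothetical; only the 'track_number' and 'title' keys come from the line above.

```python
# -*- coding: utf-8 -*-
import os


def build_filename(path, song):
    """Build '<path>/<NN> - <title>.mp3' from song metadata.

    `song` is assumed to be a dict with 'track_number' and 'title' keys.
    """
    # Unicode literals keep the result a text string on Python 2 as well.
    name = u"%02d - %s.mp3" % (song["track_number"], song["title"])
    return os.path.join(path, name)


# Made-up sample data, for illustration only.
example_song = {"track_number": 3, "title": u"Größe"}
print(build_filename(u"testalbum", example_song))
```

Using os.path.join instead of a hard-coded "/" is a small design choice that keeps the separator correct on Windows too.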

stevenewbs commented 5 years ago

That is an epic spot - I was using the filename provided by GPM and not specifying unicode. Try the Unicode-fix branch of the code now and hopefully it should be fixed

silelmot commented 5 years ago

I haven't tested it yet, but I think there may be a problem now if songs have slashes or backslashes in their names.

silelmot commented 5 years ago

OK, tested it. As I thought, there are problems with / and \ and other characters (if you use Windows). A fix would be to add import re and

filename = re.sub(r'[\\/*"<>\|%\^&]', '_', filename)

after the changed code. I also removed "path" from filename; it gave me a strange file with the whole path in the filename.
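A possible way to keep the path and still clean the name, sketched under assumptions: run the substitution on the title only and join the directory afterwards. Applying re.sub to a string that already contains the path would presumably also replace the directory separators, which would explain the strange file name. The names sanitize_component and safe_song_path are hypothetical; the pattern is the one quoted above.

```python
# -*- coding: utf-8 -*-
import os
import re

# Character class taken from the comment above.
UNSAFE_CHARS = re.compile(r'[\\/*"<>\|%\^&]')


def sanitize_component(name):
    """Replace characters that are unsafe in a single file name with '_'."""
    return UNSAFE_CHARS.sub(u"_", name)


def safe_song_path(path, track_number, title):
    """Sanitize only the title, then join it to the untouched directory."""
    base = u"%02d - %s.mp3" % (track_number, sanitize_component(title))
    return os.path.join(path, base)


print(safe_song_path(u"testalbum", 1, u"AC/DC \\ Ärger"))
```

Letters such as Ä or ß pass through unchanged. The pattern does not cover every character Windows rejects (":" and "?" are missing, for example), so it may need extending.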