Issues on adding podcasts with non-Latin characters

peremen commented 9 years ago

gPodder version: 4.6.0

I tried to add a podcast in Korean, and the podcast entry keeps spinning after I enter the URL. Closing and re-opening gPodder does not show the podcast in the list, although the "Filter episodes" screen does reveal the episodes.

Possibly related logs:

2015-07-14 15:19:19,370 [gpodder.model] INFO: Cannot rename old download folder: /home/nemo/.local/share/harbour-org.gpodder.sailfish/http___xsfm.co.kr_xml_podera.xml
Traceback (most recent call last):
  File "/usr/share/harbour-org.gpodder.sailfish/gpodder/model.py", line 889, in get_save_dir
    os.rename(old_folder, new_folder)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 53-55: ordinal not in range(128)

URL to test: http://www.xsfm.co.kr/xml/idwk.xml

Cigydd commented 6 years ago

I don't have a solution ready for you, but I ran into a similar problem and the issue is basically the same, only this time with the Czech language, which is Latin alphabet based but has some diacritical marks.

I discovered that Python 3 recognizes the file system encoding as 'ascii', which is the true cause of our problems. Every extended or non-Latin character in the podcast's name then blocks it from being fetched.
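To illustrate the failure mode, here is a minimal sketch (not gPodder's actual code) of what happens when Python 3 has to turn a non-ASCII file name into bytes under an 'ascii' filesystem encoding. The helper `can_encode_path` and the sample folder name are made up for illustration; the real conversion happens inside `os.fsencode()`, which on Unix uses the `surrogateescape` error handler.

```python
def can_encode_path(name, fs_encoding):
    """Return True if `name` can be converted to bytes with `fs_encoding`.

    This mimics what os.fsencode() does on Unix, where the error handler
    is "surrogateescape". Hypothetical helper, for illustration only.
    """
    try:
        name.encode(fs_encoding, "surrogateescape")
        return True
    except UnicodeEncodeError:
        return False


folder = "Český podcast"  # made-up folder name with diacritics

print(can_encode_path(folder, "utf-8"))  # True: locale set up correctly
print(can_encode_path(folder, "ascii"))  # False: what the "C" locale gives
```

The second call fails for the same reason as the `os.rename()` call in the traceback: the characters simply have no ASCII representation.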

I thought about a workaround: altering the /etc/fstab file to give the root and /home btrfs volumes a "utf8" option… But I still don't know if it's a valid option for btrfs, and I don't want to risk rendering my device unbootable. I found a wiki page where no codepage-related option is listed. It's a modern file system, so it should support Unicode by default. Or, probably, btrfs doesn't care about encoding at all and stores the file names as given, byte by byte. So why does Python 3 on Sailfish say the filesystem encoding is 'ascii'? I think it's a bug in Python 3 for Sailfish.

Wait! It isn't a bug in Python! My locale was set up wrong! I installed the unofficial Czech localization, my locale got fixed, and gPodder now fetches and downloads podcasts with diacritics in their name. So I would recommend you, @peremen, to fix your locale by installing some locale you do understand. I think that an unofficial Korean translation is out there somewhere in the open repos.

sfbg commented 5 years ago

@Cigydd any idea what exactly was set up wrong in your locale? I am currently testing an unofficial Bulgarian translation, built as an RPM, and I get the exact same issue with podcast names in Bulgarian.

If I switch to any other UI language the problem does not persist...

Cigydd commented 5 years ago

@sfbg It’s been a long time, but I think the issue was that some or all of the LC_*, LANG, and LANGUAGE environment variables were set to "C", i.e., the default locale. Python then took over this locale and used 'ascii' as the filesystem encoding. I tried setting the locale somewhere in the system-wide config files, such as /etc/profile, but that didn’t help. Maybe the unofficial Bulgarian translation doesn’t set the locale variables correctly (to bg_BG.UTF-8). You should probably contact its authors.

sfbg commented 5 years ago

Thank you. It is a strange thing. The issue appears if I install a new BG package while using the Bulgarian translation. If I install it while using the English translation and then switch to the Bulgarian one, it disappears.

Cigydd commented 5 years ago

Strange, indeed.

sfbg commented 5 years ago

OK, actually it does not matter where and how the language RPM is installed. If I install it twice back-to-back, the problem does not manifest.

Cigydd commented 5 years ago

Very interesting! It seems that the language pack RPM misbehaves somehow on the first install. Maybe it only installs the Bulgarian locale, forgets to set it as the current one, and then forgets to remove it during uninstall. Then during the second install, the locale is already installed, so the package sets it as the default and keeps it installed. However, during the second uninstall, it removes the locale from the system and cleans everything up, making the third installation work as the first. That’s my theory … well … speculation, better to say, how the bug could work. But I didn’t see even a single piece of the code. Try to contact the author(s).

Cigydd commented 5 years ago

For debug purposes, try this:

1. Install the BG locale.
2. Open the Terminal app.
3. Type in these commands:
```
echo $LC_ALL
echo $LC_MESSAGES
echo $LANG
echo $LANGUAGE
```
After each line, you should get the current locale for the specified variable (after the "$" sign). That will tell you if the given locale type supports the Bulgarian language and Unicode, or BG without Unicode, or neither, or even something else.
4. Close the terminal app and install the package again.
5. Repeat steps 1–3.

This will tell you whether the problem really lies in the current locale.
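The same check can be scripted so that the two runs are easier to compare. This is only a sketch: `locale_snapshot` and `changed_variables` are hypothetical helper names, and the variable list is simply the one from the steps above.

```python
import os

LOCALE_VARS = ("LC_ALL", "LC_MESSAGES", "LANG", "LANGUAGE")


def locale_snapshot(env=None):
    """Collect the locale-related variables; unset ones become ''."""
    env = os.environ if env is None else env
    return {name: env.get(name, "") for name in LOCALE_VARS}


def changed_variables(before, after):
    """Return {name: (old, new)} for every variable that differs."""
    return {
        name: (before[name], after[name])
        for name in LOCALE_VARS
        if before[name] != after[name]
    }


if __name__ == "__main__":
    # Run once after each install and compare the printed values,
    # or keep both snapshots and pass them to changed_variables().
    for name, value in sorted(locale_snapshot().items()):
        print("{}={!r}".format(name, value))
```

An empty result from `changed_variables()` means the environment is identical in both cases, so the difference has to be somewhere else.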

sfbg commented 5 years ago

Interestingly, they are identical, when the issue is manifested and when it is not. All variables are empty, except $LANG which has the value "bg_BG.utf8" in both cases.

Cigydd commented 5 years ago

Wow, this is confusing. Empty LC_* variables are not a problem by themselves, since they fall back to $LANG; the system only defaults to the "C" locale (that’s English in fact, with the ASCII character set) when nothing is set or the named locale can’t be loaded. But it’s weird that the locale works after one installation and doesn’t work after the other.
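For reference, the usual POSIX lookup order for a single locale category can be sketched as below: LC_ALL first, then the category's own variable, then LANG, and "C" only if all of them are empty. `effective_locale` is a hypothetical helper; it only inspects the variables and does not check whether the named locale is actually installed on the device.

```python
import os


def effective_locale(category="LC_CTYPE", env=None):
    """Return the locale name the environment requests for `category`.

    Follows the POSIX precedence LC_ALL > LC_<category> > LANG, with
    "C" as the fallback. Empty values count as unset.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


# Only LANG is set, everything else is empty.
print(effective_locale(env={"LANG": "bg_BG.utf8"}))  # bg_BG.utf8
# Nothing set at all.
print(effective_locale(env={}))  # C
```

If the requested locale is not actually available, `setlocale()` fails and the program stays in "C", so identical variables do not guarantee identical behaviour.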

There’s another possible test:

Open the terminal
Run Python:
```
python
```
In the Python interpreter, type these commands:
```
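import sys
# print() with parentheses works on both Python 2 and Python 3
# Encoding used for implicit str/bytes conversions
print(sys.getdefaultencoding())
# Encoding used for file names; derived from the locale
print(sys.getfilesystemencoding())
exit()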
```
It should give you “utf-8” after both of the prints (on Python 2 the first one normally reports “ascii”, so there only the second one matters). Then repeat after the reinstall of the localization package. BTW, are you the actual author of the translation? ☺
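A slightly fuller sketch of the same test, written as a script that also tries to load the locale named by the environment. `encoding_report` is a hypothetical name; the calls themselves (`locale.setlocale`, `locale.getpreferredencoding`, `sys.getfilesystemencoding`) are from the standard library.

```python
import locale
import sys


def encoding_report():
    """Gather the encodings Python derives from the current locale."""
    try:
        # An empty string means "use the LC_*/LANG environment variables".
        locale.setlocale(locale.LC_CTYPE, "")
        locale_loaded = True
    except locale.Error:
        # Raised when the requested locale is not available on the system.
        locale_loaded = False
    return {
        "locale_loaded": locale_loaded,
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("{}: {}".format(key, value))
```

A `locale_loaded` value of `False` would suggest that the locale named in the variables is not actually installed, which is worth checking before and after the reinstall.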

sfbg commented 5 years ago

With gPodder working fine I get: ascii for the first line and UTF-8 for the second

When it is broken I get ascii for the first line and ANSI X3.4-1968 for the second

So, there is a difference...
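For anyone reading along: ANSI_X3.4-1968 is another name for US-ASCII (it refers to the 1968 ANSI standard), and Python treats it as an alias of its `ascii` codec. So the "broken" result means the filesystem encoding is plain ASCII. A quick way to confirm this, using the underscore spelling that the C library normally reports:

```python
import codecs

# codecs.lookup() resolves aliases to the canonical codec name.
print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii
print(codecs.lookup("UTF-8").name)           # utf-8
```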

I am a contributor to the translation. A friend of mine builds the RPM, based on this https://github.com/martonmiklos/unofficial-jolla-translation

Cigydd commented 5 years ago

OK, so why not let the authors of the scripts know? Maybe they should be notified that something’s wrong with their scripts. You or your friend can open a new issue there and add a link to our discussion. At least we tracked down the actual difference a bit. I’m not specialized in Jolla development, so I don’t think I could help you any more.

sfbg commented 5 years ago

Yes, I will open an issue there. Thank you very much for the support. Now it is clear where the issue lies and it is documented here, so it might help other people too. I think this issue can be closed.

peremen commented 5 years ago

> So I would recommend you, @peremen, to fix your locale by installing some locale you do understand. I think that an unofficial Korean translation is out there somewhere in the open repos.

Actually, I am the author of that package, and I am likely setting the locale in exactly the same way as what has been discussed here for the Bulgarian language. @sfbg, could you please post the content of /usr/share/jolla-supported-languages/bg.conf? If it follows the same format as other locale definitions, then the problem is somewhere else.

Also, for what is available for my environment:

import sys
print sys.getdefaultencoding()
'utf-8'
print sys.getfilesystemencoding()
'ascii'

I agree with moving the problem elsewhere, as now we know that the root cause is not here.

sfbg commented 5 years ago

The contents of the file:

Name=Български
LocaleCode=bg_BG.utf8
Region=България
RegionLabel=Регион: %1
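To compare such files between languages, the lines can be read with a few lines of Python. This sketch assumes the file is plain UTF-8 text with one `Key=Value` pair per line, as in the excerpt above; `parse_language_conf` and `locale_codeset` are hypothetical helpers, not part of any Sailfish tooling.

```python
def parse_language_conf(text):
    """Parse `Key=Value` lines into a dict, skipping anything else."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def locale_codeset(locale_code):
    """Return the part after the dot in e.g. 'bg_BG.utf8', or ''."""
    return locale_code.partition(".")[2]


sample = """\
Name=Български
LocaleCode=bg_BG.utf8
Region=България
RegionLabel=Регион: %1
"""

conf = parse_language_conf(sample)
print(conf["LocaleCode"])                  # bg_BG.utf8
print(locale_codeset(conf["LocaleCode"]))  # utf8
```

An empty codeset (for example a bare `bg_BG`) would be one thing to look for when comparing against a language that works.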

Keeper-of-the-Keys commented 3 weeks ago

Closing - in my subscription list I have at least one podcast with a non-Latin name, so I assume it was fixed at some point.

If not please reopen or create a new issue.

gpodder / gpodder-sailfish

Issues on adding podcasts with non-Latin characters #40