mediathekview / MediathekView

Das Programm MediathekView durchsucht die Online-Mediatheken verschiedener Sender
https://mediathekview.de
GNU General Public License v3.0

SRT saved in wrong codepage #210

Closed mogol07 closed 7 years ago

mogol07 commented 7 years ago

I am using Windows 10 Pro with the default Russian locale and MediathekView 13.0.1. When I save a subtitle, the TTML file is in UTF-8, but the SRT file is in 8-bit CP1251 with completely broken text like

so, bitte sch?n, prost!

All symbols that don't exist in CP1251 are replaced by question marks. I suspect this happens because MediathekView assumes the system codepage is always CP1252 and tries to write in that codepage.

It's not a critical bug, because I have a TTML file in UTF-8 that has the correct text and can be converted to SRT with an external utility. But it would be great if it could be fixed in a future release.
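The symptom described above (characters missing from the target codepage turning into question marks) can be reproduced in isolation. The following is a minimal sketch, not MediathekView code: the class name `UnmappableDemo` and the helper `roundTrip` are hypothetical, and it assumes the JRE ships the `windows-1251` charset, which is the case for common desktop JREs.

```java
import java.nio.charset.Charset;

public class UnmappableDemo {
    // Encode the text with the named charset, then decode the bytes again.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for every character the charset cannot represent.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = "bitte sch\u00f6n"; // \u00f6 is the o-umlaut

        // CP1251 has no o-umlaut, so the character is lost during encoding.
        System.out.println(roundTrip(original, "windows-1251"));

        // UTF-8 can represent every character, so the text survives.
        System.out.println(roundTrip(original, "UTF-8").equals(original));
    }
}
```

The loss happens when the text is encoded, so the question marks are really stored in the file and no viewer setting can bring the umlauts back.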

derreisende77 commented 7 years ago

It will be fixed in the next update. For the time being, you should modify your start script to include the following Java setting: -Dfile.encoding=UTF-8

It would be nice to hear whether that already fixes your problem.
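To check whether the -Dfile.encoding setting actually reached the JVM, a small stand-alone program can print what the JVM sees. This is only a sketch; the class name `DefaultCharsetCheck` and the method `describe` are made up for illustration and are not part of MediathekView.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Report the file.encoding system property and the charset the JVM
    // uses when code does not name one explicitly.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as `java DefaultCharsetCheck` and once as `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` on a Java 8 runtime should show whether the switch changes the default. On Java 18 and later the default charset is UTF-8 regardless of the system locale.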

mogol07 commented 7 years ago

I run MediathekView.exe since I am working on Windows, but I could only find a start script for Linux.

Nicklas2751 commented 7 years ago

@mogol07 Then right-click the EXE, open the settings, and add this to the start arguments. ;)

mogol07 commented 7 years ago

I tried running MediathekView.exe -Dfile.encoding=UTF-8. Nothing changed. SRT files are still written in 8-bit CP1251:

18
00:01:31,900 --> 00:01:34,100
K?nnten Sie auch etwas

zxsd commented 7 years ago

Hhhmmm. Could something else be going on with your PC, @mogol07?

{Edit} BTW. I don't believe parameters can be passed to the executables generated by launch4j (https://github.com/mediathekview/MediathekView/issues/134), @mogol07, so your previous attempt might not have been able to succeed.

In order to answer the question from @derreisende77 (above), prior to taking any other actions please attempt the following commands, either in a "Command Prompt" window, or by creating a shortcut on your PC's desktop for testing purposes, and subsequent use ... if it works {g}. (Modify the entries as necessary.) Note: If you create a shortcut, ensure the working directory (the "Start in" field) points to where your MV installation resides (e.g., "C:\Temp\MediathekView").

cd \Temp\Mediathekview
"C:\Program Files\Java\jre1.8.0_121\bin\java.exe" -Xmx1256M -Xms256M -Dfile.encoding=UTF-8 -jar "C:\Temp\MediathekView\mediathekview.jar"
pause

@derreisende77. This switch does nothing for me ... except that downloads made with this switch then show no 'improved' characters when code page 1252 is activated in Windows. Subtitle files like that no longer contain any umlauts either ... e.g., "uberraschen" always appears instead of "überraschen" (Notepad, "type.exe") ... a surprise in itself {g}.

[Image: codepage_berrasch]

Note: The switches -Dfile.encoding=UTF8 and -Dfile.encoding=windows-1252 do not help here either.

{EndEdit}

On this PC the default US "locale" (accessed via "All Settings" -> "Time & Language" -> "Region & Language") and MediathekView 13.0.0 are active:

[Image: codepage_win10_settings locale]

I have no character-display problems for either file-type, as shown below using both Notepad (SRT) and Notepad++ (TTML). Note also the active code page (red arrow) ... a widely used codepage that is more primitive than mine (437) would be hard to find {g}.

[Image: codepage_win10]

If you are processing the file using Windows "Command Prompt" native commands or other non-GUI utilities, then you can likely correct display problems by manually changing the codepage prior to executing your program. The picture below shows excerpts of "type"ing the file in a "Command Prompt" window (simply copied into Notepad for ease of display), with the respective codepage active (437 on the left, 1252 on the right). Changing the codepage, however, generally has no bearing on the contents of the file itself.

[Image: codepage_setmanuallyto1252]
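The point that changing the codepage alters only the display, not the file, can be illustrated by decoding the same bytes in two ways. This is a sketch with a hypothetical class `CodepageViewDemo`. It uses windows-1252 and windows-1251 rather than 437, on the assumption that both charsets are present in the JRE.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CodepageViewDemo {
    // Interpret raw bytes under the named charset, as a viewer would.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Pretend these bytes are the content of a file written in CP1252.
        byte[] fileBytes = "sch\u00f6n".getBytes(Charset.forName("windows-1252"));
        byte[] before = fileBytes.clone();

        String as1252 = decode(fileBytes, "windows-1252");
        String as1251 = decode(fileBytes, "windows-1251");

        // The two views differ even though the bytes are identical.
        System.out.println("views equal: " + as1252.equals(as1251));
        System.out.println("bytes unchanged: " + Arrays.equals(before, fileBytes));
    }
}
```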

mogol07 commented 7 years ago

@zxsd thanks! After running "C:\Program Files\Java\jre1.8.0_121\bin\java.exe" -Xmx1256M -Xms256M -Dfile.encoding=UTF-8 -jar "mediathekview.jar", I got a correct SRT file in UTF-8, and I could also save to a file with an umlaut in its name. Before that, the TTML file was in UTF-8, the SRT file was in CP1251, and I couldn't use German-specific letters in file names. Just for information, my language settings are as follows: the location is Germany, but the interface language is Russian, and the default input language is US English. That could be a little confusing.

derreisende77 commented 7 years ago

Then you might want to save this trick for future MV releases. The SRT UTF-8 fix applies only to SRT files. It does not affect the encoding settings for the rest of the application, so you may need to reapply this workaround in later releases. We don't have many users with such "strange" settings.
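A fix that is limited to one file type usually means naming the charset at the point where that file is written, so that the platform default no longer matters. The following is a generic sketch of that idea, not the actual MediathekView change. The class `Utf8SubtitleWriter` and its methods are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8SubtitleWriter {
    // Write the text as UTF-8 regardless of the JVM's default charset.
    static void writeUtf8(Path target, String content) {
        try (BufferedWriter writer =
                     Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: write to a temp file and return the raw bytes.
    static byte[] writeAndReadBack(String content) {
        try {
            Path tmp = Files.createTempFile("subtitle-demo", ".srt");
            writeUtf8(tmp, content);
            byte[] raw = Files.readAllBytes(tmp);
            Files.delete(tmp);
            return raw;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "sch\u00f6n" has 5 characters; the o-umlaut needs 2 bytes in UTF-8.
        byte[] raw = writeAndReadBack("sch\u00f6n");
        System.out.println(raw.length + " bytes");
    }
}
```

Code paths that still rely on the default charset stay dependent on the locale, which matches the remark that the rest of the application is unaffected.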

mogol07 commented 7 years ago

@derreisende77 For me, fixing the SRT issue would be enough. The folder name is not a big deal. Thanks again.