rumanzo / bt2qbt

bt2qbt is a CLI tool for exporting from uTorrent/BitTorrent into qBittorrent
https://qbforums.shiki.hu/viewtopic.php?f=14&t=5889
GNU General Public License v3.0

Files with Accented Characters are missing #49

Closed. m2840 closed this issue 7 months ago

m2840 commented 8 months ago

Describe the bug: Files with accented characters (Portuguese) are reported as missing. These characters are dropped or replaced by others, for example in 01 Canção de Engate.mp3. Renaming the files in qBittorrent solved the problem, but it has to be done manually.

To reproduce: Use bt2qbt_v1.21_amd64.exe to export from uTorrent to qBittorrent.

rumanzo commented 8 months ago

Thank you for the report. I'll look into it.

rumanzo commented 8 months ago

I can't reproduce this. Can you send the torrent file to my email?

m2840 commented 8 months ago

I checked the torrent I sent you with torrent-file-editor-0.3.18-x64.exe. I can see the .torrent name correctly by changing the encoding from UTF-8 to ISO-8859-1, but the file names contain strange characters, for example 01 Can��o de Engate.mp3 instead of 01 Canção de Engate.mp3. Maybe it's not worth spending your time on it. These are old torrents. God knows what happened to them during their lifetime :-)

m2840 commented 8 months ago

Just a final comment. Using torrent-file-editor-0.3.18-x64.exe in Tree view, the file name strings also show the correct characters if the encoding is set to ISO-8859-1. Note: I believe these torrents were created with uTorrent version 2.2.1.

rumanzo commented 8 months ago

I usually use a bencode editor. Maybe there is some problem with the character encoding; I used UTF-8 in my test. I haven't received your email yet.

rumanzo commented 8 months ago

I investigated this. In both cases below, From is what I get when decoding a manually added torrent in which all files were renamed by hand. I had to rename them because they have different names in the Windows folder, and libtorrent simply replaces the accented symbols. So if you add the torrent as new in qBittorrent, the file name will actually be "01 Can de Engate.mp3".

Here, From is how I see the files in the Windows folder after uTorrent, and To is how bt2qbt transfers them:

(diff.Changelog) (len=1 cap=1) {
 (diff.Change) {
  Type: (string) (len=6) "update",
  Path: ([]string) {
  },
  From: (string) (len=25) "01 Canзгo de Engate.mp3",
  To: (string) (len=23) "01 Can\xe7\xe3o de Engate.mp3",
  parent: (interface {}) <nil>
 }
}

As you can see, the torrent file contains Latin-1 (ISO-8859-1) symbols, byte for byte. The actual file names on disk don't contain these symbols.

I tried transforming the bytes to runes and got the actual file names in UTF-8:

(diff.Changelog) (len=1 cap=1) {
 (diff.Change) {
  Type: (string) (len=6) "update",
  Path: ([]string) {
  },
  From: (string) (len=25) "01 Canзгo de Engate.mp3",
  To: (string) (len=25) "01 Canção de Engate.mp3",
  parent: (interface {}) <nil>
 }
}
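
The byte-to-rune step described above can be sketched in Go roughly as follows. This is a minimal illustration, not the actual bt2qbt code, and the function name latin1ToUTF8 is hypothetical. It relies on the fact that ISO-8859-1 maps each byte value directly onto the Unicode code point with the same number, so widening every byte to a rune is enough to decode it.

```go
package main

import "fmt"

// latin1ToUTF8 treats every input byte as one ISO-8859-1 character.
// ISO-8859-1 maps bytes 0x00-0xFF onto U+0000-U+00FF, so widening
// each byte to a rune decodes the text. Hypothetical helper name,
// not taken from bt2qbt.
func latin1ToUTF8(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes) // Go encodes the runes as UTF-8
}

func main() {
	// The 23 raw bytes shown in the To field of the first dump.
	raw := []byte("01 Can\xe7\xe3o de Engate.mp3")
	fixed := latin1ToUTF8(raw)
	fmt.Printf("%q (len=%d)\n", fixed, len(fixed))
	// prints "01 Canção de Engate.mp3" (len=25)
}
```

The length grows from 23 to 25 bytes because ç and ã each take two bytes in UTF-8, which matches the second dump.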

But at this step I ran into two problems:
1. This is still different from the file names in the folder.
2. I totally broke emoji support.
And I can't guess from the input alone which code page it is.

I don't think I can fix this completely.
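
To illustrate the emoji problem: a name that is already valid UTF-8 gets mangled if every byte is blindly widened to a rune. The sketch below shows this, together with one partial guard based on utf8.Valid from the Go standard library. The guard is an assumption for illustration only; nothing in this thread says bt2qbt does this, and the helper names are hypothetical. It can separate valid UTF-8 from legacy bytes, but it cannot tell which legacy code page was used, which is the unsolved part.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// latin1ToUTF8 widens each byte to a rune (ISO-8859-1 decoding).
// Hypothetical helper, not taken from bt2qbt.
func latin1ToUTF8(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

// decodeName keeps input that is already valid UTF-8 and falls back
// to Latin-1 otherwise. Assumption: any non-UTF-8 name is Latin-1,
// which is exactly what cannot be known from the bytes alone.
func decodeName(raw []byte) string {
	if utf8.Valid(raw) {
		return string(raw)
	}
	return latin1ToUTF8(raw)
}

func main() {
	emoji := []byte("🎵 track.mp3")      // already valid UTF-8
	legacy := []byte("Can\xe7\xe3o.mp3") // Latin-1 bytes, invalid as UTF-8

	// Blind conversion turns the 4-byte emoji into 4 separate runes.
	fmt.Printf("%q\n", latin1ToUTF8(emoji))

	// The guarded version leaves the emoji alone and fixes the legacy name.
	fmt.Println(decodeName(emoji))
	fmt.Println(decodeName(legacy))
}
```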

rumanzo commented 8 months ago

By the way, I see saved files with names like 07 ...O Corpo Й Que Paga.mp3, which makes me think that the way uTorrent saves file names on Windows depends on the system language (Russian, in my case), because I see Й. What is this file named on your system?

m2840 commented 8 months ago

On my system the file is named 07 ...O Corpo É Que Paga.mp3. My Windows 10 is English (UK), but the "Regional format" is set to Portuguese and the "Current language for non-Unicode programs" is set to English (UK). Maybe all these settings are the cause of the situation.

rumanzo commented 8 months ago

That proves my point. I can't detect the system regional format or language, and renaming files is the wrong approach. I have just updated the README to describe this peculiarity.
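
The ambiguity discussed in this exchange can be reproduced with a small Go sketch. It assumes the torrent stores the accented letter as the single byte 0xC9, which is consistent with both reported names: 0xC9 is É in ISO-8859-1 and Й in Windows-1251. The Windows-1251 decoder here is deliberately partial (ASCII plus the 0xC0-0xFF letter range, which maps onto U+0410-U+044F) and both function names are hypothetical; a real program would more likely use a full code page table.

```go
package main

import "fmt"

// decodeLatin1 widens each byte to a rune (ISO-8859-1 decoding).
func decodeLatin1(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

// decodeCP1251Letters is a partial Windows-1251 decoder for
// illustration: ASCII passes through, bytes 0xC0-0xFF map onto the
// Cyrillic letters U+0410-U+044F, and every other byte becomes U+FFFD.
func decodeCP1251Letters(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		switch {
		case b < 0x80:
			runes[i] = rune(b)
		case b >= 0xC0:
			runes[i] = 0x0410 + rune(b-0xC0)
		default:
			runes[i] = '\uFFFD'
		}
	}
	return string(runes)
}

func main() {
	// Assumed raw bytes of the name inside the torrent.
	raw := []byte("07 ...O Corpo \xc9 Que Paga.mp3")

	fmt.Println(decodeLatin1(raw))        // 07 ...O Corpo É Que Paga.mp3
	fmt.Println(decodeCP1251Letters(raw)) // 07 ...O Corpo Й Que Paga.mp3
}
```

The same bytes give both names, so the input alone does not reveal which one the source system actually wrote to disk.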