jb2170 / better-adb-sync

Completely rewritten adbsync with --exclude
Apache License 2.0
369 stars 24 forks

UnicodeDecodeError (testing files included) #44

Open pureair opened 4 months ago

pureair commented 4 months ago

Environment:

Python version: 3.12.2
adb version: Android Debug Bridge version 1.0.41 / Version 35.0.1-11580240
Operating System: Windows 10

Error Description:

The adbsync command fails with a UnicodeDecodeError when it encounters specific folder or file names on the Android device. The error message is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 62: invalid start byte

The full error log is at the end of this report.
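For reference, the exception itself is easy to reproduce in isolation. The byte 0xa0 is a UTF-8 continuation byte and cannot start a character, whereas a correctly encoded non-breaking space (U+00A0) is the two-byte sequence 0xc2 0xa0. A minimal sketch; the sample bytes are made up for illustration and are not taken from the device output:

```python
# A made-up byte string containing a lone 0xa0 (not real adb output).
raw = b"track\xa001.m4a"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Prints a message of the same form as the one reported above.
print(error)

# The same character, correctly encoded as UTF-8, is two bytes long.
nbsp_utf8 = "\u00a0".encode("utf-8")
print(nbsp_utf8)
```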

Investigation:

I have narrowed down the problematic folders and files and included an archive (tester.tar.gz) containing their names and folder structure for further analysis. The contents of the files have been emptied; only the filenames are kept.

Each of the three folders behaves as follows:

  1. it is pushed to the phone successfully the first time (as there are no existing files on the phone)
  2. it fails to be pulled from the phone
  3. it fails to be pushed to the phone again (as existing files on the phone are examined first)
  4. with "--adb-encoding latin1" the result depends on the folder:
     a) the folder with only English characters is pulled, but the defect remains (i.e. repeating steps 1, 2 and 3 with the newly pulled files gives the same result). I think this means the filenames are already perfectly UTF-8 encoded, so nothing changes.
     b) the folder with some Chinese characters is not pulled, potentially because of bad filenames (error log below):
[INFO] SYNCING
[INFO]
[INFO] Empty delete tree
[INFO]
[INFO] Copying copy tree
[INFO] .\
[INFO] ./éè´é¸-HOYO-MiX - åç¥-éªèç群æ The Stellar Moments\
[INFO] ./éè´é¸-HOYO-MiX - åç¥-éªèç群æ The Stellar Moments/01. Bard's Adventure è¯äººçå·¥ä½.m4a
[CRITICAL] Non-zero exit code from adb pull
[CRITICAL] Exiting

They will not be pulled with "--adb-encoding gb2312" or gbk, gb18030, utf-8, utf-16, etc. either, because of errors such as 'utf-8' codec can't decode byte 0x0b in position xx (or 0xe9, 0xb8, etc.).
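A likely explanation for the garbled names in the latin1 log above: latin-1 maps every byte to exactly one code point, so decoding never fails, but a multi-byte UTF-8 name comes out as a different string. A small sketch, using a hypothetical file name rather than one from the test archive:

```python
name = "音乐 01.m4a"               # hypothetical name, not from tester.tar.gz
raw = name.encode("utf-8")         # the bytes of a UTF-8 encoded file name

# latin-1 never raises: each byte becomes one character.
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1))

# The latin-1 view is lossless at the byte level...
assert as_latin1.encode("latin-1") == raw
# ...but it is not the original name, which shows up as mojibake in logs.
assert as_latin1 != name

# latin-1 also accepts a lone 0xa0 and reads it as U+00A0 (non-breaking space).
assert b"\xa0".decode("latin-1") == "\u00a0"
```

Whether the garbled string is then mapped back to the right bytes when it is handed to adb pull depends on how the tool re-encodes it, which this sketch does not cover.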

I actually looked at the hex of the folder and file names in the folder with only English characters, and there is no 0xa0 in either the folder name or the file names.


Full error log:

PS G:\> adbsync -n pull /sdcard/Music ./
* daemon not running; starting now at tcp:5037
* daemon started successfully
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Scripts\adbsync.exe\__main__.py", line 7, in <module>
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Lib\site-packages\BetterADBSync\__init__.py", line 374, in main
    files_tree_source = fs_source.get_files_tree(path_source, follow_links = args.copy_links)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Lib\site-packages\BetterADBSync\FileSystems\Base.py", line 45, in get_files_tree
    return self._get_files_tree(tree_path, statObject, follow_links = follow_links)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Lib\site-packages\BetterADBSync\FileSystems\Base.py", line 33, in _get_files_tree
    tree[filename] = self._get_files_tree(
                     ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Lib\site-packages\BetterADBSync\FileSystems\Base.py", line 30, in _get_files_tree
    for filename, stat_object_child, in self.lstat_in_dir(tree_path):
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Lib\site-packages\BetterADBSync\FileSystems\Android.py", line 176, in lstat_in_dir
    for line in self.adb_shell(["ls", "-la", path]):
  File "C:\Users\USERX\AppData\Local\Programs\Python\Python312\Lib\site-packages\BetterADBSync\FileSystems\Android.py", line 87, in adb_shell
    adb_line = adb_line.decode(self.adb_encoding).rstrip("\r\n")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 62: invalid start byte

LazyWizard commented 3 months ago

Encountered this error myself and decided to take a whack at it. Thanks for the test data, it makes things a million times easier!

I think that with byte strings you need to use the dummy codec unicode_escape when decoding strings containing escaped multi-byte Unicode characters. When I tried this, it returned the right filenames, but ADB couldn't find the files. Never mind: it looks like it fixes the non-breaking space that's causing the sync issues but breaks on actual multi-byte characters.
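The behaviour described here can be reproduced in a few lines. Python's unicode_escape codec treats each raw byte as a Latin-1 character (after processing backslash escapes), so a lone 0xa0 decodes cleanly while a real multi-byte UTF-8 character is split into several wrong characters. The sample bytes below are illustrative, not taken from adb output:

```python
lone_nbsp = b"a\xa0b"                  # a stray 0xa0, invalid as UTF-8
multi_byte = "中".encode("utf-8")       # b"\xe4\xb8\xad", valid UTF-8

# The stray byte becomes U+00A0, which looks like a fix...
fixed = lone_nbsp.decode("unicode_escape")

# ...but each byte of a multi-byte character becomes its own character.
broken = multi_byte.decode("unicode_escape")

print(repr(fixed), repr(broken))
```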

I'm guessing it only has to be decoded this way on one side (wherever it's actually broken), but I'm unfamiliar with the codebase. I'll tinker with the code a bit and see if I can get things working.

gAtrium commented 1 month ago

I have also tried to tackle this problem; there are indeed nbsp characters present in the stdout. To fix the nbsp problem, I get the output as raw bytes, iterate over each byte, and fix any nbsp char codes that are not part of a multi-byte UTF-8 character. This is arguably Google's issue as well, for not letting wildcard (*) characters match these problematic filenames during pull, though I could be wrong.
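One way to read the byte-level approach described above is sketched below. This is not the actual patch from this comment: the function name repair_stray_nbsp is hypothetical, and the choice to rewrite a stray 0xa0 as the proper UTF-8 encoding of U+00A0 (rather than, say, a plain space) is an assumption.

```python
def repair_stray_nbsp(raw: bytes) -> bytes:
    """Rewrite lone 0xa0 bytes as UTF-8 NBSP (0xc2 0xa0).

    Hypothetical helper: bytes that belong to a well-formed multi-byte
    UTF-8 sequence are copied unchanged; other invalid bytes are left alone.
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        # Expected sequence length from the lead byte (0 = not a lead byte).
        if b < 0x80:
            length = 1
        elif 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0
        chunk = raw[i:i + length]
        well_formed = (
            length > 0
            and len(chunk) == length
            and all(0x80 <= c <= 0xBF for c in chunk[1:])
        )
        if well_formed:
            out += chunk            # copy the whole character as-is
            i += length
        elif b == 0xA0:
            out += b"\xc2\xa0"      # stray NBSP byte: re-encode it properly
            i += 1
        else:
            out.append(b)           # some other invalid byte: pass through
            i += 1
    return bytes(out)


print(repair_stray_nbsp(b"a\xa0b").decode("utf-8") == "a\u00a0b")
```

The check is deliberately simple and does not reject every malformed sequence (overlong encodings, for example), so the result may still need a tolerant decode.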

pureair commented 1 month ago

OK, it seems that replacing the non-breaking space (0xa0) with a regular space solves the problem. I wrote a PowerShell script to batch rename files (it substitutes an underscore rather than a space):

# Get all files in the current directory and its subdirectories
$files = Get-ChildItem -File -Recurse

foreach ($file in $files) {
    # Build the new file name by replacing each non-breaking space (U+00A0) with "_"
    $newFileName = $file.Name -replace [char]0x00A0, '_'

    # Rename the file only if the name actually changed
    if ($file.Name -ne $newFileName) {
        # $(...) is needed to expand a property inside a double-quoted string
        Write-Host "Renaming $($file.FullName)"
        # -LiteralPath avoids wildcard interpretation of characters such as [ and ]
        Rename-Item -LiteralPath $file.FullName -NewName $newFileName
    }
}

pause

I'm not sure how to run a bash terminal on Android with file access to /sdcard/, so I haven't been able to write a bash equivalent and test it on Android.
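As a cross-platform alternative, the same rename can be sketched in Python and run on the computer side before pushing. The function name rename_nbsp and its parameters are hypothetical. Unlike the PowerShell script, this version also renames directories, and it skips any rename that would overwrite an existing entry. It has not been tested on an Android device.

```python
import os

NBSP = "\u00a0"  # non-breaking space


def rename_nbsp(root: str, replacement: str = "_",
                dry_run: bool = False) -> list[tuple[str, str]]:
    """Replace NBSP in file and directory names under root.

    Returns the list of (old_path, new_path) pairs that were (or, with
    dry_run=True, would be) renamed.
    """
    renamed = []
    # Walk bottom-up so a directory is renamed only after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if NBSP not in name:
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.replace(NBSP, replacement))
            if os.path.exists(new):
                continue  # never overwrite an existing file or folder
            if not dry_run:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Calling rename_nbsp(".", dry_run=True) first lists what would change without touching anything.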