lisamelton / more-video-transcoding

More tools to transcode videos.
MIT License

Invalid Byte Sequence #6

Closed weaverm closed 1 year ago

weaverm commented 1 year ago

I'm running this command: ruby /pathto/two-pass-transcode.rb /pathto/Westworld/Season\ 02/Westworld\ s02e03\ Virtù\ e\ Fortuna.mkv

and getting this output: /pathto/two-pass-transcode.rb: invalid byte sequence in UTF-8

If I change the source file name to remove the ù and replace it with a u, then things work as expected. transcode-video and other-transcode handle the filename without error.

I'm on macOS Monterey 12.6.5 on Intel if that matters.

BTW, it also fails for episode 7: Westworld\ s02e07\ Les\ Écorchés.mkv

weaverm@iMac Super Scratch % ruby --version
ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.x86_64-darwin21]

skj-dev commented 1 year ago

I'm on Ventura, which appears to ship the same (old) system version of Ruby. That's kind of hilarious in its own way.

❯ /usr/bin/ruby --version                                                                                  
ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22]

Ruby doesn't throw an error for me, however.

❯ /usr/bin/ruby /usr/local/bin/two-pass-transcode.rb /Volumes/plex/rips/series/Westworld/Westworld\ -\ 2x03\ -\ Virtù\ e\ Fortuna.mkv     
Scanning media...
Command line:
HandBrakeCLI --input /Volumes/plex/rips/series/Westworld/Westworld\ -\ 2x03\ -\ Virtu\̀\ e\ Fortuna.mkv --output Westworld\ -\ 2x03\ -\ Virtu\̀\ e\ Fortuna.mkv --encoder x264 --vb 5000 --two-pass --turbo --rate 60 --crop-mode conservative --audio 1 --aencoder av_aac --ab 384 --mixdown 5point1 --encopts vbv-maxrate\=15000:vbv-bufsize\=15000
Transcoding...
[17:32:05] Compile-time hardening features are enabled
[17:32:05] hb_init: starting libhb thread
[17:32:05] thread 16d38b000 started ("libhb")
HandBrake 20230307062323-dbeaa4698-master (2023030901) - Darwin arm64 - https://handbrake.fr
20 CPUs detected
Opening /Volumes/plex/rips/series/Westworld/Westworld - 2x03 - Virtù e Fortuna.mkv...
[17:32:05] CPU: Unknown
[17:32:05]  - logical processor count: 20
[17:32:05] hb_scan: path=/Volumes/plex/rips/series/Westworld/Westworld - 2x03 - Virtù e Fortuna.mkv, title_index=1

I am, however, using a compiled version of HandBrake.

❯ HandBrakeCLI --version 
Compile-time hardening features are enabled
[17:32:28] hb_init: starting libhb thread
[17:32:28] thread 16d793000 started ("libhb")
HandBrake 20230307062323-dbeaa4698-master

So it might be a HandBrake version issue?

lisamelton commented 1 year ago

@weaverm My thanks to @ttyS0 for the assist, as usual! 👍

Yeah, the "ù" in "Virtù" and the "É" and "é" in "Écorchés" are causing the problems here. I don't think the issue is with HandBrakeCLI or ffprobe, since both transcode-video and other-transcode handle the inputs fine. I'm not doing anything different with the inputs in two-pass-transcode.rb than I do in those other scripts, so the difference is likely in how the scripts are invoked. Both transcode-video and other-transcode are installed via RubyGems, so they are launched differently and may, in fact, be using a different version of Ruby, depending on how your system is configured.

Honestly, I have no idea how to even proceed with testing that theory.
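
A possible starting point, offered as a rough sketch rather than a diagnosis: Ruby generally tags command-line arguments with an encoding derived from the locale, so printing what the interpreter sees under each invocation method may show whether the two environments differ. The helper name `describe_argument` is hypothetical and is not part of two-pass-transcode.rb or the RubyGems-installed tools.

```ruby
#!/usr/bin/env ruby
# Hypothetical diagnostic, not part of any of the transcoding scripts.
# Reports how this Ruby interpreter sees each command-line argument.

# Return the encoding name, validity, and raw bytes (hex) of one argument.
def describe_argument(arg)
  {
    encoding: arg.encoding.name,
    valid: arg.valid_encoding?,
    bytes: arg.bytes.map { |b| format('%02X', b) }.join(' ')
  }
end

if __FILE__ == $PROGRAM_NAME
  puts "Ruby #{RUBY_VERSION}"
  puts "Encoding.default_external: #{Encoding.default_external}"
  puts "Encoding.default_internal: #{Encoding.default_internal.inspect}"
  puts "LANG=#{ENV['LANG'].inspect} LC_ALL=#{ENV['LC_ALL'].inspect}"
  ARGV.each { |arg| p describe_argument(arg) }
end
```

Running this with the failing filename, once from the terminal and once from the bash script that triggers the error, would show whether the argument arrives with `valid: false` or with a different encoding in one of the two cases.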

BTW, I renamed both of those files years ago because I found that accessing them from SMB volumes caused a similar problem. So my rule now is never use non-ASCII characters in filenames. 🤷‍♀️

weaverm commented 1 year ago

I have HandBrakeCLI as downloaded from HandBrake's website.

weaverm@iMac Super Scratch % HandBrakeCLI --version
[21:53:34] Compile-time hardening features are enabled
[21:53:34] hb_init: starting libhb thread
[21:53:34] thread 70000a037000 started ("libhb")
HandBrake 1.6.1

HandBrake has exited.

When I opened this bug, I had replicated it many times. It even persists across a system reboot.

I invoke my transcodes from a bash script. Nothing fancy; it just lets me keep a history of the exact parameters I used for a transcode, since old ones are just commented out. I copied and pasted the offending lines from the bash script I'm using for two-pass-transcode into a test script that is messy, with lots of past experiments commented out.

And it worked as I would expect and without error.

I turned on Show Invisibles in BBEdit, figuring I had a weird control character in my 'production' script. But I didn't. It looked exactly the same. I copied and pasted back from my test script to my 'production' script and it failed again. All of my scripts use the same bash environment:

#!/usr/bin/env bash

I made a new file, copied and pasted my entire 'production' script into it, and it works fine.
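
Text that looks identical in an editor can still differ in bytes. For example, "ù" can be stored as one precomposed code point or as "u" followed by a combining grave accent. Whether that is what happened here is unknown. As a minimal sketch, assuming both script files are still available, a byte-level comparison would settle it. The helper names `first_byte_difference` and `hex` are hypothetical.

```ruby
#!/usr/bin/env ruby
# Hypothetical helper: find the first line where two files differ in raw bytes.

# Render a line as space-separated hex bytes, or a marker if it is absent.
def hex(line)
  line.nil? ? '(missing)' : line.unpack1('H*').scan(/../).join(' ')
end

# Return nil if the files are byte-identical, otherwise a hash with the
# 1-based line number and a hex dump of that line from each file.
def first_byte_difference(path_a, path_b)
  lines_a = File.binread(path_a).lines
  lines_b = File.binread(path_b).lines
  count = [lines_a.length, lines_b.length].max
  count.times do |i|
    a = lines_a[i]
    b = lines_b[i]
    next if a == b
    return { line: i + 1, a: hex(a), b: hex(b) }
  end
  nil
end

if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  diff = first_byte_difference(ARGV[0], ARGV[1])
  if diff
    puts "First difference at line #{diff[:line]}"
    puts "  #{ARGV[0]}: #{diff[:a]}"
    puts "  #{ARGV[1]}: #{diff[:b]}"
  else
    puts 'Files are byte-identical'
  end
end
```

Passing the failing script and the working copy as the two arguments would print the first line whose bytes differ, if any.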

I'm at a loss to explain the behavior. But life is short. I'm happy to move on if you are.

lisamelton commented 1 year ago

@weaverm Indeed, let us move on. Feel free to close this now. I am so sorry we weren't able to help you figure this out.