meeb / tubesync

Syncs YouTube channels and playlists to a locally hosted media server
GNU Affero General Public License v3.0
1.86k stars, 119 forks

[bug] can't download files which filename will exceed 255 characters #522

Open FaySmash opened 1 month ago

FaySmash commented 1 month ago

From my added sources, 14 tasks are stuck because they fail to download:

ERROR: unable to open for writing: [Errno 36] File name too long: '/downloads/video/Kochou Cocoa/2021-07-18 - 【3Dio#ASMR 】同接150超えるまでやめれません!耳はむ多めで甘々でデレデレになっちゃう最高の癒し♡たくさん甘えて?♡耳フー♡ please Request【#胡蝶ここあ Vtuber】 (DdMVc7-Ks8A).f299.mp4.part'

The original video title is 98 characters long, but because many of them are multi-byte Unicode characters, it takes up 225 bytes once encoded. After tubesync adds the prefix and suffix as well as the part-file extensions, the filename easily exceeds 255 bytes.
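To illustrate the gap between character count and encoded length, here is a minimal Python sketch. The helper name `describe` is made up for this example, and the two strings are short fragments of the title above rather than the full filename.

```python
def describe(name: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a filename component."""
    return len(name), len(name.encode("utf-8"))


# ASCII characters take one byte each in UTF-8.
print(describe("please Request"))  # (14, 14)

# Each of these four characters takes three bytes in UTF-8.
print(describe("耳フー♡"))  # (4, 12)
```

Filesystems that cap a name at 255 bytes count the second number, not the first, which is why a title that looks short can still be rejected.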

There needs to be a check of the maximum resulting filename length (when there are more parts to fetch, the resulting filename gradually grows), and the video title needs to be trimmed.

As a workaround, I downloaded the videos separately and added them as external videos.

meeb commented 1 month ago

What file system are you using for your /downloads mount?

FaySmash commented 1 month ago

> What file system are you using for your /downloads mount?

An SMB Docker mount from a TrueNAS share, so in short ZFS, which has a hard limit of 255 chars.

Maybe the best solution would be to first download the video with only its video ID as the filename, and then rename it once it's complete to a filename of at most 255 chars.
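A rough sketch of that idea in Python, assuming the download has already been written to a short `<video_id>.<ext>` file and that `final_name` has already been shortened to fit the filesystem limit. The function `finalize_download` and its parameters are hypothetical and are not part of tubesync or yt-dlp.

```python
import os


def finalize_download(directory: str, video_id: str, ext: str, final_name: str) -> str:
    """Rename a finished download from its short ID-based name to its final name.

    Assumes the downloader wrote the file as "<video_id>.<ext>" in `directory`
    and that `final_name` already fits within the filesystem's name limit.
    """
    temp_path = os.path.join(directory, f"{video_id}.{ext}")
    final_path = os.path.join(directory, final_name)
    # os.replace renames the file, overwriting any existing file at final_path.
    os.replace(temp_path, final_path)
    return final_path
```

Because the temporary name contains only the video ID, any part or fragment suffixes added during the download cannot push it over the limit; only the final rename has to respect it.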

meeb commented 1 month ago

I'll look at implementing truncation for video titles. This is a relatively rare event because YouTube limits titles to 100 characters, so being over 255 bytes once encoded as UTF-8 would be pretty unusual. Just your video title alone expands from 55 characters to 165 bytes.

Truncating the variable used to generate the title component of the filename formatter to be within 255 chars (including the rest of the filename format template) seems the most logical option.
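One way this truncation could look, as a sketch only: the names `truncate_utf8`, `build_filename` and `MAX_NAME_BYTES` are invented for illustration, the limit is counted in UTF-8 bytes, and the filename template is simplified to a fixed prefix and suffix around the title.

```python
MAX_NAME_BYTES = 255  # assumed per-filename limit, counted in bytes


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    if max_bytes <= 0:
        return ""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting the byte string may leave a partial character at the end;
    # errors="ignore" drops those leftover bytes when decoding.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def build_filename(prefix: str, title: str, suffix: str, limit: int = MAX_NAME_BYTES) -> str:
    """Fit the title into whatever byte budget the fixed parts leave over."""
    fixed = len((prefix + suffix).encode("utf-8"))
    return prefix + truncate_utf8(title, limit - fixed) + suffix


name = build_filename("2021-07-18 - ", "あ" * 100, " (DdMVc7-Ks8A).mp4")
print(len(name.encode("utf-8")) <= MAX_NAME_BYTES)  # True
```

Truncating on the encoded bytes rather than on characters keeps the result within the limit regardless of how many bytes each character needs.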

FaySmash commented 1 month ago

> I'll look at implementing truncation for video titles. This is a relatively rare event because YouTube limits titles to 100 characters, so being over 255 bytes once encoded as UTF-8 would be pretty unusual. Just your video title alone expands from 55 characters to 165 bytes.

I ran into this issue 7 times 😅 (but I also downloaded 2000+ videos). All videos were from Japanese channels, hence the Unicode issue.

> Truncating the variable used to generate the title component of the filename formatter to be within 255 chars (including the rest of the filename format template) seems the most logical option.

Sounds good, but this could still cause an issue with very long videos exceeding 1000 parts, because the filename can grow during downloading.

meeb commented 1 month ago

> Sounds good, but this could still cause an issue with very long videos exceeding 1000 parts, because the filename can grow during downloading.

Yeah, I was going to pad in sufficient space for this; an extra 12 chars spare should do it to cover the part number and extension with dots.
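The arithmetic behind such a reserve can be sketched as follows. The function `title_budget` is hypothetical, the 12-byte reserve is the figure suggested above, and the prefix and suffix are taken from the example filename in this issue.

```python
def title_budget(limit: int, prefix: str, suffix: str, reserve: int = 12) -> int:
    """Bytes left for the title after the fixed parts and a reserve for temporary suffixes."""
    fixed = len(prefix.encode("utf-8")) + len(suffix.encode("utf-8"))
    return max(0, limit - fixed - reserve)


# 255 total - 13 (prefix) - 18 (suffix) - 12 (reserve) = 212 bytes for the title.
print(title_budget(255, "2021-07-18 - ", " (DdMVc7-Ks8A).mp4"))  # 212
```

Whether 12 bytes is enough depends on the longest temporary suffix the downloader actually appends, so the reserve is kept as a parameter here rather than hard-coded.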

FaySmash commented 1 month ago

> Yeah I was going to pad on sufficient space for this, an extra 12 chars spare should do it to cover the part number and extension with dots.

While possible, I'd prefer a more versatile approach. Some filesystems allow longer filenames, and a fixed padding length could remove more characters than necessary. Maybe expose the maximum filename length as an environment variable so that it can be overridden? Renaming the file after it has been downloaded would also avoid removing more characters than necessary.
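A sketch of how such an override might work in Python. The environment variable name `TUBESYNC_MAX_FILENAME_BYTES` is invented for this example and is not an existing tubesync setting; when it is unset, the sketch asks the operating system for the limit via `os.pathconf` and falls back to 255.

```python
import os

DEFAULT_MAX_NAME_BYTES = 255  # assumed fallback when nothing better is known


def max_filename_bytes(directory: str, environ=os.environ) -> int:
    """Pick the filename length limit: env override, then the OS-reported limit, then a default."""
    # Hypothetical variable name, used only for illustration.
    override = environ.get("TUBESYNC_MAX_FILENAME_BYTES")
    if override:
        try:
            value = int(override)
            if value > 0:
                return value
        except ValueError:
            pass  # ignore a malformed override and fall through
    try:
        # PC_NAME_MAX is the maximum filename length the OS reports for this directory.
        reported = os.pathconf(directory, "PC_NAME_MAX")
        if reported > 0:
            return reported
    except (AttributeError, OSError, ValueError):
        pass  # os.pathconf is unavailable on some platforms or may fail for this path
    return DEFAULT_MAX_NAME_BYTES
```

On network mounts such as SMB the value reported by the OS may not match what the server actually enforces, which is one reason a manual override could still be useful.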