stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0

[Bug Report] Charencoding issue when scanning a zip gallery #3566

Closed DampToast closed 1 year ago

DampToast commented 1 year ago

Describe the bug

I am getting the following error:

Error scanning zip file "/content/MetArt/2024/test.zip": failed to lookup charset IBM424_ltr, language he

I narrowed it down: if a jpg file name inside the zip contains '-Ti', the scan throws this error.

Of the 13 zip files listed below, 5 have '-Ti' in their jpg file names. I haven't figured out which character combinations trigger it in the others, but I imagine it's the same issue.

DogmaDragon hinted it might have something to do with https://github.com/stashapp/stash/pull/3389 when I inquired in the help channel on Discord.

MetArt_2007-07-16_AMATEURISH-EVI-B-by-JAN-VELS_9d944_high.zip
MetArt_2011-05-12_TIMELESS-NASTYA-J-by-NATASHA-SCHON_74db3_high.zip
MetArt_2012-07-14_ORGANZA-NAOMI-E-by-ANGELA-LININ_42d4f_high.zip
MetArt_2014-04-18_ROVINTH-EMILY-BLOOM-by-GONCHAROV_6162f_high.zip
MetArt_2016-04-30_TIRZA-KALEESY-by-FABRICE_b3155_high.zip
MetArt_2016-05-12_TIPICA-LUCY-LI-by-DELTAGAMMA_fc7fc_high.zip
MetArt_2019-04-15_TIMELESS-OPHELIA-by-MATISS_510cf_high.zip
MetArt_2020-09-29_MY-FANTASIES-TINA-TINY-by-ALEX-LYNN_2fef0_high.zip
MetArt_2020-10-29_OUTDOOR-ROMP-TINA-TINY-by-ALEX-LYNN_6e58e_high.zip
MetArt_2020-12-03_LOLLIPOP-TINA-TINY-by-ALEX-LYNN_61255_high.zip
MetArt_2021-05-02_GIRL-TIME-KERTU-by-KOENART_7cc2b_high.zip
MetArt_2022-08-14_MY-TIME-AWAY-BABY-NICOLS-by-ERRO_69726_high.zip

To Reproduce

Steps to reproduce the behavior:

  1. Create a single image (you can have more, but one is fine for testing) and name that image '-Ti.jpg'.
  2. Zip it to use as an image gallery. The name of the zip file doesn't seem to matter.
  3. Try to scan to import it into Stash. You should get the above error.

Stash Version: (from Settings -> About): v0.19.1-73-g88b3b87f

Additional context

If you want to use a longer file name, an example would be 'MetArt_My-Time-Away_Baby-Nicols_high_0134.jpg'

deezero commented 11 months ago

Sorry for the necro post.

@WithoutPants considering the ngrams that match IBM424_ltr (and IBM424_rtl), is it worth matching on those specific charsets and making it a debug-level message?

The matching list is quite long, and there are plenty of other matches. Some of them are only saved by higher-confidence matches (so Blah-AF.jpg thankfully isn't matched).
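To make the suggestion concrete, here is a rough sketch of what special-casing those charsets could look like. It is not Stash's actual code: `entryName`, `decoders`, and `ignorable` are hypothetical names, the detected charset is passed in as a plain string instead of coming from a real detector, and the "debug" level is approximated with the standard `log` package.

```go
package main

import (
	"fmt"
	"log"
)

// decoder converts a raw zip entry name into UTF-8. Hypothetical type.
type decoder func(raw string) (string, error)

// decoders is a hypothetical table of charsets that can be decoded.
// IBM424_ltr and IBM424_rtl are deliberately absent, mirroring the
// lookup failure in the report.
var decoders = map[string]decoder{
	"UTF-8": func(raw string) (string, error) { return raw, nil },
}

// ignorable lists detected charsets treated as likely false positives.
var ignorable = map[string]bool{
	"IBM424_ltr": true,
	"IBM424_rtl": true,
}

// entryName returns the name to use for a zip entry given the charset the
// detector reported. For ignorable charsets it keeps the raw name and logs
// a debug-style message instead of failing the whole scan.
func entryName(raw, detected string) (string, error) {
	dec, found := decoders[detected]
	switch {
	case found:
		return dec(raw)
	case ignorable[detected]:
		log.Printf("debug: ignoring charset %s for %q, keeping raw name", detected, raw)
		return raw, nil
	default:
		return "", fmt.Errorf("failed to lookup charset %s", detected)
	}
}

func main() {
	name, err := entryName("-Ti.jpg", "IBM424_ltr")
	fmt.Println(name, err)
}
```

The trade-off is that an archive whose names really are encoded in one of the ignored charsets would keep its undecoded names, which seems acceptable only if such archives are rare in practice.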