Subsync erroring out with "max() arg is an empty sequence".

PaulinoRBJ commented 4 years ago

Environment:

OS: Ubuntu 18.04 LTS
python version: Python 2.7.17
subsync version: subsync 0.3.4

Describe the bug: The sync process fails with the error "ValueError: max() arg is an empty sequence".

To reproduce:

subsync "vid.mp4" -i sub1.srt -o sub2.srt

Expected behavior: The process should succeed.

Output

INFO:subsync.subsync:extracting speech segments from reference 'vid.mp4'...
INFO:subsync.speech_transformers:Checking video for subtitles stream...
INFO:subsync.speech_transformers:Video file appears to lack subtitle stream
100%|█████████▉| 7687.72266667/7687.723 [01:33<00:00, 81.91it/s]
INFO:subsync.subsync:...done
INFO:subsync.subsync:extracting speech segments from subtitles 'sub1.srt'...
INFO:subsync.subtitle_parser:detected encoding: UTF-8
INFO:subsync.subsync:...done
INFO:subsync.subsync:computing alignments...
Traceback (most recent call last):
  File "/home/user/.local/bin/subsync", line 11, in <module>
    load_entry_point('subsync==0.3.4', 'console_scripts', 'subsync')()
  File "/home/user/.local/lib/python2.7/site-packages/subsync/subsync.py", line 207, in main
    return run(args)
  File "/home/user/.local/lib/python2.7/site-packages/subsync/subsync.py", line 106, in run
    srt_pipes,
  File "/home/user/.local/lib/python2.7/site-packages/sklearn/base.py", line 467, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/user/.local/lib/python2.7/site-packages/subsync/aligners.py", line 77, in transform
    (score, offset), subpipe = max(scores)
ValueError: max() arg is an empty sequence

Test case: vid.zip

Additional context: none.
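The traceback bottoms out in plain Python behavior: the built-in max() raises ValueError when it is handed an empty sequence. Below is a minimal sketch, independent of subsync, that mirrors the unpacking on line 77 of aligners.py (the key= form is the one shown in the later ffsubsync tracebacks in this thread). The helper name best_candidate and the sample data are made up for illustration.

```python
def best_candidate(scores):
    """Hypothetical helper mirroring the failing call.

    Each item is assumed to look like ((score, offset), subpipe), as in the
    traceback; max() picks the item with the highest score.
    """
    (score, offset), subpipe = max(scores, key=lambda x: x[0][0])
    return score, offset, subpipe


# With at least one candidate, max() simply picks the highest score.
print(best_candidate([((1.0, 5), "pipe-a"), ((3.0, -2), "pipe-b")]))

# With no candidates, max() has nothing to compare and raises ValueError.
try:
    best_candidate([])
except ValueError as exc:
    print("ValueError:", exc)

# Python 3.4+ only: default= is returned instead of raising on an empty input.
# (Not available on the Python 2.7 used in the original report.)
print(max([], key=lambda x: x[0][0], default=None))
```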

smacke commented 4 years ago

Thanks for submitting an issue! Could you rerun the full command with the --make-test-case flag?

subsync "vid.mp4" -i sub1.srt -o sub2.srt --make-test-case

It will generate a tarball that also includes the subtitles, which are needed for debugging.

Glandos commented 4 years ago

I've had the same issue with SRT-only realignment. The test case is in the attached file.

Happy.Hour.2015.ENG.zip

smacke commented 4 years ago

Ugh, I just realized that this exception is probably interfering with test case generation, which may be why the zips you've all been uploading don't include the subtitles. So that's another bug I should fix first...

smacke commented 4 years ago

Let me fix #61 and ask for new test cases once I push a fix...

PaulinoRBJ commented 4 years ago

@smacke you're right. The generated test cases only contained the .npy files.

Torstein-Eide commented 4 years ago

Test case: testfile-max.zip

Error message:

INFO:ffsubsync.subtitle_parser:detected encoding: WINDOWS-1252
INFO:ffsubsync.subsync:...done
INFO:ffsubsync.subsync:computing alignments...
Traceback (most recent call last):
  File "[home]/.local/bin/ffsubsync", line 11, in <module>
    load_entry_point('ffsubsync==0.3.7', 'console_scripts', 'ffsubsync')()
  File "[home]/.local/lib/python3.6/site-packages/ffsubsync/subsync.py", line 208, in main
    return run(args)
  File "[home]/.local/lib/python3.6/site-packages/ffsubsync/subsync.py", line 106, in run
    srt_pipes,
  File "[home]/.local/lib/python3.6/site-packages/sklearn/base.py", line 693, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "[home]/.local/lib/python3.6/site-packages/ffsubsync/aligners.py", line 77, in transform
    (score, offset), subpipe = max(scores, key=lambda x: x[0][0])
ValueError: max() arg is an empty sequence

interlark commented 4 years ago

Kung Fury (2015).zip

INFO:ffsubsync.subsync:extracting speech segments from reference 'Kung.Fury.mp4'...
INFO:ffsubsync.speech_transformers:Checking video for subtitles stream...
INFO:ffsubsync.speech_transformers:Video file appears to lack subtitle stream
100%|██████████████████████████████| 1862.368/1862.368 [00:03<00:00, 504.19it/s]
INFO:ffsubsync.subsync:...done
INFO:ffsubsync.subsync:serializing speech...
INFO:ffsubsync.subsync:...done
INFO:ffsubsync.subsync:extracting speech segments from subtitles '/opt/Kung Fury (2015).en.srt'...
INFO:ffsubsync.subtitle_parser:detected encoding: UTF-8
INFO:ffsubsync.subsync:...done
INFO:ffsubsync.subsync:computing alignments...
Traceback (most recent call last):
  File "/home/user/.local/bin/ffsubsync", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.8/site-packages/ffsubsync/subsync.py", line 208, in main
    return run(args)
  File "/home/user/.local/lib/python3.8/site-packages/ffsubsync/subsync.py", line 102, in run
    offset_samples, best_srt_pipe = MaxScoreAligner(
  File "/home/user/.local/lib/python3.8/site-packages/sklearn/base.py", line 693, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/user/.local/lib/python3.8/site-packages/ffsubsync/aligners.py", line 77, in transform
    (score, offset), subpipe = max(scores, key=lambda x: x[0][0])
ValueError: max() arg is an empty sequence

smacke commented 4 years ago

Hi everyone, the underlying cause of this issue is that ffsubsync can't find a good sync. It tries a number of alternatives, but if it doesn't consider any of them "good", there are zero to pick from, hence the empty sequence. Version 0.4.0 (now on PyPI) gives a more informative error message that suggests a possible workaround, but unfortunately the workaround is far from guaranteed to succeed.

For the record, this error will now manifest itself as the following output:

ERROR:ffsubsync.ffsubsync:Synchronization failed; consider passing --max-offset-seconds with a number larger than 600

The suggestion may or may not work, and likely won't: shifts larger than 600 seconds (10 minutes) are uncommon, though possible.

In my experience these cases are very difficult for the algorithm, either because they involve breaks/splits or because the speech detection gives results that are too noisy. Please keep the test cases coming, though! They will help me as I make improvements.
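A minimal sketch of the failure mode described above. It assumes (this thread does not confirm it) that a candidate counts as "good" only when its offset falls inside the --max-offset-seconds window, which would explain why the new error message points at that flag. The names SyncFailedError and pick_best_alignment, the ((score, offset_seconds), label) candidate layout, and the sample data are all hypothetical. The point is simply to check for an empty candidate list before calling max(), so the user gets an actionable message instead of a bare ValueError.

```python
class SyncFailedError(Exception):
    """Hypothetical stand-in for the informative error described above."""


def pick_best_alignment(candidates, max_offset_seconds=600.0):
    """Return the best ((score, offset_seconds), label) candidate in the window.

    ASSUMPTION: a candidate is "good" only if |offset| <= max_offset_seconds;
    ffsubsync's real selection criteria may differ.
    """
    in_window = [
        ((score, offset), label)
        for (score, offset), label in candidates
        if abs(offset) <= max_offset_seconds
    ]
    if not in_window:
        # Fail with an actionable message instead of letting max() raise ValueError.
        raise SyncFailedError(
            "Synchronization failed; consider passing --max-offset-seconds "
            f"with a number larger than {max_offset_seconds:g}"
        )
    # Highest score wins among the surviving candidates.
    return max(in_window, key=lambda x: x[0][0])


# Made-up candidates whose offsets both exceed the default 600-second window.
candidates = [((0.8, 725.0), "pipe-a"), ((0.3, 900.0), "pipe-b")]
try:
    pick_best_alignment(candidates)
except SyncFailedError as exc:
    print(exc)

# Widening the window lets both through; the higher-scoring one is chosen.
print(pick_best_alignment(candidates, max_offset_seconds=1200))
```

Widening the window only helps in this sketch because the made-up offsets really exceed 10 minutes. When the candidates are poor for other reasons, such as breaks/splits or noisy speech detection, a larger window would not be expected to help.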

smacke / ffsubsync #60