mepeichun / SubtitleTranslate

Translate the subtitle file into other language
GNU General Public License v3.0

Error in util_srt.py - IndexError: list index out of range #3

Open pendave opened 3 years ago

pendave commented 3 years ago

Hello, how do I fix this?

subtitle content:

295
00:08:52,100 --> 00:08:53,266
Our cell in there and,

296
00:08:53,266 --> 00:08:54,666
of course the subdivision, is

297
00:08:54,666 --> 00:08:55,366
set to one

298
00:08:56,166 --> 00:08:58,400
we just back to zero and crank

299
00:08:58,600 --> 00:09:00,000
up the thickness a little bit

300
00:09:01,633 --> 00:09:02,400
and there

301
00:09:02,733 --> 00:09:05,133
we go. We have our. We

302
00:09:05,733 --> 00:09:07,600
have our cell right here,

303
00:09:09,366 --> 00:09:10,900
very good if

304
00:09:10,900 --> 00:09:12,466
we name this our

305
00:09:12,566 --> 00:09:13,033
set.
Traceback (most recent call last):
  File "sample.py", line 12, in <module>
    translate_and_compose(input_file, 'sample_cn_only.srt', 'en', 'zh-CN', both=False)
  File "E:\MyPython\SubtitleTranslate-master\utils.py", line 111, in translate_and_compose
    translated_list = translate_srt(subtitle, src_lang, target_lang, space=space)
  File "E:\MyPython\SubtitleTranslate-master\utils.py", line 69, in translate_srt
    dialog_list = sen_list2dialog_list(translated_sen_list, mass_list, space, cn=True)
  File "E:\MyPython\SubtitleTranslate-master\util_srt.py", line 164, in sen_list2dialog_list
    origin_len = record[-1][1]
IndexError: list index out of range

Also, the free Google Translate engine on the .cn domain has stopped working; it has to be switched to .com to work.

pendave commented 3 years ago

The bug may be here

The range that this passage falls in looks very strange. The 7 refers to entry 7 of the original English subtitles.


I found that the script util_srt.py has a bug:

def compute_mass_list(dialog_idx, sen_idx):
    i = 0
    j = 1
    mass_list = []
    one_sentence = []
    while i < len(dialog_idx):
        if dialog_idx[i] > sen_idx[j]:
            mass_list.append(one_sentence)
            one_sentence = []
            j += 1
        else:
            one_sentence.append((i + 1, dialog_idx[i] - sen_idx[j - 1]))
            i += 1
    mass_list.append(one_sentence)
    return mass_list 

For example, if I pass in these two arguments:

dialog_idx = [23, 53, 64, 95, 125, 135, 158, 184, 197, 214, 219]
sen_idx = [0, 142, 155, 219]

the record generated for the second sentence turns out to be []:

[[(1, 23), (2, 53), (3, 64), (4, 95), (5, 125), (6, 135)], [], [(7, 3), (8, 29), (9, 42), (10, 59), (11, 64)]]
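The result above can be reproduced with the short script below. Entry 7 ends at offset 158, which is past both sentence breakpoints 142 and 155, so the loop advances `j` twice in a row and closes the second sentence before anything has been appended to it. The last lines assume that `record` in `sen_list2dialog_list` is one element of this `mass_list`; that function's body is not shown in this thread, so treat that link as an assumption.

```python
def compute_mass_list(dialog_idx, sen_idx):
    # Copied unchanged from the snippet quoted above.
    i = 0
    j = 1
    mass_list = []
    one_sentence = []
    while i < len(dialog_idx):
        if dialog_idx[i] > sen_idx[j]:
            mass_list.append(one_sentence)
            one_sentence = []
            j += 1
        else:
            one_sentence.append((i + 1, dialog_idx[i] - sen_idx[j - 1]))
            i += 1
    mass_list.append(one_sentence)
    return mass_list


dialog_idx = [23, 53, 64, 95, 125, 135, 158, 184, 197, 214, 219]
sen_idx = [0, 142, 155, 219]

mass_list = compute_mass_list(dialog_idx, sen_idx)

# 1-based numbers of the sentences whose record came out empty.
empty_sentences = [n for n, record in enumerate(mass_list, start=1) if not record]
print(empty_sentences)  # [2]

# Assumption: this mirrors `origin_len = record[-1][1]` from the traceback.
try:
    origin_len = mass_list[1][-1][1]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

Any subtitle entry that contains a whole short sentence plus the start of the next one (here "we go. We have our. We") should trigger the same empty record.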

1
00:08:52,100 --> 00:08:53,266
Our cell in there and,

2
00:08:53,266 --> 00:08:54,666
of course the subdivision, is

3
00:08:54,666 --> 00:08:55,366
set to one

4
00:08:56,166 --> 00:08:58,400
we just back to zero and crank

5
00:08:58,600 --> 00:09:00,000
up the thickness a little bit

6
00:09:01,633 --> 00:09:02,400
and there

7
00:09:02,733 --> 00:09:05,133
we go. We have our. We

8
00:09:05,733 --> 00:09:07,600
have our cell right here,

9
00:09:09,366 --> 00:09:10,900
very good if

10
00:09:10,900 --> 00:09:12,466
we name this our

11
00:09:12,566 --> 00:09:13,033
set.


Plain text of the whole passage:
Our cell in there and, of course the subdivision, is set to one we just back to zero and crank up the thickness a little bit and there we go. We have our. We have our cell right here, very good if we name this our set.
End offsets of each original subtitle entry:
[23, 53, 64, 95, 125, 135, 158, 184, 197, 214, 219]
The passage's plain text split into a list of sentences:
['Our cell in there and, of course the subdivision, is set to one we just back to zero and crank up the thickness a little bit and there we go.', 'We have our.', 'We have our cell right here, very good if we name this our set.']
Computed breakpoints of each sentence, including the leading 0:
[0, 142, 155, 219]
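The two index lists above are consistent with counting each entry, and each sentence, as its length plus one trailing space. The sketch below recomputes them from the eleven subtitle lines under that assumption. The regex-based sentence split is only a stand-in for whatever splitter the project really uses, which is not shown in this thread.

```python
import re
from itertools import accumulate

lines = [
    "Our cell in there and,",
    "of course the subdivision, is",
    "set to one",
    "we just back to zero and crank",
    "up the thickness a little bit",
    "and there",
    "we go. We have our. We",
    "have our cell right here,",
    "very good if",
    "we name this our",
    "set.",
]

# Join the entries with single spaces to get the passage text.
paragraph = " ".join(lines)

# End offset of each entry: running total of (length + 1 for the space).
dialog_idx = list(accumulate(len(line) + 1 for line in lines))

# Assumed splitter: break after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", paragraph)

# Sentence breakpoints, computed the same way, with a leading 0.
sen_idx = [0] + list(accumulate(len(s) + 1 for s in sentences))

print(dialog_idx)
print(sentences)
print(sen_idx)
```

The second sentence covers offsets 142 to 155, and no subtitle entry ends inside that span (entry 6 ends at 135, entry 7 at 158), which is why its record stays empty.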
Yoshi8765 commented 1 year ago

I also have this issue.
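One possible workaround, offered only as a sketch: when a sentence is about to be closed with an empty record, credit it with the subtitle entry that straddles it, using the sentence's full length as the offset. The name `compute_mass_list_safe` is hypothetical. This assumes the downstream code only needs every record to be non-empty, and it has not been checked against the rest of `util_srt.py`.

```python
def compute_mass_list_safe(dialog_idx, sen_idx):
    """Variant of compute_mass_list that never emits an empty record.

    Hypothetical helper: not part of the SubtitleTranslate repository.
    """
    i = 0
    j = 1
    mass_list = []
    one_sentence = []
    while i < len(dialog_idx):
        if dialog_idx[i] > sen_idx[j]:
            if not one_sentence:
                # The sentence lies entirely inside entry i + 1, so attach
                # that entry with the full sentence length as its offset.
                one_sentence.append((i + 1, sen_idx[j] - sen_idx[j - 1]))
            mass_list.append(one_sentence)
            one_sentence = []
            j += 1
        else:
            one_sentence.append((i + 1, dialog_idx[i] - sen_idx[j - 1]))
            i += 1
    mass_list.append(one_sentence)
    return mass_list


dialog_idx = [23, 53, 64, 95, 125, 135, 158, 184, 197, 214, 219]
sen_idx = [0, 142, 155, 219]

result = compute_mass_list_safe(dialog_idx, sen_idx)
print(result[1])  # [(7, 13)]
```

With this change entry 7 appears in both the second and the third record. Whether `sen_list2dialog_list` handles an entry that is shared between sentences is not something this thread shows, so the output of a full translation run would still need checking.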