Open · hadyelsahar opened this issue 1 year ago
Hi @hadyelsahar,
thanks for bringing this up! It is fixed in 418b5e6. The argument --not_strict can now be used in segment.py to allow segments longer than max, while the default remains as described in the paper. Note that all experiments and results in the paper were run with every segment forced to be shorter than max.
These differences from the implementation described in the paper slipped into a code update while I was running follow-up experiments in which the classification-threshold conditions (p > thr) mattered more than the segment-length ones (len < max).
Empirically, the argument --not_strict should not have any significant impact when the suggested parameters (max=18, min=0.2, thr=0.5) are used, since the requirements are easily satisfied given the min-max range. On the other hand, it will matter for smaller values of max, where the conditions are harder to satisfy. I have found that using --not_strict is better in these cases in terms of translation quality.
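For intuition, the strict vs. not-strict behaviour could be sketched roughly like this. This is a hypothetical simplification of the divide-and-conquer idea, not the actual segment.py; the function name, argument layout, and exact acceptance conditions are my own assumptions:

```python
def split(probs, start, end, max_len, min_len, thr, strict=True):
    """Recursively split the span [start, end) at the least-probable frame.

    probs: per-frame speech probabilities (illustrative stand-in for the
    classifier output). A segment is kept once it is short enough
    (len <= max_len) or, with strict=False, once its boundary frames
    already satisfy p > thr even though the segment exceeds max_len.
    """
    length = end - start
    if length <= min_len:
        return []  # too short to keep
    if length <= max_len:
        return [(start, end)]
    if not strict and min(probs[start], probs[end - 1]) > thr:
        # not_strict: accept an over-long segment when the threshold
        # condition already holds (assumed condition, for illustration)
        return [(start, end)]
    # otherwise, split at the frame with the lowest speech probability
    cut = min(range(start + 1, end - 1), key=lambda i: probs[i])
    return (split(probs, start, cut, max_len, min_len, thr, strict)
            + split(probs, cut, end, max_len, min_len, thr, strict))
```

With strict=True, recursion continues until every segment fits under max_len; with strict=False, a long span whose boundaries are confidently speech is returned whole, which is how segments longer than max can appear.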
Let me know if you have any more questions.
Yes, alright, this makes sense, thanks a lot!
While I tend to agree, letting segments exceed max_seg_length in the wild (i.e., not on the test sets used here) yielded segments of more than 100 seconds (with max = 10 seconds), which caused memory issues that could not be anticipated.
Ah yes, I see. If it is applied to different domains, you can also try to mitigate this by adjusting the thr parameter.
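As an extra safety net against the memory issue described above, one could also hard-split any produced segment that exceeds an absolute cap, independently of the probabilities. This is a hypothetical post-processing helper, not part of the repository:

```python
def hard_cap(segments, cap):
    """Break any (start, end) segment longer than `cap` into roughly
    equal-sized chunks, so no segment can blow up memory downstream.
    (Illustrative helper; units are whatever the segmenter emits.)"""
    out = []
    for start, end in segments:
        length = end - start
        if length <= cap:
            out.append((start, end))
            continue
        n_chunks = -(-length // cap)  # ceiling division
        step = length / n_chunks
        # evenly spaced cut points, rounded to integer positions
        bounds = [start + round(i * step) for i in range(n_chunks)] + [end]
        out.extend(zip(bounds, bounds[1:]))
    return out
```

This trades segmentation quality for predictable memory use, so it is best applied only as a last-resort cap on outliers rather than in place of tuning thr.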
Hello, thanks for sharing your code!
I wanted to clarify the correctness of the pdac recursive implementation, as I currently receive segments that are longer than max_segment_length.
I think the issues are mostly in lines #L121-L123 and #L134-L135 of the implementation, which should be deleted.
Could you please explain the need for those two clauses? From empirical experiments, and by matching against the algorithm in the paper, they do not seem to be needed.