mt-upc / SHAS

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
MIT License
36 stars 4 forks source link

incorrect implementation of pdac algorithm? #4

Open hadyelsahar opened 1 year ago

hadyelsahar commented 1 year ago

Hello thanks for sharing your code!

I wanted to clarify the correctness of pdac recursive implementation as currently i receive segments that are > max_segment_length.

Mostly I think the issues are in lines: #L121-L123 and #L134-L135 in the implementation that should be deleted.

    def recusrive_split(sgm):
        if sgm.duration < max_segment_length:
            segments.append(sgm)
        else:
            j = 0
            sorted_indices = np.argsort(sgm.probs)
            while j < len(sorted_indices):
                split_idx = sorted_indices[j]
                split_prob = sgm.probs[split_idx]
### this line could allow for sgm.duration > max_seglength
>                if split_prob > threshold:  
>                    segments.append(sgm)
>                    break

                sgm_a, sgm_b = split_and_trim(sgm, split_idx, threshold)
                if (
                    sgm_a.duration > min_segment_length
                    and sgm_b.duration > min_segment_length
                ):
                    recusrive_split(sgm_a)
                    recusrive_split(sgm_b)
                    break
                j += 1
### this line could allow for sgm.duration > max_seglength
>            else:
>                segments.append(sgm)

Could you please explain the need for those two clauses? , from empirical experiment and by matching with the algorithm in the paper they are not needed.

image

johntsi commented 1 year ago

Hi @hadyelsahar,

thanks for bringing this up! It is fixed in 418b5e6. The argument --not_strict can now be used in segment.py to allow for segments longer than max, while the default remains as described in the paper. Note that all experiments and results in the paper are with all segments forced to be shorter than max.

These differences from the implementation described in the paper slipped into a code update while I was doing some follow-up experiments where the classification threshold conditions (p > thr) were more important than the segment length ones (len < max).

Empirically, the argument --not_strict would not have any significant impact when the suggested parameters (max=18, min=0.2, thr=0.5) are used, since the requirements are easily satisfied given the range of min-max. On the other side, it will matter for smaller values of max, where the conditions are harder to be satisfied. I have seen that using --not_strict is better in these cases (in terms of translation quality).

Let me know if you have any more questions.

hadyelsahar commented 1 year ago

Yes alright this makes sense thanks a lot!

While i tend to agree however, letting segments go to more than max_seg_length in the wild (not on the test sets you have here) yielded segments with > 100 secs (when max = 10secs) this caused some memory issues that cannot be anticipated.

johntsi commented 1 year ago

Ah yes I see, if it's applied in different domains you can also try to mitigate this by adjusting the thr parameter.