shane-settle / neural-acoustic-word-embeddings


hot fix for extract-rows removal from new version kaldi #2

Open ha3an opened 5 years ago

ha3an commented 5 years ago

Hi, if you run the kaldi/run.sh script with a newer Kaldi version, you will get an error because extract-rows has been removed from Kaldi. A workaround is to call extract-feature-segments instead:

    extract-feature-segments --snip-edges=false "scp:$scp_file" $queries/$partition/intervals "ark,scp:$queries/$partition/mfcc.ark,$queries/$partition/mfcc.scp"

However, extract-feature-segments reads each segment's start and end times in seconds, whereas extract-rows reads the start and end row (frame) indices of the feature matrix.
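To make the difference concrete, here is a minimal sketch of the frame-to-seconds conversion. It assumes 100 frames per second (Kaldi's default 10 ms frame shift), which matches the factor of 100 used in the fix. The function name `frames_to_seconds` is hypothetical and is not part of the repository.

```python
def frames_to_seconds(start_frame, end_frame, utt_start_frame=0, frames_per_second=100.0):
    """Convert a frame-index interval to seconds relative to the utterance start.

    Assumes a fixed frame rate (100 frames/s, i.e. a 10 ms frame shift).
    Subtracting the utterance start before dividing keeps the result
    correctly rounded.
    """
    start_sec = (start_frame - utt_start_frame) / frames_per_second
    end_sec = (end_frame - utt_start_frame) / frames_per_second
    return start_sec, end_sec


# A query spanning frames 250-310 inside an utterance that starts at frame 200:
print(frames_to_seconds(250, 310, utt_start_frame=200))  # -> (0.5, 1.1)
```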

You can account for this by modifying get_intervals.py. I didn't spend much time on it, but here is a quick fix starting at line 62:

        if utt is not None:
            query_id = "%s_%s_%s" % (query, convside, query_interval)
            # extract-feature-segments expects times in seconds relative to the
            # utterance start: shift the frame indices by the utterance's start
            # frame, then divide by 100 (assuming the default 10 ms frame shift).
            utt_start = float(utt.split('-')[0])
            interval_in_sec = [str((float(x) - utt_start) / 100)
                               for x in query_interval.split('-')]
            utt_id = "%s_%s" % (convside, utt)
            query_intervals.append("%s %s %s" % (query_id, utt_id, ' '.join(interval_in_sec)))
            # query_intervals.append("%s %s %s" % (query_id, utt_id, utt_query_range))

    return sorted(query_intervals)
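To check the patch without running the whole script, the same logic can be pulled into a standalone function. This sketch rests on several assumptions. `query_interval` and `utt` are `"start-end"` strings of frame indices on the same conversation-side timeline, which is what the `split('-')` calls suggest. The rate is 100 frames per second. The name `make_query_segment` and the example IDs are hypothetical. The function subtracts the utterance start before dividing. This avoids floating-point noise such as `0.3 - 0.2` evaluating to `0.09999999999999998`.

```python
def make_query_segment(query, convside, query_interval, utt, frames_per_second=100.0):
    """Build one "segment-id utterance-id start end" line (times in seconds).

    Hypothetical helper mirroring the get_intervals.py patch; assumes
    query_interval and utt are "start-end" frame-index strings.
    """
    query_id = "%s_%s_%s" % (query, convside, query_interval)
    utt_id = "%s_%s" % (convside, utt)
    utt_start = float(utt.split('-')[0])
    # Shift to utterance-relative frames first, then convert frames to seconds.
    start, end = ((float(x) - utt_start) / frames_per_second
                  for x in query_interval.split('-'))
    return "%s %s %s %s" % (query_id, utt_id, start, end)


print(make_query_segment("hello", "sw02001_A", "000250-000310", "000200-000400"))
# -> hello_sw02001_A_000250-000310 sw02001_A_000200-000400 0.5 1.1
```

Each returned line follows the segment-id, utterance-id, start, end layout that a Kaldi segments file uses. The utterance ID must match a key in the features read from `$scp_file`.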

I hope this helps.

mezhou commented 5 years ago

Hi, you can instead copy extract-rows.cc from an older Kaldi version into your new Kaldi source tree, register it in the corresponding Makefile, and rebuild. That restores extract-rows and solves the problem.