k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

WIP: add compute-post. #210

Open csukuangfj opened 3 years ago

csukuangfj commented 3 years ago

Usage:

$ snowfall net compute-post -m /ceph-fj/model-jit.pt -f exp/data/cuts_test-clean.json.gz -o exp

I find that there is one issue with the TorchScript module: we have to know the signature of the model's forward function as well as its subsampling factor.


I am working on compute-ali and will submit the two commands together.

csukuangfj commented 3 years ago

I just created a pull request in Lhotse, https://github.com/lhotse-speech/lhotse/pull/319, to add posteriors to the class Cut. The motivation is to reuse Lhotse's serialization and dataset code.


Also, I find the alignment information contained in the supervision is too simple, see https://github.com/lhotse-speech/lhotse/blob/ef7a037426f1b602a54f4d9ea43e711007e85719/lhotse/supervision.py#L24

    symbol: str    
    start: Seconds    
    duration: Seconds

Can we move the alignment class from snowfall to lhotse? https://github.com/k2-fsa/snowfall/blob/bce73304f40c321a6dad809058b12e559962c321/snowfall/tools/ali.py#L20-L28

csukuangfj commented 3 years ago

The usage of compute-ali:

$ snowfall ali compute-ali -l data/lang_nosp -p ./exp/cuts_post.json --max-duration=500 -o exp

pzelasko commented 3 years ago

"Also, I find the alignment information contained in the supervision is too simple"

Can you describe the issue in more detail? I'm not sure I understand what's missing there. We could move Snowfall's frame-wise alignment to Lhotse, but I'm not sure how to make the two representations compatible with each other (the CTM-like description seems more general to me, as you can cast it to a frame-wise representation with different frame shifts).
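To illustrate the point about casting, here is a minimal sketch of turning a CTM-like alignment (symbol, start, duration) into frame-wise labels at two different frame shifts. The names `AlignmentItem` and `to_frames`, the `<eps>` filler symbol, and the nearest-frame rounding rule are assumptions made for this example; they are not the Lhotse or Snowfall API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignmentItem:
    # Hypothetical stand-in with the three fields quoted earlier; times in seconds.
    symbol: str
    start: float
    duration: float


def to_frames(
    items: List[AlignmentItem],
    frame_shift: float,
    num_frames: int,
    blank: str = "<eps>",
) -> List[str]:
    """Return one label per frame; interval boundaries are rounded to the nearest frame."""
    frames = [blank] * num_frames
    for item in items:
        first = round(item.start / frame_shift)
        last = round((item.start + item.duration) / frame_shift)
        # Clip to the requested number of frames.
        for i in range(first, min(last, num_frames)):
            frames[i] = item.symbol
    return frames


ali = [AlignmentItem("a", 0.0, 0.04), AlignmentItem("b", 0.04, 0.04)]
# The same time-based alignment rendered at a 10 ms and a 40 ms frame shift.
frames_10ms = to_frames(ali, frame_shift=0.01, num_frames=8)
frames_40ms = to_frames(ali, frame_shift=0.04, num_frames=2)
```

The reverse direction (frame-wise to time-based) only recovers boundaries up to the frame shift, which is one way to read the "more general" remark.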

pzelasko commented 3 years ago

BTW I wonder if we should support piping these programs together, Kaldi-style. Click easily allows doing that with file type arguments.

We could do that by writing/reading JSONL-serialized manifests in a streaming manner. Since most operations on CutSet refer to individual operations on Cut, this seems feasible without the need to re-write too much code. There is a function in Lhotse that tries to figure out the right manifest type from a dict, which can be used to parse individual lines (BTW @csukuangfj I just realized that you might need to extend that function to handle the posterior manifests in your Lhotse PR).

WDYT?

pzelasko commented 3 years ago

... there is also some code for line-by-line incremental JSONL writing in Lhotse that could be extended to support this.
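A rough sketch of what streaming JSONL between two commands could look like, using only the Python standard library. The helpers `read_jsonl`, `write_jsonl`, and `add_posterior_path`, as well as the `posterior_path` field, are hypothetical and only illustrate the per-line processing idea; they are not existing Lhotse functions.

```python
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[Dict]:
    """Yield one manifest dict per non-empty line, without loading the whole input."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_jsonl(items: Iterable[Dict], stream: TextIO) -> int:
    """Write each dict as one JSON line as soon as it is produced; return the count."""
    count = 0
    for item in items:
        stream.write(json.dumps(item) + "\n")
        count += 1
    return count


def add_posterior_path(cuts: Iterable[Dict], out_dir: str) -> Iterator[Dict]:
    """Hypothetical per-cut step: attach a made-up 'posterior_path' field."""
    for cut in cuts:
        yield {**cut, "posterior_path": f"{out_dir}/{cut['id']}.post"}


# In a real command these streams would be sys.stdin and sys.stdout;
# StringIO keeps the example self-contained.
source = io.StringIO('{"id": "cut-1"}\n{"id": "cut-2"}\n')
sink = io.StringIO()
written = write_jsonl(add_posterior_path(read_jsonl(source), "exp"), sink)
result = [json.loads(line) for line in sink.getvalue().splitlines()]
```

Because every stage is a generator, each cut is written as soon as it has been processed. In a Click-based command, a `click.File` argument accepts `-` to mean stdin or stdout, so the same functions could be wired to a pipe.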

danpovey commented 3 years ago

This is cool; I'm afraid I'm not following it in detail. Just a reminder: this is more of an "experimental direction" at this point. We'll have to learn from experience whether these kinds of command-line utilities are actually useful.

pzelasko commented 3 years ago

Fair enough. The idea is to allow something like:

snowfall net compute-post <some-inputs-args..> - | snowfall net compute-ali - <some-more-args..>

but I just realized that with the current way things are done in Lhotse, we would have to store the actual arrays/tensors on disk and just pass the manifests around, which might not be optimal. Maybe it's not relevant for now and we can see how to do that in the future, if it is needed at all.

danpovey commented 3 years ago

BTW, I tend to think that being able to do something at all is more important than doing it efficiently (premature optimization being the root of all evil, etc.), although I did plenty of it in Kaldi. I don't know what the optimal solution is here; I'm afraid I have not been following this PR closely enough.

pzelasko commented 3 years ago

Agreed. But for the record, the full quote is actually:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."