csukuangfj opened 3 years ago
I just created a pull request in Lhotse (https://github.com/lhotse-speech/lhotse/pull/319) to add posteriors to the `Cut` class. The motivation is to reuse Lhotse's serialization and dataset code.
Also, I find the alignment information contained in the supervision too simple; see https://github.com/lhotse-speech/lhotse/blob/ef7a037426f1b602a54f4d9ea43e711007e85719/lhotse/supervision.py#L24:

```python
symbol: str
start: Seconds
duration: Seconds
```
Can we move the alignment class from snowfall to lhotse? https://github.com/k2-fsa/snowfall/blob/bce73304f40c321a6dad809058b12e559962c321/snowfall/tools/ali.py#L20-L28
The usage of `compute-ali`:

```
$ snowfall ali compute-ali -l data/lang_nosp -p ./exp/cuts_post.json --max-duration=500 -o exp
```
> Also, I find the alignment information contained in the supervision is too simple
Can you describe the issue more? I'm not sure I understand what's missing there. We could move Snowfall's frame-wise alignment to Lhotse but I'm not sure how to make the two representations compatible with each other (the CTM-like description seems more general to me as you can cast it to frame-wise representation with different frame shifts).
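To illustrate the casting mentioned above, here is a minimal sketch (not Lhotse/Snowfall code; the class and function names are made up) of expanding a CTM-like (symbol, start, duration) alignment into a frame-wise representation at a chosen frame shift:

```python
from dataclasses import dataclass


@dataclass
class AlignmentItem:
    """CTM-like alignment entry: a symbol with its start time and duration."""
    symbol: str
    start: float     # seconds
    duration: float  # seconds


def to_frame_wise(items, frame_shift: float, num_frames: int, blank: str = "<eps>"):
    """Expand CTM-like items into one symbol per frame.

    Frames not covered by any item are filled with `blank`.
    The rounding convention here is a simple example; real code
    would need to agree with the model's frame alignment.
    """
    labels = [blank] * num_frames
    for item in items:
        first = int(round(item.start / frame_shift))
        last = int(round((item.start + item.duration) / frame_shift))
        for i in range(first, min(last, num_frames)):
            labels[i] = item.symbol
    return labels
```

Going the other way (frame-wise to CTM-like) loses the frame shift, which is why the CTM-like form seems the more general one to store.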
BTW I wonder if we should support piping these programs together, Kaldi-style. Click easily allows doing that with file type arguments.
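For illustration, the standard-library analogue of Click's file-type arguments is `argparse.FileType`, which likewise treats `-` as stdin/stdout; a sketch (not actual Snowfall code, the command name is hypothetical):

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical subcommand that reads one manifest and
    writes another; passing "-" connects it to stdin/stdout, enabling
    Kaldi-style pipelines."""
    parser = argparse.ArgumentParser(prog="filter-cuts")
    parser.add_argument("input", type=argparse.FileType("r"))
    parser.add_argument("output", type=argparse.FileType("w"))
    return parser
```

Click's `click.File` behaves the same way, so a chain like `cmd1 ... - | cmd2 - ...` works without any extra plumbing.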
We could do that by writing/reading JSONL-serialized manifests in a streaming manner. Since most operations on `CutSet` reduce to individual operations on `Cut`, this seems feasible without having to rewrite too much code. There is a function in Lhotse that tries to figure out the right manifest type from a dict, which can be used to parse individual lines (BTW @csukuangfj I just realized that you might need to extend that function to handle the posterior manifests in your Lhotse PR).
WDYT?
... there is also some code for line-by-line incremental JSONL writing in Lhotse that could be extended to support this.
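As a sketch of what the streaming read/write could look like (generic JSONL handling, not the actual Lhotse functions):

```python
import json


def stream_jsonl(path):
    """Yield one deserialized dict per line; each line is assumed to hold
    a single serialized manifest item. In the real implementation, each
    dict would then be dispatched to the right manifest type."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_jsonl(path, items):
    """Write items incrementally, one JSON object per line, so the
    consumer can start reading before the producer finishes."""
    with open(path, "w") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")
```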
This is cool; I'm afraid I'm not following it in detail. Just a reminder: this is more of an "experimental direction" at this point. We'll have to learn from experience whether these kinds of command-line utilities are actually useful.
Fair enough. The idea is to allow something like:

```
snowfall net compute-post <some-inputs-args..> - | snowfall net compute-ali - <some-more-args..>
```
but I just realized that with the current way things are done in Lhotse, we would have to store the actual arrays/tensors on disk and just pass the manifests around, which might not be optimal. Maybe it's not relevant for now and we can see how to do that in the future, if needed at all.
BTW, I tend to think being able to do something at all tends to be more important than that thing being efficient-- premature optimization being the root of all evil etc., although I did plenty of it in Kaldi. I don't know what the optimal solution is here, I am afraid I have not been following this PR closely enough.
Agreed. But for the record, the full quote is actually:
> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Usage:
I find that there is one issue with the Torch scripted module: we have to know the signature of the model's `forward` function as well as its subsampling factor. Working on `compute-ali` and will submit them together.
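To show why the subsampling factor matters for `compute-ali`: the network emits posteriors at a coarser frame rate than the input features, so mapping output frames back to time requires knowing the factor. A small sketch (the rounding convention is a hypothetical example, not any particular model's):

```python
def output_num_frames(num_input_frames: int, subsampling_factor: int) -> int:
    # Hypothetical convention: ceil-divide. Real models differ
    # depending on padding and convolution strides.
    return (num_input_frames + subsampling_factor - 1) // subsampling_factor


def output_frame_shift(input_frame_shift: float, subsampling_factor: int) -> float:
    # Each output frame covers `subsampling_factor` input frames,
    # so the effective frame shift grows by the same factor.
    return input_frame_shift * subsampling_factor
```

Since a scripted module does not expose this metadata by itself, it has to be recorded alongside the model for alignment to be computed correctly.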