nbest C++ api use too much vram

k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

https://k2-fsa.github.io/k2

Apache License 2.0


nbest C++ api use too much vram #1281

Open binhtranmcs opened 2 months ago

binhtranmcs commented 2 months ago

I tried HLG decoding using both hlg_decode.cu and hlg_decode.py, each modified slightly to get the n-best paths from the lattice. The Python API seems fine, but I got OOM when using the C++ API. As far as I know, the Python API is still C++ under the hood. So I wonder whether you have seen the same issue, where the C++ API uses a lot more memory than the Python API, or whether I installed k2 incorrectly in some way. Please help me with this.

Model: librispeech conformer ctc
Code: python: hlg_decode.py.txt, c++: hlg_decode.cu.txt
Audio (change ext to .wav): testvram.txt

Thanks in advance!

binhtranmcs commented 2 months ago

Any updates on this?

binhtranmcs commented 2 months ago

@pkufool @csukuangfj do you have any insight into this?

danpovey commented 2 months ago

Sorry for the delay. If you could find a way to display your diff conveniently, it may be easier for us to look...

On Thursday, May 2, 2024, binhtranmcs wrote:

> @pkufool @csukuangfj do have any insight into this


pkufool commented 2 months ago

@binhtranmcs Have you figured out the reason? I think it is because the Python code breaks a large batch into smaller sub-batches when doing the intersection; see _intersect_device in the Python code.
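To illustrate the sub-batching idea described above, here is a minimal pure-Python sketch. It is not the actual `_intersect_device` code and does not call k2; the names `sub_batch_ranges`, `process_in_sub_batches`, and the `process` callback are hypothetical stand-ins for "run the expensive operation on one chunk". The point is only that peak memory then scales with the sub-batch size rather than with the full batch.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def sub_batch_ranges(num_items: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges that cover num_items in chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, num_items))
        for start in range(0, num_items, batch_size)
    ]


def process_in_sub_batches(
    items: Sequence[T],
    batch_size: int,
    process: Callable[[Sequence[T]], List[R]],
) -> List[R]:
    """Apply `process` to one sub-batch at a time and concatenate the results.

    `process` is a hypothetical placeholder for the memory-hungry step
    (for example, an intersection over a batch of paths). Only one
    sub-batch is handed to it at a time.
    """
    results: List[R] = []
    for start, end in sub_batch_ranges(len(items), batch_size):
        results.extend(process(items[start:end]))
    return results
```

A C++ caller that passes the whole batch to a single intersection call would skip this chunking step, which may be one reason the memory footprint differs; that is an assumption to verify against the actual code.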

binhtranmcs commented 2 months ago

> @binhtranmcs Have you figured out the reasons? I think it is because python breaks a large batch into smaller sub batches when doing intersect, see _intersect_device in the python code.

Thanks @pkufool, I will have a look at this.