srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

Save every 100 batches. #186

Closed: siddalmia closed this 6 years ago

fmetze commented 6 years ago

Would it be possible to open the ark/scp files in "append mode"? (I'm not sure whether that is supported.) I think it is not ideal to have potentially many small files lying around when we don't parallelize the output and the only gain is saving memory.

ramonsanabria commented 6 years ago

No, the solution for that was to use hdf5. The problem is that we cannot keep the .ark in memory.

siddalmia commented 6 years ago

https://github.com/siddalmia/eesen/blob/bb96c589bb626f8babe4a5109f17a970037b6bc9/tf/ctc-am/utils/fileutils/kaldi.py#L166

Hmm, writeark seems to be working in append mode, so it might actually be possible. Let me check.

siddalmia commented 6 years ago

OK, @fmetze, could you check this, along with https://github.com/srvk/eesen/pull/187? It should do the append.

ramonsanabria commented 6 years ago

Not sure why you added this (the save-every-100-batches mode). Can you explain a bit, please?

Also, if this has to be in the repo, wouldn't it be better as an option?

Thanks

fmetze commented 6 years ago

I don't have a test setup ready. I think it would be best to make it an option, and if that has been enabled, write out the file in multiple batches, appending to the big file.
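
For illustration, the option described above could look roughly like the sketch below. It is not the code from this PR or from `kaldi.py`: the names `append_ark_scp`, `save_outputs`, `read_matrix`, and the `flush_every` parameter are hypothetical, it counts utterances rather than batches, and the record layout (key, space, `\0B`, `FM `, then size-prefixed int32 row and column counts followed by float32 data) is assumed to match Kaldi's binary float-matrix archive format.

```python
import os
import struct


def append_ark_scp(ark_path, scp_path, batch):
    """Append (utt_id, matrix) pairs to one ark/scp pair.

    Each matrix is a list of equal-length rows of floats.
    Assumed record layout: b"<key> " + b"\\0B" + b"FM " + dims + float32 data.
    """
    # Track the byte offset by hand instead of relying on tell() in append mode.
    offset = os.path.getsize(ark_path) if os.path.exists(ark_path) else 0
    with open(ark_path, "ab") as ark, open(scp_path, "a") as scp:
        for utt_id, rows in batch:
            key = utt_id.encode("ascii") + b" "
            n_rows, n_cols = len(rows), len(rows[0])
            flat = [value for row in rows for value in row]
            record = (
                b"\x00BFM "
                + b"\x04" + struct.pack("<i", n_rows)
                + b"\x04" + struct.pack("<i", n_cols)
                + struct.pack("<%df" % len(flat), *flat)
            )
            ark.write(key + record)
            # The scp entry points at the binary marker right after "<key> ".
            scp.write("%s %s:%d\n" % (utt_id, ark_path, offset + len(key)))
            offset += len(key) + len(record)


def save_outputs(outputs, ark_path, scp_path, flush_every=None):
    """Write all outputs to a single ark/scp pair.

    flush_every=None buffers everything and writes once (hypothetical default);
    flush_every=N appends to the same files after every N utterances.
    """
    # Start from empty files so repeated runs do not accumulate old records.
    open(ark_path, "wb").close()
    open(scp_path, "w").close()
    pending = []
    for item in outputs:
        pending.append(item)
        if flush_every and len(pending) >= flush_every:
            append_ark_scp(ark_path, scp_path, pending)
            pending = []
    if pending:
        append_ark_scp(ark_path, scp_path, pending)


def read_matrix(ark_path, offset):
    """Read back one matrix from the offset stored in the scp file."""
    with open(ark_path, "rb") as ark:
        ark.seek(offset)
        if ark.read(5) != b"\x00BFM ":
            raise ValueError("not a binary float matrix record")
        ark.read(1)  # size byte preceding the row count
        (n_rows,) = struct.unpack("<i", ark.read(4))
        ark.read(1)  # size byte preceding the column count
        (n_cols,) = struct.unpack("<i", ark.read(4))
        count = n_rows * n_cols
        flat = struct.unpack("<%df" % count, ark.read(4 * count))
        return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
```

With this shape, something like `save_outputs(results, "out.ark", "out.scp", flush_every=100)` would keep at most 100 utterances in memory while still producing one big file, and leaving `flush_every` unset keeps the write-once behaviour.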