wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

[Dataset] Resolving the 'Illegal seek' error #2509

Open srdfjy opened 2 months ago

srdfjy commented 2 months ago

When reading tar files from network resources, open them with mode 'r|' rather than 'r:'. Stream mode ('r|') never attempts seek operations, so it is suited to sequential reading from sources that cannot seek.
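A minimal sketch of the difference, using only the standard library. `NonSeekableStream`, `make_shard`, and `read_members` are hypothetical names made up for illustration; the stream class merely imitates a network response or pipe that refuses `tell()` and `seek()`, and is not WeNet's actual data-loading code.

```python
import errno
import io
import tarfile


class NonSeekableStream:
    """Hypothetical stand-in for a network stream: readable, but not seekable."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)

    def seekable(self):
        return False

    def tell(self):
        # Mimic what a pipe or socket does when asked for its position.
        raise OSError(errno.ESPIPE, "Illegal seek")

    def seek(self, offset, whence=0):
        raise OSError(errno.ESPIPE, "Illegal seek")


def make_shard():
    """Build a tiny in-memory tar archive with two made-up members."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        for name, payload in [("utt1.txt", b"hello"), ("utt1.wav", b"\x00\x01")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return out.getvalue()


def read_members(data, mode):
    """Open the archive from a non-seekable stream and read members in order."""
    members = []
    with tarfile.open(fileobj=NonSeekableStream(data), mode=mode) as tar:
        for member in tar:
            # Members must be consumed sequentially in stream mode.
            members.append((member.name, tar.extractfile(member).read()))
    return members


shard = make_shard()

# Stream mode only calls read() on the underlying object.
print(read_members(shard, "r|"))

# Random-access mode asks the underlying object for its position and fails.
try:
    read_members(shard, "r:")
except OSError as err:
    print("r: failed:", err)
```

The trade-off is that a tar file opened in stream mode can only be read front to back, which matches how a shard is consumed anyway.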

xingchensong commented 2 months ago

When does the "illegal seek" error happen?

xingchensong commented 2 months ago

There seems to be a conflict with https://github.com/wenet-e2e/wenet/pull/2301#issuecomment-1893027908

srdfjy commented 2 months ago

When using shard mode, an "illegal seek" error can occur.

xingchensong commented 1 month ago

> When using shard mode, an "illegal seek" error can occur.

I use shard mode for all my experiments, and I have never run into this issue.

srdfjy commented 1 month ago

> > When using shard mode, an "illegal seek" error can occur.
>
> I use shard mode for all my experiments, and I have never run into this issue.

The error occurs with HTTP + shard mode, while the tar file is being read (tar_file_and_group -> tarfile.open).