rapidsai / kvikio

KvikIO - High Performance File IO
https://docs.rapids.ai/api/kvikio/stable/
Apache License 2.0

Limit number of open files #409

Open madsbk opened 1 month ago

madsbk commented 1 month ago

In order to avoid ulimit issues, it would be useful to have an option that limits the number of open files. Maybe open files lazily?
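To make the "open lazily" idea concrete, here is a minimal sketch of a pool that opens each file on first use and caps the number of handles held at once by closing the least recently used one. It uses only the Python standard library and is not KvikIO's API; the class name `LazyFilePool`, its `max_open` parameter, and the reopen-in-append-mode behavior are all assumptions made for illustration.

```python
from collections import OrderedDict


class LazyFilePool:
    """Hypothetical helper: opens files on first use and keeps at most
    `max_open` handles open, closing the least recently used one."""

    def __init__(self, max_open=64):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._handles = OrderedDict()  # path -> open file object, oldest first
        self._seen = set()  # paths this pool has already created

    def _get(self, path):
        handle = self._handles.get(path)
        if handle is not None:
            self._handles.move_to_end(path)  # mark as most recently used
            return handle
        if len(self._handles) >= self.max_open:
            # Evict the least recently used handle; close() flushes it.
            _, oldest = self._handles.popitem(last=False)
            oldest.close()
        # Truncate on first open, append when reopening after an eviction.
        mode = "ab" if path in self._seen else "wb"
        handle = open(path, mode)
        self._seen.add(path)
        self._handles[path] = handle
        return handle

    def write(self, path, data):
        return self._get(path).write(data)

    @property
    def num_open(self):
        return len(self._handles)

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

The trade-off is extra open/close calls when the working set of files is larger than `max_open`, in exchange for a hard bound on the number of descriptors the writer holds.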

cc. @VibhuJawa

VibhuJawa commented 1 month ago

For context, I have seen this error most often while writing partitioned datasets. I don't know how this change would affect that case.

dask_df.to_parquet(partition_on=["xyz"])

I'm also not sure whether https://github.com/rapidsai/kvikio/pull/410 helps in that case too.
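As a rough way to judge whether a partitioned write is likely to hit the limit, the sketch below reads the process's open-file limits with Python's standard `resource` module (Unix only) and compares them to an expected number of simultaneously open files. The helper names `nofile_limits` and `fits_under_limit`, and the default `headroom` of 32 descriptors, are illustrative assumptions and not part of KvikIO or Dask.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_under_limit(num_files, headroom=32):
    """Check whether `num_files` simultaneously open files, plus some
    headroom for sockets, pipes, and libraries, stays within the soft limit.

    The default headroom is an arbitrary illustrative value.
    """
    soft, _ = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return True  # no limit configured
    return num_files + headroom <= soft
```

For example, `fits_under_limit(number_of_partitions)` gives a quick estimate if each partition keeps its own file open during the write, which is itself an assumption about the writer.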

VibhuJawa commented 1 month ago

This should help with the issues mentioned in the comments here:

https://github.com/NVIDIA/NeMo-Curator/pull/157#discussion_r1687215245