uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0

Tests: fix the unit test of batched dataloader with in-memory cache #643

Closed: chongxiaoc closed this 3 years ago

chongxiaoc commented 3 years ago

Also add comments on setting shuffling_queue_capacity correctly when enabling the in-memory cache.

Fix #642

chongxiaoc commented 3 years ago

> Can you please clarify - this is a PR into master. Is master currently broken?

@selitvin This test case is missing from master, and if it is enabled, it fails.

This PR just adds the missing test and fixes the failure.

tgaddair commented 3 years ago

@selitvin, should we ignore the failing Docker tests?

selitvin commented 3 years ago

Yep. Ok to ignore docker.

chongxiaoc commented 3 years ago

@selitvin Can we merge this one?