results_queue_size defines max number of prefetched row groups?

uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Apache License 2.0


results_queue_size defines max number of prefetched row groups? #456

Closed. GregAru closed this issue 4 years ago.

GregAru commented 4 years ago

According to the make_reader doc: "results_queue_size: Size of the results queue to store prefetched rows".

However, when debugging, I can see that the _results_queue in ThreadPool, which is affected by this parameter, holds full row groups and not individual rows.

Am I missing anything?

selitvin commented 4 years ago

That's an error in the documentation. The behavior changed a while ago and the documentation was not updated. I'll update it now. Thanks for pointing it out.
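
For illustration, the behavior discussed in this thread can be sketched with the standard library alone. This is not petastorm's implementation: the helper names (load_row_groups, iterate_rows) and the sentinel-based shutdown are assumptions made up for the example. It only shows that when each queue item is a whole row group, the queue size caps the number of buffered row groups, so the number of buffered rows depends on how large each row group is.

```python
import queue
import threading

# Hypothetical stand-in for a worker that reads row groups.
# Each queue item is a whole row group (a list of rows), not a single row.
def load_row_groups(row_groups, results_queue):
    for group in row_groups:
        results_queue.put(group)  # blocks while the queue is full
    results_queue.put(None)       # sentinel: no more row groups


# Hypothetical consumer: pulls row groups off the bounded queue and
# yields the individual rows they contain.
def iterate_rows(row_groups, results_queue_size):
    results_queue = queue.Queue(maxsize=results_queue_size)
    worker = threading.Thread(
        target=load_row_groups,
        args=(row_groups, results_queue),
        daemon=True,
    )
    worker.start()
    while True:
        group = results_queue.get()
        if group is None:
            break
        yield from group
    worker.join()


if __name__ == "__main__":
    groups = [[1, 2, 3], [4, 5, 6], [7]]
    # At most 2 row groups are buffered at once, which here means
    # up to 6 rows, not 2 rows.
    for row in iterate_rows(groups, results_queue_size=2):
        print(row)
```

In this sketch, a queue size of 2 with row groups of 1,000 rows each would allow up to 2,000 rows to sit in memory, which is why the distinction between rows and row groups matters when tuning the parameter.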