s-udhaya commented 1 year ago

The default results_queue_size used in make_batch_reader is 50. This will lead to OOM exception while loading big datasets with thread reader pool type. By exposing results_queue_size parameter via make_batch_reader api, the user is able to control the prefetched data size which in turns will help to alleviate OOM issue.

CLAassistant commented 1 year ago

All committers have signed the CLA.

s-udhaya commented 1 year ago

@selitvin I am in the process of making end to end DDL demos with big datasets, petastorm and horovod + DDP strategies. Many thanks for the great library. I encountered OOM issue during the process and this fix will help me to alleviate that issue. Could you please have a look at it whenever you have time?

codecov[bot] commented 1 year ago

Codecov Report

Base: 86.25% // Head: 86.25% // No change to project coverage :thumbsup:

Coverage data is based on head (7340ed6) compared to base (170b22a). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #783 +/- ## ======================================= Coverage 86.25% 86.25% ======================================= Files 84 84 Lines 5078 5078 Branches 785 785 ======================================= Hits 4380 4380 Misses 559 559 Partials 139 139 ``` | [Impacted Files](https://codecov.io/gh/uber/petastorm/pull/783?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uber) | Coverage Δ | | |---|---|---| | [petastorm/reader.py](https://codecov.io/gh/uber/petastorm/pull/783/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uber#diff-cGV0YXN0b3JtL3JlYWRlci5weQ==) | `90.82% <100.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uber). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=uber)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

s-udhaya commented 1 year ago

@selitvin Many thanks for the review. I will update the PR with your suggestions.

s-udhaya commented 1 year ago

@selitvin It took a while for me to update the PR 😃 Please have a look at it whenever you have time. Thanks

s-udhaya commented 1 year ago

@selitvin Thanks for the approval. What is the process now to get this PR to be merged onto the main branch?

selitvin commented 1 year ago

Merged. I need to cut a release now that will include your change.

selitvin commented 1 year ago

I am pushing a release candidate v0.12.1rc0 now. Will release tomorrow.

uber / petastorm

Support results_queue_size parameter in make_batch_reader api #783

Codecov Report