petl-developers / petl

Python Extract Transform and Load Tables of Data
MIT License
1.24k stars 193 forks source link

Fix generator support in fromdicts - use file cache instead of iterto… #625

Closed arturponinski closed 2 years ago

arturponinski commented 2 years ago

…ols.tee

This PR has the objective of improving the support of generators in fromdicts. The current implementation uses itertools.tee which according to docs and production deployments uses large amounts of memory, leading to out of memory kills of processes. This PR aims to keep the improved support of generators by using a filecache, similar to sorting, to allow multiple iterations.

Closes https://github.com/petl-developers/petl/issues/618

Changes

  1. Refactored DictsGeneratorView in petl.io.json to use file cache
  2. Moved _iterchunk from sorts to petl.util.base. Imported to sorts as _iterchunk for BC

Checklist

Use this checklist for assuring the quality of pull requests that include new code and or make changes to existing code.

coveralls commented 2 years ago

Pull Request Test Coverage Report for Build 2693186888


Changes Missing Coverage Covered Lines Changed/Added Lines %
petl/io/json.py 34 35 97.14%
<!-- Total: 78 79 98.73% -->
Totals Coverage Status
Change from base Build 2692922856: 0.03%
Covered Lines: 12689
Relevant Lines: 13931

💛 - Coveralls
arturponinski commented 2 years ago

@juarezr I've updated the docstring for fromdicts, please re-review this. Once you accept I'd like to ship it :tada:

juarezr commented 2 years ago

@arturponinski,

Nice solution. If you need any help, please don't be afraid of contacting me.

juarezr commented 2 years ago

@arturponinski,

Nice solution. If you need any help, please don't be afraid of contacting me.

@arturponinski,

I didn't understand if you consider this PR ready to merge. I don't have any objections if you want to merge and ship it.

arturponinski commented 2 years ago

@juarezr this took a while. I had asked for some more reviews and discussion about this solution and ended up not entirely happy with it. Dumping the whole generator to the file before yielding any data was considered as a faulty solution. Done some more trials and I've finally created a solution that I am happy with: https://github.com/petl-developers/petl/pull/626

juarezr commented 2 years ago

@arturponinski,

Can we close this as #626 superseded it and it was merged?

arturponinski commented 2 years ago

Definitely to be closed, this one is obsolete now