petl-developers / petl

Python Extract Transform and Load Tables of Data
MIT License

Fix fromdicts generator support lazy #626

Closed (arturponinski closed 1 year ago)

arturponinski commented 1 year ago

This PR improves generator support in fromdicts. The current implementation uses itertools.tee, which, according to the docs and to production deployments, can use large amounts of memory, leading to processes being killed for running out of memory. This PR keeps the improved generator support but replaces tee with a file cache, similar to the one used for sorting, so that the data can be iterated over multiple times.
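To illustrate the memory concern, here is a minimal sketch that is not taken from petl. The `rows` generator is a hypothetical stand-in for a real data source. It shows the access pattern that hurts: when one branch returned by itertools.tee is consumed well ahead of the other, tee has to keep every item in memory until the slower branch catches up.

```python
import itertools


def rows(n):
    """Hypothetical generator of dict rows, standing in for a real source."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


first, second = itertools.tee(rows(1000))

# Consuming one branch completely forces tee to remember every item,
# because the other branch has not seen any of them yet.
consumed = sum(1 for _ in first)

# The second branch is now replayed entirely from tee's in-memory buffer.
replayed = list(second)
```

With a few thousand small rows this is harmless, but the buffer grows with the size of the input, which is presumably what triggers the out-of-memory kills on large tables.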

Closes https://github.com/petl-developers/petl/issues/618

Changes

  1. Moved _iterchunk from sorts to petl.util.base. It is still imported into sorts as _iterchunk for backward compatibility (BC).
  2. Refactored DictsGeneratorView in petl.io.json to use a file cache lazily: data is dumped to the cache during the first requested pass.
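The lazy file cache idea can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the class name `LazyFileCachedRows` is hypothetical, and the use of pickle with an anonymous temporary file is an assumed choice of storage. Nothing is read from the source until the first pass asks for it, each row is written to disk as it is pulled, and later passes read from the file instead of from memory.

```python
import pickle
import tempfile


class LazyFileCachedRows:
    """Hypothetical re-iterable view over a one-shot generator of rows."""

    def __init__(self, source):
        self._source = iter(source)
        self._cache = tempfile.TemporaryFile()  # binary, read/write
        self._end = 0       # byte offset just past the last cached row
        self._done = False  # True once the source is exhausted

    def __iter__(self):
        pos = 0  # this pass's own read offset into the cache
        while True:
            if pos < self._end:
                # Row was already pulled by some pass: read it from disk.
                self._cache.seek(pos)
                row = pickle.load(self._cache)
                pos = self._cache.tell()
            elif self._done:
                return
            else:
                # Pull the next row lazily and append it to the cache.
                try:
                    row = next(self._source)
                except StopIteration:
                    self._done = True
                    return
                self._cache.seek(self._end)
                pickle.dump(row, self._cache)
                self._end = self._cache.tell()
                pos = self._end
            yield row


view = LazyFileCachedRows({"id": i} for i in range(3))
first_pass = list(view)   # pulls from the generator and fills the cache
second_pass = list(view)  # served from the temporary file
```

Only one row at a time is held in memory, at the cost of serialization and disk I/O on every pass. Because each pass tracks its own offset, a pass that starts while another is only partly finished still sees every row.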

Checklist

Use this checklist to assure the quality of pull requests that include new code and/or make changes to existing code.

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build 2880829446


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| petl/io/json.py | 46 | 48 | 95.83% |
| Total | 95 | 97 | 97.94% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 2692922856 | 0.04% |
| Covered Lines | 12713 |
| Relevant Lines | 13956 |

💛 - Coveralls