This is part 3 of a new feature, the data layout optimization (DLO) library, and covers strategy generation. This PR is co-authored with @anjagruenheid.
This PR adds compaction strategy generation. Rewrite cost is estimated as the serial rewrite time, and rewrite gain as the time saved by reducing the number of files. This PR builds on top of https://github.com/linkedin/openhouse/pull/109
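For reviewers, here is a rough illustration of the cost/gain idea described above. It is a minimal sketch and not the code in this PR. The function name, the parameters (`target_file_bytes`, `rewrite_bytes_per_sec`, `per_file_overhead_sec`), and the gain-to-cost score are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionEstimate:
    cost_seconds: float  # serial time to rewrite the candidate files
    gain_seconds: float  # time saved from having fewer files to open

    @property
    def score(self) -> float:
        # Hypothetical ranking metric: gain per unit of cost.
        return self.gain_seconds / self.cost_seconds if self.cost_seconds > 0 else 0.0


def estimate_compaction(file_sizes_bytes, target_file_bytes,
                        rewrite_bytes_per_sec, per_file_overhead_sec):
    """Estimate rewrite cost and gain for compacting small files.

    Assumptions (not taken from the PR): only files smaller than the
    target size are rewritten, the rewrite runs serially at a fixed
    throughput, and each file removed saves a fixed overhead.
    """
    candidates = [s for s in file_sizes_bytes if s < target_file_bytes]
    rewrite_bytes = sum(candidates)

    # Cost: serial rewrite time for all candidate bytes.
    cost = rewrite_bytes / rewrite_bytes_per_sec

    # Gain: number of files eliminated times the per-file overhead.
    files_after = math.ceil(rewrite_bytes / target_file_bytes) if candidates else 0
    files_reduced = max(len(candidates) - files_after, 0)
    gain = files_reduced * per_file_overhead_sec

    return CompactionEstimate(cost_seconds=cost, gain_seconds=gain)


MB = 1024 * 1024
# Eight 64 MB files plus one file already at the 512 MB target.
estimate = estimate_compaction(
    file_sizes_bytes=[64 * MB] * 8 + [512 * MB],
    target_file_bytes=512 * MB,
    rewrite_bytes_per_sec=128 * MB,
    per_file_overhead_sec=0.5,
)
```

In this example the eight small files are merged into one, so the estimate is 4.0 seconds of rewrite cost against 3.5 seconds of gain. A strategy generator could use a ratio like `score` to rank tables, though how this PR actually ranks or thresholds strategies is not stated here.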
The following 3 components will be added eventually:
1) DLO library that has primitives for generating data layout optimization strategies
2) App that generates strategies for all tables
3) Scheduling of the app
Changes
[ ] Client-facing API Changes
[ ] Internal API Changes
[ ] Bug Fixes
[x] New Features
[ ] Performance Improvements
[ ] Code Style
[ ] Refactoring
[ ] Documentation
[x] Tests
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
[ ] Manually tested on local docker setup. Please include the commands run and their output.
[x] Added new tests for the changes made.
[ ] Updated existing tests to reflect the changes made.
[ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
[ ] Some other form of testing like staging or soak time in production. Please explain.
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
[ ] Breaking Changes
[ ] Deprecations
[ ] Large PR broken into smaller PRs, and PR plan linked in the description.
For all the boxes checked, include additional details of the changes made in this pull request.