zjshi / gt-pro

MIT License

refactor multithreading code for readability and gzip performance #27

Closed: boris-dimitrov closed this 4 years ago

boris-dimitrov commented 4 years ago

the initial implementation worked but was a bit messy, and it didn't allow multiple gzip decompression processes to run in parallel on different input files, which limited performance for gzipped inputs in the AWS server environment

this refactor cleans it up by organizing the code into more C++ objects/classes

it also allows up to 8 (configurable) inputs to be gunzipped in parallel, which should take care of the performance issue
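a minimal C++ sketch of the "up to N inputs in parallel" idea, assuming each input is handled by a caller-supplied function (for example one that runs gzip decompression on that file). the names `process_inputs_in_parallel` and `kDefaultMaxParallelInputs` are hypothetical, not taken from gt-pro; the default of 8 just mirrors the limit mentioned above.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical default cap on how many inputs are processed at once
// (mirrors the "up to 8 (configurable)" limit; not gt-pro's actual name).
constexpr std::size_t kDefaultMaxParallelInputs = 8;

// Runs process_one on every input, with at most max_parallel inputs in flight.
// A fixed pool of worker threads pulls the next input index from a shared
// atomic counter, so no more than max_parallel calls ever run concurrently.
void process_inputs_in_parallel(
    const std::vector<std::string>& inputs,
    const std::function<void(const std::string&)>& process_one,
    std::size_t max_parallel = kDefaultMaxParallelInputs) {
    // Calling an empty std::function inside a worker would throw and terminate.
    assert(process_one && "process_one must be callable");
    if (max_parallel == 0) max_parallel = 1;  // treat 0 as "run serially"

    // Never start more workers than there are inputs.
    const std::size_t n_workers = std::min(max_parallel, inputs.size());
    std::atomic<std::size_t> next{0};  // index of the next unclaimed input

    std::vector<std::thread> workers;
    workers.reserve(n_workers);
    for (std::size_t w = 0; w < n_workers; ++w) {
        workers.emplace_back([&] {
            // Each worker claims inputs one at a time until none are left.
            // Note: an exception escaping process_one here calls std::terminate;
            // a production version would capture it (e.g. via std::exception_ptr).
            for (std::size_t i = next.fetch_add(1); i < inputs.size();
                 i = next.fetch_add(1)) {
                process_one(inputs[i]);
            }
        });
    }
    for (auto& t : workers) t.join();  // wait for every input to finish
}
```

using a small fixed pool that pulls work from an atomic counter keeps at most `max_parallel` inputs active without spawning one thread per file; the same helper could just as well wrap a per-input S3 download step instead of decompression.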

it would also allow multiple inputs to be fetched from S3 in parallel, for higher performance

tested and ready to merge