m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

Added should_dump_supplementary_log_files option #291

Closed (mski-iksm closed 2 years ago)

mski-iksm commented 2 years ago

[What] TaskOnKart.should_dump_supplementary_log_files is an option that controls whether the supplementary files (task_log, random_seed, task_params, processing_time, module_versions) are dumped. The default is True, which keeps the existing behavior of dumping them.

[Why] When a task runs, it writes one or more output files as defined in task.run(). In addition, five supplementary files (task_log, random_seed, task_params, processing_time, module_versions) are dumped. With cloud data storage and a large number of tasks, the number of files affects the cost of data processing. Setting this option to False skips dumping the supplementary files, which keeps that cost low.
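
To make the file-count argument concrete, here is a rough back-of-the-envelope sketch. It does not call gokart; files_written is a hypothetical helper written only for this illustration, and the task count and outputs per task are made-up numbers. It only assumes what is stated above: each task writes its own outputs plus five supplementary files unless the option is turned off.

```python
# Names of the five supplementary files listed in this PR description.
SUPPLEMENTARY_FILES = (
    "task_log",
    "random_seed",
    "task_params",
    "processing_time",
    "module_versions",
)


def files_written(num_tasks, outputs_per_task, should_dump_supplementary_log_files=True):
    """Hypothetical helper (not part of gokart): estimate how many files a run writes.

    Each task writes `outputs_per_task` output files. When the flag is True,
    it also writes one file per entry in SUPPLEMENTARY_FILES.
    """
    per_task = outputs_per_task
    if should_dump_supplementary_log_files:
        per_task += len(SUPPLEMENTARY_FILES)
    return num_tasks * per_task


# Illustrative numbers only: 10,000 tasks with a single output file each.
with_logs = files_written(10_000, 1)
without_logs = files_written(10_000, 1, should_dump_supplementary_log_files=False)
print(with_logs, without_logs)
```

Under these assumed numbers, most of the files written are supplementary ones, so skipping them cuts the file count substantially for pipelines with many small tasks.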

Please review.

mski-iksm commented 2 years ago

@hirosassa Thanks for the comment. I've added documentation!

hirosassa commented 2 years ago

@mski-iksm Thanks again for your contribution!