m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

add new feature: dump dependent Tasks information as a table #232

Closed (mski-iksm closed this 3 years ago)

mski-iksm commented 3 years ago

I've added a new feature that dumps a task cache information file when gokart.build() is run.

The task cache information file is a pandas DataFrame containing information such as the task name, unique id, cache file path, parameters, and processing time of each completed task.
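
To make the shape of that table concrete, here is a minimal sketch; it is not the code in this PR and it does not call gokart. `TaskRecord`, `collect_rows`, the column names, and all sample values are hypothetical; only the five kinds of information (task name, unique id, cache file path, parameters, processing time) come from the description above. Plain dicts are used so the sketch runs without dependencies; the resulting list could be handed to `pandas.DataFrame(rows)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskRecord:
    """Hypothetical stand-in for a completed task (not a gokart class)."""
    name: str
    unique_id: str
    cache_path: str
    params: Dict[str, object]
    processing_time: float  # seconds
    children: List["TaskRecord"] = field(default_factory=list)


def collect_rows(root: TaskRecord) -> List[dict]:
    """Flatten a task and its dependencies into one row per unique task."""
    rows: List[dict] = []
    seen = set()
    stack = [root]
    while stack:
        task = stack.pop()
        if task.unique_id in seen:
            # A task required by several parents is listed only once.
            continue
        seen.add(task.unique_id)
        rows.append({
            "name": task.name,
            "unique_id": task.unique_id,
            "cache_path": task.cache_path,
            "params": task.params,
            "processing_time": task.processing_time,
        })
        stack.extend(task.children)
    return rows


# Made-up sample tasks: Evaluate depends on Train and LoadData,
# and Train also depends on LoadData.
load = TaskRecord("LoadData", "a1", "./cache/LoadData_a1.pkl", {"date": "2021-01-01"}, 1.5)
train = TaskRecord("Train", "b2", "./cache/Train_b2.pkl", {"lr": 0.1}, 12.0, [load])
evaluate = TaskRecord("Evaluate", "c3", "./cache/Evaluate_c3.pkl", {}, 0.8, [train, load])

rows = collect_rows(evaluate)
for row in rows:
    print(row["name"], row["unique_id"], row["cache_path"], row["processing_time"])
```

Deduplicating on the unique id keeps the table at one row per task even when a dependency is shared.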

issue: https://github.com/m3dev/gokart/issues/231

mski-iksm commented 3 years ago

@Hi-king @vaaaaanquish @e-mon @hirosassa Please review.

vaaaaanquish commented 3 years ago

Since this information can be recovered from the logs that are already there, this looks like an unnecessary feature. Please describe the usage scenarios in the documentation.

Also, if you want this task information log for task management, the place to implement it is TaskOnKart, and probably not gokart.build. In that case it would be needed for all tasks, so it would look like the following: https://github.com/m3dev/gokart/pull/85

If you want this for workflow management, it can be recovered from the logs that are already there.

mski-iksm commented 3 years ago

@vaaaaanquish

"Since this information can be recovered from the logs that are already there, this looks like an unnecessary feature."

I think you are talking about using gokart.make_tree_info(), which is a very similar feature. However,

So, I think this feature is needed for productivity.

I do agree that _make_task_info_list() duplicates gokart.make_tree_info(), so I am thinking of replacing the backend with gokart.make_tree_info(). What do you think?

vaaaaanquish commented 3 years ago

@mski-iksm

"I do agree that _make_task_info_list() duplicates gokart.make_tree_info(), so I am thinking of replacing the backend with gokart.make_tree_info()."

This is very cool. If make_tree_info gains the ability to return DataFrames, it would be a big improvement.

mski-iksm commented 3 years ago

@vaaaaanquish I've modified dump_task_info_table to use the same backend as make_tree_info. dump_task_info_table is mostly the same as make_tree_info, except that it dumps the tree info as a pandas.DataFrame.
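
The "same backend, two outputs" idea can be sketched roughly as follows. This illustrates the design only and is not the refactored gokart code: `Node`, `walk`, `as_tree_text`, and `as_table_rows` are hypothetical names, and the output formats are assumptions rather than what make_tree_info or dump_task_info_table actually produce.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical task node; real gokart task objects carry much more."""
    name: str
    unique_id: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Shared backend: depth-first, pre-order traversal of the task tree."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def as_tree_text(root: Node) -> str:
    """Front end 1: indented text, one line per visited node."""
    return "\n".join(f"{'  ' * depth}{n.name}[{n.unique_id}]" for depth, n in walk(root))


def as_table_rows(root: Node) -> List[dict]:
    """Front end 2: flat rows, which could be passed to pandas.DataFrame(rows)."""
    return [{"name": n.name, "unique_id": n.unique_id, "depth": depth} for depth, n in walk(root)]


# Made-up sample tree.
root = Node("Train", "b2", [Node("LoadData", "a1"), Node("LoadLabels", "d4")])
print(as_tree_text(root))
print(as_table_rows(root))
```

Because both front ends consume the same traversal, a fix to how the tree is walked applies to the text view and the table view alike.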

During this process, I've refactored make_tree_info and fixed some tests.

mski-iksm commented 3 years ago

I'll close this PR and split it into the following 2 PRs.