m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License
318 stars 57 forks source link

add indexedfeather file processor #223

Closed mski-iksm closed 3 years ago

mski-iksm commented 3 years ago

[What?] I've added indexed feather file processor which is capable to store index data.

[Why?] The existing FeatherFileProcessor cannot save index information since the index must be 0,1,2,.... when saving as feather.

Please review. @Hi-king @hirosassa @vaaaaanquish @e-mon

hirosassa commented 3 years ago

Looks good!

vaaaaanquish commented 3 years ago

Is there any reason why reset_index should not be used?

I think the code is good, but there is no additional documentation and I don't know the usage scenarios.

mski-iksm commented 3 years ago

@vaaaaanquish

Is there any reason why reset_index should not be used?

I think the code is good, but there is no additional documentation and I don't know the usage scenarios.

If you have feather file which is dumped with FeatherFileProcessor and index processed with reset_index(), there is no way to know which column was index originally. Using IndexedFeatherFileProcessor

By this behavior, you can use IndexedFeatherFileProcessor as like other FileProcessor without considering about index problem.

vaaaaanquish commented 3 years ago

@mski-iksm I know how it works. But I can't think of a situation where more than two Tasks need index . Since indexing of pandas is not stable (implicit reset, pandas version,...and more), it is better to use column as the basic method for merging, grouping, ...

There are two reasons why I am cautious about this approach

vaaaaanquish commented 3 years ago

For example, can we consider extending the existing FeatherFileProcessor?

mski-iksm commented 3 years ago

@vaaaaanquish

For example, can we consider extending the existing FeatherFileProcessor?

Yes, I will

mski-iksm commented 3 years ago

@Hi-king @vaaaaanquish I've added test and docs. Please review!

mski-iksm commented 3 years ago

@hirosassa @vaaaaanquish Thank you for reviewing! I'll merge this PR.