Closed mski-iksm closed 3 years ago
Looks good!
Is there any reason why reset_index should not be used?
I think the code is good, but there is no additional documentation and I don't know the usage scenarios.
@vaaaaanquish
Is there any reason why reset_index should not be used?
I think the code is good, but there is no additional documentation and I don't know the usage scenarios.
If you have feather file which is dumped with FeatherFileProcessor and index processed with reset_index(), there is no way to know which column was index originally. Using IndexedFeatherFileProcessor
By this behavior, you can use IndexedFeatherFileProcessor as like other FileProcessor without considering about index problem.
@mski-iksm I know how it works. But I can't think of a situation where more than two Tasks need index
. Since indexing of pandas is not stable (implicit reset, pandas version,...and more), it is better to use column as the basic method for merging, grouping, ...
There are two reasons why I am cautious about this approach
.indexedfeather
For example, can we consider extending the existing FeatherFileProcessor?
@vaaaaanquish
For example, can we consider extending the existing FeatherFileProcessor?
Yes, I will
@Hi-king @vaaaaanquish I've added test and docs. Please review!
@hirosassa @vaaaaanquish Thank you for reviewing! I'll merge this PR.
[What?] I've added indexed feather file processor which is capable to store index data.
[Why?] The existing FeatherFileProcessor cannot save index information since the index must be 0,1,2,.... when saving as feather.
Please review. @Hi-king @hirosassa @vaaaaanquish @e-mon