@kamil-kaczmarek, @jakubczakon I know it is a bunch of different ideas and suggestions clustered in one issue. Let me know which of those are compatible with the current roadmap. (I am happy to contribute/collaborate on some.)
- Default data folder (e.g. `./.steppy/step_name/`), configurable if needed; override it only when strictly necessary.
- No `input_data`; it complicates things for no obvious reason!
- Names optional, automatically generated from the class name + a number.
- More explicit job structure (`steps = Sequence([step1, step2])`); see the Keras API.
- Adapters as classes inheriting from `BaseTrainers`, e.g. `step = Rename({'a': 'aaa', 'b': 'bbb'})`; see `rename` in Pandas.
- How to separate persist-data vs. persist-parameters? (e.g. for image preprocessing, it may save time to store the processed images once)
- Built-in data tests (e.g. `len(X) == len(Y)`), in `def test`.
- Built-in test that persist->load is correct (i.e. the loaded data is the same as the saved data).
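The optional-names and default-folder suggestions could look roughly like this. It is a minimal sketch, not steppy's API: the `AutoNamed` mixin, the `Scaler`/`Model` classes and the `root` parameter are all hypothetical. Each class gets its own counter, so names come out as `<classname>_<n>`, and the data folder defaults to `./.steppy/<name>/` unless explicitly overridden.

```python
import itertools
from collections import defaultdict
from pathlib import Path


class AutoNamed:
    """Hypothetical mixin: default name = class name + counter,
    default data folder = ./.steppy/<name>/ unless overridden."""

    # One independent counter per class name, shared by all instances
    _counters = defaultdict(itertools.count)

    def __init__(self, name=None, data_dir=None, root=".steppy"):
        if name is None:
            # e.g. first Scaler -> "scaler_0", second Scaler -> "scaler_1"
            index = next(AutoNamed._counters[type(self).__name__])
            name = f"{type(self).__name__.lower()}_{index}"
        self.name = name
        # Folder derived from the name; pass data_dir only when strictly necessary
        self.data_dir = Path(data_dir) if data_dir is not None else Path(root) / name


class Scaler(AutoNamed):
    pass


class Model(AutoNamed):
    pass


a, b, m = Scaler(), Scaler(), Model()
print(a.name, b.name, m.name)      # scaler_0 scaler_1 model_0
print(a.data_dir.as_posix())       # .steppy/scaler_0
```

Explicit `name=` or `data_dir=` arguments still win, so overriding stays possible without being required.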
Explicit job structure: sounds great, though I'm not sure how complicated it is to construct.
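To get a feel for how complicated construction might be, here is a minimal sketch of a Keras-style `Sequence`, assuming a hypothetical `Step` wrapper with a single `fit_transform` method (neither class is steppy's actual API). The container just feeds each step's output into the next one.

```python
class Step:
    """Hypothetical minimal step wrapping a plain function."""

    def __init__(self, func, name=None):
        self.func = func
        self.name = name or func.__name__

    def fit_transform(self, data):
        return self.func(data)


class Sequence:
    """Hypothetical Keras-style container: runs steps in order,
    passing each step's output as the next step's input."""

    def __init__(self, steps):
        self.steps = list(steps)

    def fit_transform(self, data):
        for step in self.steps:
            data = step.fit_transform(data)
        return data


def double(xs):
    return [2 * x for x in xs]


def add_one(xs):
    return [x + 1 for x in xs]


pipeline = Sequence([Step(double), Step(add_one)])
print(pipeline.fit_transform([1, 2, 3]))  # [3, 5, 7]
```

Because `Sequence` exposes the same `fit_transform` as a `Step`, sequences can be nested inside other sequences. Branching and joining (several inputs into one step) would need more than a flat list, which is probably where most of the real complexity lies.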
Could-be's
Dropping `input_data` would create the need for an `input_data` step, I guess. I don't mind the idea, but I'd have to see it work first.
`Rename` is a good idea, but remember that there could be multiple steps with the same output key that are joined somewhere, so it will be more complicated than what you suggested. I would love to improve the adapter structure, though.
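One hypothetical way to handle the same-output-key case is to key the rename mapping on `(step_name, output_key)` pairs rather than on bare keys, in the spirit of pandas' `rename`. The sketch below assumes inputs arrive as `{step_name: {output_key: value}}`; that layout and the `Rename` class are assumptions for illustration, not how steppy adapters currently work.

```python
class Rename:
    """Hypothetical adapter. The mapping is keyed by (step_name, output_key)
    so that identical output keys from different upstream steps stay
    distinguishable after they are joined."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def transform(self, inputs):
        result = {}
        for (step_name, key), new_key in self.mapping.items():
            # Two sources mapped onto one target would silently overwrite data
            if new_key in result:
                raise ValueError(f"duplicate target key: {new_key!r}")
            result[new_key] = inputs[step_name][key]
        return result


inputs = {
    "tfidf": {"X": [[0.1, 0.2]]},
    "counts": {"X": [[3, 1]]},
}
adapter = Rename({("tfidf", "X"): "X_tfidf", ("counts", "X"): "X_counts"})
print(adapter.transform(inputs))
# {'X_tfidf': [[0.1, 0.2]], 'X_counts': [[3, 1]]}
```

Both upstream steps emit `X`, but the tuple keys keep them apart, and a collision on the renamed side fails loudly instead of dropping one of the values.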
Don't-get-it's
persist-data is different from persist-parameters for exactly that reason.
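To illustrate the distinction, here is a minimal sketch that keeps the two kinds of persisted state in separate files. `MeanCenterer` and `cached_transform` are hypothetical names. Parameters are small and required to transform new data later, while cached outputs are a pure time-saver. The cache here assumes the input does not change between runs; there is no invalidation.

```python
import pickle
import tempfile
from pathlib import Path


class MeanCenterer:
    """Hypothetical step whose fitted parameter is the mean."""

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean for x in xs]

    # persist-parameters: needed to apply the step to unseen data
    def persist_params(self, path):
        Path(path).write_bytes(pickle.dumps({"mean": self.mean}))

    def load_params(self, path):
        self.mean = pickle.loads(Path(path).read_bytes())["mean"]
        return self


def cached_transform(step, xs, cache_path):
    """persist-data: store the transformed output once and reuse it.
    Assumes xs is unchanged between runs (no cache invalidation)."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return pickle.loads(cache_path.read_bytes())
    out = step.transform(xs)
    cache_path.write_bytes(pickle.dumps(out))
    return out


with tempfile.TemporaryDirectory() as tmp:
    step = MeanCenterer().fit([1.0, 2.0, 3.0])
    step.persist_params(Path(tmp) / "params.pkl")
    first = cached_transform(step, [1.0, 2.0, 3.0], Path(tmp) / "outputs.pkl")
    # A fresh instance only needs the parameters to handle new data
    restored = MeanCenterer().load_params(Path(tmp) / "params.pkl")
    print(first, restored.transform([10.0]))  # [-1.0, 0.0, 1.0] [8.0]
```

Deleting `outputs.pkl` only costs recomputation; deleting `params.pkl` means the step has to be refitted.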
persist->load: can you elaborate?
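One possible reading of the persist->load suggestion is a round-trip check: persist a step's state, load it back, and fail if the result differs from what was saved. In this sketch, `check_persist_roundtrip` and the JSON persist/load helpers are hypothetical. The second call shows the kind of bug such a check catches: JSON silently turns tuples into lists.

```python
import json
import tempfile
from pathlib import Path


def check_persist_roundtrip(state, persist, load, path):
    """Hypothetical built-in test: persist `state`, load it back and
    fail loudly if the loaded value differs from what was saved."""
    persist(state, path)
    loaded = load(path)
    if loaded != state:
        raise AssertionError(f"persist->load mismatch: {state!r} != {loaded!r}")
    return loaded


def persist_json(state, path):
    Path(path).write_text(json.dumps(state))


def load_json(path):
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "params.json"
    check_persist_roundtrip({"mean": 2.0}, persist_json, load_json, path)  # passes
    try:
        # Lossy round trip: the tuple comes back as a list
        check_persist_roundtrip({"shape": (2, 3)}, persist_json, load_json, path)
    except AssertionError as err:
        print(err)
```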
I am always in favor of tests; could you explain what you mean by those data tests?
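As a sketch of what a built-in data test in `def test` might mean: a hypothetical base class runs `test` automatically before the actual work. It ships a default `len(X) == len(y)` check, and subclasses can add step-specific checks on top. `CheckedStep`, `CountPositives` and `_fit_transform` are illustrative names only.

```python
class CheckedStep:
    """Hypothetical base class: `test` runs automatically before the work."""

    def test(self, X, y):
        # Built-in check: features and targets must have the same length
        if len(X) != len(y):
            raise ValueError(f"len(X) == {len(X)} but len(y) == {len(y)}")

    def fit_transform(self, X, y):
        self.test(X, y)
        return self._fit_transform(X, y)

    def _fit_transform(self, X, y):
        raise NotImplementedError


class CountPositives(CheckedStep):
    def test(self, X, y):
        super().test(X, y)
        # Step-specific check layered on top of the built-in one
        if any(label not in (0, 1) for label in y):
            raise ValueError("y must be binary")

    def _fit_transform(self, X, y):
        return sum(y)


step = CountPositives()
print(step.fit_transform([[1], [2], [3]], [0, 1, 1]))  # 2
try:
    step.fit_transform([[1], [2]], [0, 1, 1])
except ValueError as err:
    print(err)  # len(X) == 2 but len(y) == 3
```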