kungfuai / kaishi

Tool kit to accelerate exploratory data analysis and data cleaning
https://kaishi.readthedocs.io/en/latest/
MIT License

Pipeline config redo #9

Closed: mwharton3 closed this 4 years ago

mwharton3 commented 4 years ago

A review of this from anyone would be great! @maxisawesome @zzsi @spencerR1992 @jerryschirmer @rgreenjr or whoever...

The big change here is that pipeline elements (filters, transformations, and labelers) are now classes that take a single input: the dataset object. Any parameters they need should be stored in the dataset object itself.
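To illustrate the "single input" convention, here is a minimal sketch. `ToyDataset`, `PipelineComponent`, `FilterByExtension`, and `run_pipeline` are hypothetical names for illustration only, not kaishi's actual classes. The point is that every element has the same call signature and reads its parameters from the dataset, so a pipeline runner can treat all elements uniformly.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset object: data plus its own parameters."""
    files: List[str]
    params: Dict[str, Any] = field(default_factory=dict)


class PipelineComponent:
    """Hypothetical base class: every pipeline element takes only the dataset."""

    def __call__(self, dataset: ToyDataset) -> None:
        raise NotImplementedError


class FilterByExtension(PipelineComponent):
    """Drops files whose extension is not listed in dataset.params['extensions']."""

    def __call__(self, dataset: ToyDataset) -> None:
        # The parameter lives on the dataset, not on the component's constructor.
        allowed = tuple(dataset.params.get("extensions", [".jpg", ".png"]))
        dataset.files = [f for f in dataset.files if f.lower().endswith(allowed)]


def run_pipeline(dataset: ToyDataset, components: List[PipelineComponent]) -> ToyDataset:
    # Every component has the same signature, so the runner needs no special cases.
    for component in components:
        component(dataset)
    return dataset
```

Usage: `ds = ToyDataset(["a.jpg", "b.txt", "c.PNG"], params={"extensions": [".jpg", ".png"]})` followed by `run_pipeline(ds, [FilterByExtension()])` leaves `ds.files` as `["a.jpg", "c.PNG"]`. Keeping parameters on the dataset means a saved or interactively configured dataset carries everything needed to rerun its pipeline.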

Some definitions:
- Filters: operations that remove data elements based on some criteria.
- Transformations: operations that change either a data point itself or its meta-data (e.g. labels).
- Labelers: operations that add new labels based on criteria developed in the operation.
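A toy sketch of the three categories, assuming a hypothetical dataset whose items are dicts with a `value` and a set of `labels`. None of these classes come from kaishi; they only show how a filter, a transformation, and a labeler differ when each receives nothing but the dataset.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyDataset:
    """Hypothetical dataset: each item is a dict with a 'value' and a set of 'labels'."""
    items: List[Dict[str, Any]]
    params: Dict[str, Any] = field(default_factory=dict)


class DropNegativeFilter:
    """Filter: removes items that fail a criterion (here, value < 0)."""

    def __call__(self, dataset: ToyDataset) -> None:
        dataset.items = [it for it in dataset.items if it["value"] >= 0]


class ScaleTransformation:
    """Transformation: changes each data point itself (multiplies its value)."""

    def __call__(self, dataset: ToyDataset) -> None:
        factor = dataset.params.get("scale", 1.0)  # parameter read from the dataset
        for it in dataset.items:
            it["value"] *= factor


class LargeValueLabeler:
    """Labeler: adds a new label using a criterion it computes itself (the mean)."""

    def __call__(self, dataset: ToyDataset) -> None:
        values = [it["value"] for it in dataset.items]
        if not values:
            return
        mean = sum(values) / len(values)
        for it in dataset.items:
            if it["value"] > mean:
                it["labels"].add("LARGE")
```

For example, with `ds = ToyDataset([{"value": v, "labels": set()} for v in (-1, 2, 4, 6)], params={"scale": 10})`, applying the filter, transformation, and labeler in that order leaves values `[20, 40, 60]`, and only the last item gets the `LARGE` label.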

To test on a small directory of images, try:

```python
from kaishi.image import Dataset

imd = Dataset('/path/to/images')
imd.configure()
```

Follow the steps, and then:

```python
imd.run()
```

rgreenjr commented 4 years ago

I highly recommend integrating the black formatter into your editors and setting it to auto-format on save.

It will automatically fix most of the nits mentioned above and eliminate code style issues from code reviews.

Once you've used it for a few projects you will never go back. 👍

mwharton3 commented 4 years ago


Added!

mwharton3 commented 4 years ago

Will come back to the unresolved points in a separate issue. Merging.