koaning / clumper

A small python library that can clump lists of data together.
https://koaning.github.io/clumper/
MIT License
147 stars 15 forks source link

Split Clumper class by functionality #86

Open samarpan-rai opened 3 years ago

samarpan-rai commented 3 years ago

Is your feature request related to a problem? Please describe. Our main class is becoming a monolith. Currently Clumper class is over 1500 lines. The major contributor is the documentation but it still makes it difficult to navigate through while developing.

Describe the solution you'd like Split the class into multiple methods and/or classes. I think we already have a good structure based on tests by functionality. For example, we can split into following system by functionality

koaning commented 3 years ago

Normally, you would consider subclassing because parts can be re-used. Scikit-learn has a ClassifierMixin so that all classifiers are consistent for example.

The thing with this package though ... it's really just a single class. Splitting it up feels like it would add complexity instead of removing it. There's something to be said to perhaps group methods together. But I don't know of a clean way to split up a single class across multiple files.

samarpan-rai commented 3 years ago

Yes, I can see that sub-classing may not be ideal but what if we split the methods based on their functionality. I am not too sure about about Mixins yet.

However, we can split the reading functions in a different file such that we can call them without instantiating the class. I think this also touches on issue #81. This is what pandas does when it loads as seen in their dunder init.py.

import clumper

clump = clumper.read_csv(...) # holds for other read functions

Just moving the read functions can remove 300+ lines from the Clumper class.