Hellisotherpeople opened this issue 3 years ago
Thank you and I love this feedback! Would you mind helping me understand the suggestion better?
Previously I could think of two ways of drawing decision boundaries:

(A) `human-learn`, where the classifier literally follows the polygon (or any shape) you draw (see the sketch below);
(B) `hover`, where you draw annotations and have a custom-architecture classifier fit to the annotations. Specifically, the `active_learning` recipe tries to learn the decision boundary given by the "train" set in an iterative "draw-and-retrain" process, and it tries to interpolate between the input manifold and output manifold, giving multiple views to exploit.

Just to be sure, my point of reference is the latest version of `hover` (0.5.0). Let me know whether you are suggesting (A) or something else :)
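For concreteness, here is a rough sketch of what I mean by (A). This is not `human-learn`'s actual API; the class below is made up for illustration and just uses matplotlib's point-in-polygon test to build a classifier that literally follows the drawn shape:

```python
import numpy as np
from matplotlib.path import Path

class PolygonClassifier:
    """Toy stand-in for idea (A): the decision boundary *is* the drawn shape.
    Illustrative only -- not human-learn's real API."""

    def __init__(self, polygons):
        # polygons: dict mapping label -> list of (x, y) vertices the user drew
        self.paths = {label: Path(vertices) for label, vertices in polygons.items()}

    def predict(self, X):
        X = np.asarray(X)
        labels = np.full(len(X), "unlabeled", dtype=object)
        for label, path in self.paths.items():
            # mark every point that falls inside this drawn polygon
            labels[path.contains_points(X)] = label
        return labels

# usage: pretend the user drew one triangle for class "a"
clf = PolygonClassifier({"a": [(0, 0), (1, 0), (0.5, 1)]})
print(clf.predict([[0.5, 0.3], [2.0, 2.0]]))  # -> ['a' 'unlabeled']
```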
> I think the logical extension to a tool like this is allowing someone to define their own decision boundary of a supervised model (they call this "machine teaching" rather than machine learning). Defining their own decision boundary should end up with them having a supervised classifier at the end and being able to visualize how that classifier operates (and ideally allowing an expert human to "tune" it). Note that this is different than the current "select aspects of the dataset by drawing" functionality built in.
Now that I think more about it, `hover.recipes.active_learning` achieves "machine teaching" through `hover.core.neural.VectorNet`, where one can attach "any" neural network (subject to matching dimensions with the vectorizer) after the vectorizer function.

So when starting from scratch, one can use `active_learning` to draw decision boundaries through annotations and (re)train.
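To make the "matching dimensions" constraint concrete, here is a minimal sketch of the two pieces involved. The character-count vectorizer and the small MLP are toy stand-ins, and how the pair actually gets passed to `VectorNet` follows hover's docs rather than anything shown here:

```python
import numpy as np
import torch
import torch.nn as nn

VEC_DIM = 64  # the vectorizer and the network must agree on this dimension

def vectorizer(text: str) -> np.ndarray:
    """Toy stand-in for a real vectorizer (e.g. a spaCy or transformer embedding).
    Hashes characters into a fixed-length count vector."""
    vec = np.zeros(VEC_DIM, dtype=np.float32)
    for ch in text:
        vec[hash(ch) % VEC_DIM] += 1.0
    return vec

class SmallMLP(nn.Module):
    """'Any' neural net is fine, as long as its input dim matches the vectorizer."""
    def __init__(self, in_dim: int = VEC_DIM, num_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# sanity check that the two pieces compose; this (vectorizer, module) pair is
# what you would hand to hover.core.neural.VectorNet per its documentation
x = torch.from_numpy(vectorizer("machine teaching")).unsqueeze(0)
print(SmallMLP()(x).shape)  # torch.Size([1, 3])
```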
When working with an existing model which may not be a `VectorNet`, I suggest first deciding which layers of the model to freeze and which layers to tune. Then you can convert it to a `VectorNet` by wrapping the frozen part in the vectorizer component and putting the tunable part in the neural net component.
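Here is a rough sketch of that split in plain PyTorch. The model and dimensions are made up for illustration, and the final wrapping into `VectorNet` is omitted; the check at the end also demonstrates the "pure PyTorch" round trip mentioned next:

```python
import torch
import torch.nn as nn

# an existing "pure" PyTorch model: backbone -> head
backbone = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))
pure_model = nn.Sequential(backbone, head)

# freeze the backbone and expose it as the "vectorizer" component
backbone.requires_grad_(False)

def vectorizer(raw_input) -> torch.Tensor:
    """Frozen part: preprocessing + frozen layers, producing a fixed-size vector."""
    x = torch.as_tensor(raw_input, dtype=torch.float32)  # stand-in preprocessor
    with torch.no_grad():
        return backbone(x)

# tunable part: this is the neural-net component that keeps training
tunable_net = head

# round trip: vectorizer + tunable net reproduces the original forward()
x = torch.randn(2, 300)
assert torch.allclose(pure_model(x), tunable_net(vectorizer(x)))
```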
In other words, one can convert a `VectorNet` from/to "pure" PyTorch when applicable (i.e. when the vectorizer is essentially a preprocessor function followed by the `forward()` of some `nn.Module`). Does this seem on the right track?
I love this tool. I've been using Bokeh along with UMAP/Ivis/PCA and clustering for dataset visualization like this for a while, but I am happy to see someone automate this exact use case, since I've had to hand-roll this kind of tool for my own clustering / dimensionality reduction projects many times.
I think the logical extension to a tool like this is allowing someone to define their own decision boundary of a supervised model (they call this "machine teaching" rather than machine learning). Defining their own decision boundary should end up with them having a supervised classifier at the end and being able to visualize how that classifier operates (and ideally allowing an expert human to "tune" it). Note that this is different than the current "select aspects of the dataset by drawing" functionality built in.
One easy way to implement this is to allow the user to "draw" like you do earlier, but make it so that the user is actually drawing a "pseudo-subset" of their initial data (in reality, creating new data). Fit the classification model on this "pseudo-subset", and it should end up training fast and giving the user some kind of "equation" (e.g. if you choose linear models) or some other interpretation mechanism (e.g. decision trees). When the expert changes bits of how this supervised model works, the model equation or interpretation should update. No need to do CV, since human eyeballs are doing your regularization for you.
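Concretely, something like this minimal sketch (the "drawn" points and labels below are hypothetical stand-ins for the pseudo-subset the expert would create in the plot, using scikit-learn for the interpretable models):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

# pretend these points were "drawn" by the expert in the embedding plot:
# a pseudo-subset that is really newly created, labeled data
drawn_X = np.array([[0.2, 0.1], [0.3, 0.2], [0.1, 0.4],   # class 0
                    [2.1, 1.9], [2.4, 2.2], [1.8, 2.5]])  # class 1
drawn_y = np.array([0, 0, 0, 1, 1, 1])

# linear model: the user gets an "equation" back
linear = LogisticRegression().fit(drawn_X, drawn_y)
print("coefficients:", linear.coef_, "intercept:", linear.intercept_)

# or a decision tree: the user gets rules back
tree = DecisionTreeClassifier(max_depth=2).fit(drawn_X, drawn_y)
print(export_text(tree, feature_names=["dim_0", "dim_1"]))

# whenever the expert redraws or edits the pseudo-subset, refit and re-display;
# no cross-validation needed, the expert's eyeballs act as the regularizer
```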
It's a lot of work, but I anticipate that if you implement it correctly you'd be well into the thousands of GitHub stars, because it's blindingly obvious in hindsight yet a huge win in situations where, say, a doctor may in fact be capable of "fixing" erroneous parts of a medical imaging AI's decision boundary.