Add support for "drawing your own decision boundary" to implement machine teaching

Hellisotherpeople commented 3 years ago

I love this tool. I've been using Bokeh along with UMAP/Ivis/PCA and clustering for dataset visualization like this for a while, but I am happy to see someone automate this exact use case, since I've had to hand-roll this kind of tool for my own clustering / dimensionality reduction projects many times.

I think the logical extension to a tool like this is allowing someone to define their own decision boundary of a supervised model (they call this "machine teaching" rather than machine learning). Defining their own decision boundary should end up with them having a supervised classifier at the end and being able to visualize how that classifier operates (and ideally allowing an expert human to "tune" it). Note that this is different than the current "select aspects of the dataset by drawing" functionality built in.

One easy way to implement this is to let the user "draw" as with the current selection tool, except that the drawing defines a "pseudo-subset" of their initial data (in practice it creates new data points rather than selecting existing ones). Fit the classifier on this "pseudo-subset", and it should train fast and give the user some kind of "equation" (e.g. if you choose linear models) or some other interpretation mechanism (e.g. decision trees). When the expert changes parts of how this supervised model works, the model equation or interpretation should update. No need to do CV, since human eyeballs are providing the regularization.
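
To make the "fit on drawn points, get an equation back" idea concrete, here is a minimal sketch in plain Python. It assumes the drawing step has already produced labeled 2D pseudo-points; `fit_perceptron`, `predict`, `drawn_points`, and `drawn_labels` are hypothetical names for illustration and are not part of hover. A perceptron stands in for "a linear model" here; any linear classifier would do.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Model = Tuple[float, float, float]  # (w1, w2, b) for w1*x + w2*y + b = 0


def fit_perceptron(points: List[Point], labels: List[int],
                   epochs: int = 100, lr: float = 0.1) -> Model:
    """Fit a linear boundary on 2D points with labels in {0, 1}."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            pred = 1 if w1 * x + w2 * y + b > 0 else 0
            err = label - pred  # -1, 0, or +1
            if err != 0:
                # Nudge the boundary toward the misclassified point.
                w1 += lr * err * x
                w2 += lr * err * y
                b += lr * err
                mistakes += 1
        if mistakes == 0:  # every drawn point is on the correct side
            break
    return w1, w2, b


def predict(model: Model, point: Point) -> int:
    w1, w2, b = model
    x, y = point
    return 1 if w1 * x + w2 * y + b > 0 else 0


# Hypothetical pseudo-points an expert "drew" in the 2D embedding.
drawn_points = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5),
                (3.0, 3.0), (3.5, 2.5), (2.5, 3.5)]
drawn_labels = [0, 0, 0, 1, 1, 1]

model = fit_perceptron(drawn_points, drawn_labels)
# The "equation" the user gets to inspect.
print("boundary: %.2f*x + %.2f*y + %.2f = 0" % model)
```

Whenever the expert redraws, rerunning `fit_perceptron` on the updated pseudo-points yields the updated equation; with a handful of points this is effectively instantaneous.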

It's a lot of work, but I anticipate that if you implement it correctly you'd be well into the thousands of GitHub stars, because it's fking obvious but is a huge win in situations where, say, a doctor may in fact be capable of "fixing" erroneous parts of a medical imaging AI's decision boundary.

phurwicz commented 3 years ago

Thank you and I love this feedback! Would you mind helping me understand the suggestion better?

Previously I could think of two ways of drawing decision boundaries:

(A) a direct way like in human-learn where the classifier literally follows the polygon (or any shape) you draw;
(B) an indirect way like currently in hover where you draw annotations and have a custom-architecture classifier fit to the annotations. Specifically, the active_learning recipe tries to learn the decision boundary given by the “train” set in an iterative “draw-and-retrain” process.
- What I like about this is that one can make annotations from different views and easily combine them. The “manifold trajectory” slider of the active_learning recipe tries to interpolate between the input manifold and output manifold, giving multiple views to exploit.
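
For reference, option (A) can be sketched in a few lines of plain Python: the classifier is nothing more than a point-in-polygon test against the drawn shapes. This is only an illustration of the idea using the standard ray-casting test, not human-learn's or hover's actual implementation; `point_in_polygon`, `classify`, and `regions` are hypothetical names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(point: Point, regions: Dict[str, List[Point]],
             default: str = "unlabeled") -> str:
    """Return the label of the first drawn polygon containing the point."""
    for label, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return label
    return default


# Hypothetical polygons drawn by the user, one per class.
regions = {
    "positive": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "negative": [(3.0, 0.0), (5.0, 0.0), (4.0, 2.0)],
}

print(classify((1.0, 1.0), regions))
print(classify((4.0, 0.5), regions))
print(classify((10.0, 10.0), regions))
```

The contrast with (B) is that nothing is fitted here: the decision boundary is exactly the drawn shape, and points outside every polygon fall back to a default label.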

Just to be sure, my point of reference is the latest version of hover (0.5.0). Let me know whether you are suggesting (A) or something else :)

phurwicz commented 3 years ago

> I think the logical extension to a tool like this is allowing someone to define their own decision boundary of a supervised model (they call this "machine teaching" rather than machine learning). Defining their own decision boundary should end up with them having a supervised classifier at the end and being able to visualize how that classifier operates (and ideally allowing an expert human to "tune" it). Note that this is different than the current "select aspects of the dataset by drawing" functionality built in.

Now that I think more about it, hover.recipes.active_learning achieves “machine teaching” through hover.core.neural.VectorNet, where one can attach “any” neural network (subject to matching dimensions with the vectorizer) after the vectorizer function.

So when starting from scratch, one can use active_learning to draw decision boundaries through annotations and (re)train.

When working with an existing model which may not be a VectorNet, I suggest first deciding which layers of the model to freeze and which layers to tune. Then you can convert it to a VectorNet by wrapping the frozen part in the vectorizer component and putting the tunable part in the neural net component.
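
To illustrate the split, here is a framework-free sketch. It does not use hover's VectorNet API or PyTorch; layers are modeled as plain Python callables, and `compose`, `split_model`, and the toy layers are hypothetical names. The point is only that cutting a sequential model at some index gives a frozen "vectorizer" function and a tunable "head" whose composition reproduces the original model.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def compose(layers: Sequence[Layer]) -> Layer:
    """Chain layers into a single function, applied left to right."""
    def run(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return run


def split_model(layers: Sequence[Layer], n_frozen: int) -> Tuple[Layer, Layer]:
    """Split into (vectorizer, head): the first n_frozen layers stay fixed."""
    vectorizer = compose(layers[:n_frozen])  # frozen part, used as-is
    head = compose(layers[n_frozen:])        # the part one would retrain
    return vectorizer, head


# Toy stand-ins for the layers of an existing model.
def scale(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def total(x: Vector) -> Vector:
    return [sum(x)]


full_layers = [scale, relu, total]
full_model = compose(full_layers)

vectorizer, head = split_model(full_layers, n_frozen=2)
sample = [1.0, -3.0, 0.5]
features = vectorizer(sample)  # output of the frozen part
print(features, head(features), full_model(sample))
```

In a real setting the head would be a trainable network whose input dimension matches the vectorizer's output, which is the dimension-matching constraint mentioned earlier in this thread.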

Speaking of this, it’s worth considering implementing utility methods for converting VectorNet from/to “pure” PyTorch when applicable (i.e. when the vectorizer is essentially a preprocessor function followed by the forward() of some nn.Module).

Does this seem on the right track?

phurwicz / hover

Add support for "drawing your own decision boundary" to implement machine teaching #18