This project is an experiment in how to leverage machine learning for image classification problems when the problem may be poorly specified or evolving. I'm interested in cases where we may not know exactly what we're looking for when we start, but expect that systematically sorting through data would help us refine our question. In particular, we want a system for sorting through images that can handle:
``pandas`` and ``keras``). If you're going to try this code out, I apologize in advance for the state of the GUI; I'm not really an interface guy. This library is a car with no seatbelts.
In the recent SimCLRv2 paper, Chen et al. lay out three steps for training a classifier without many labels: task-agnostic unsupervised pretraining of a feature extractor, task-specific supervised fine-tuning on only the labeled data, and finally task-specific semi-supervised learning on all the data. ``patchwork`` has tools for all three steps:
The ``patchwork.feature`` module has methods for pretraining convolutional networks as feature extractors.
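
To make the task-agnostic pretraining step concrete, here is a minimal NumPy sketch of the NT-Xent contrastive loss used by SimCLR: embeddings of two random augmentations of the same image should be similar to each other and dissimilar to every other image in the batch. This is not the ``patchwork.feature`` implementation, which may use other methods entirely; the function name ``nt_xent_loss``, the default temperature of 0.5, and the use of NumPy instead of ``tensorflow`` are assumptions made to keep the example small.

```python
import numpy as np

def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """Normalized temperature-scaled cross-entropy (NT-Xent), as in SimCLR.

    z1[i] and z2[i] are embeddings of two augmentations of image i (hypothetical helper).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, d) stacked views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize so dot product = cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude each embedding's similarity with itself
    # Row i's positive partner is the other view of the same image: i <-> i + N
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Average cross-entropy of picking the positive partner out of the batch
    return float(-log_prob[np.arange(2 * n), pos].mean())

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))                                        # embeddings of view 1
aligned = nt_xent_loss(z1, z1 + 0.01 * rng.normal(size=(8, 16)))     # view 2 nearly identical
unrelated = nt_xent_loss(z1, rng.normal(size=(8, 16)))               # view 2 unrelated
print(aligned, unrelated)
```

Minimizing this loss pushes the network toward features that survive augmentation, which is why no labels are needed at this stage.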
Tools that rely on the ``tf.distribute`` API are new and still experimental, so they may break in a ``tensorflow`` update.

``patchwork`` contains a graphical user interface, built with the ``panel`` library, for interactive labeling. Using a frozen pre-trained feature extractor, you iteratively label images, train a fine-tuning model to classify them using your labels, then use the model to choose which images to label next. Save out your classification model directly, or use the ``pandas.DataFrame`` of labels in your own workflow.
``feature`` module. If you don't want to use my crappy GUI for training a supervised model, you're still welcome to scavenge any pieces you need: the loading functions used for the GUI can also be used with the Keras API.
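
The label/train/suggest loop described above can be sketched in a few lines of NumPy. This is not the GUI's code: the random "features" stand in for frozen-extractor embeddings, the logistic-regression head stands in for whatever fine-tuning model ``patchwork`` actually trains, and uncertainty sampling (suggest the images whose predicted probability is closest to 0.5) is an assumed strategy for choosing what to label next. All function names here are hypothetical.

```python
import numpy as np

def train_head(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head on frozen features with batch gradient descent."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = p - labels                          # gradient of binary cross-entropy w.r.t. each logit
        w -= lr * features.T @ err / len(labels)
        b -= lr * err.mean()
    return w, b

def predict(features, w, b):
    """Predicted probability of the positive class for every image."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def next_to_label(probs, labeled, k=5):
    """Uncertainty sampling: the k unlabeled images whose probability is closest to 0.5."""
    score = -np.abs(probs - 0.5)
    score[labeled] = -np.inf                      # never suggest an already-labeled image
    return np.argsort(score)[::-1][:k]

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))              # stand-in for frozen-extractor embeddings
oracle = (features[:, 0] > 0).astype(float)       # labels a human would supply in the GUI

labeled = np.zeros(len(features), dtype=bool)
labeled[rng.choice(len(features), size=10, replace=False)] = True   # small random seed set

for _ in range(5):
    w, b = train_head(features[labeled], oracle[labeled])
    probs = predict(features, w, b)
    labeled[next_to_label(probs, labeled)] = True  # the "human" labels the suggested images

accuracy = float(((probs > 0.5) == (oracle == 1.0)).mean())
print(f"labeled {labeled.sum()} images, accuracy on all {len(features)}: {accuracy:.2f}")
```

Because the feature extractor stays frozen, each retraining round only fits a small head, which is what keeps the loop fast enough to be interactive.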
This part's pretty unimaginative: starting with the model you trained using ``patchwork.GUI`` (or your own model) as a teacher, use ``patchwork.Distillerator()`` to train a student model.
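
The teacher/student step can be illustrated with the temperature-scaled soft-target loss from the knowledge-distillation literature (Hinton et al.): the student is trained to match the teacher's softened class probabilities. Whether ``patchwork.Distillerator()`` uses this exact loss, temperature, or scaling is not confirmed here; the function names are hypothetical and the sketch uses NumPy rather than ``tensorflow``.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis, after dividing by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)     # subtract row max to avoid overflow in exp
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: KL(teacher || student) at temperature T, scaled by T**2."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_teacher) * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(temperature ** 2 * kl.mean())

# Teacher logits for two unlabeled images over three classes; no ground truth is needed.
teacher = np.array([[4.0, 1.0, 0.5],
                    [0.2, 3.0, 0.1]])
matched = distillation_loss(teacher.copy(), teacher)        # student agrees with the teacher
mismatched = distillation_loss(teacher[:, ::-1], teacher)   # student ranks classes differently
print(matched, mismatched)
```

Since the loss depends only on the teacher's outputs, it can be computed on every image, labeled or not, which is what makes this final step semi-supervised.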
``patchwork`` has been tested with ``tensorflow`` 2.0.
Install with ``pip``.
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

Images seen in my documentation are from the amazing UC Merced Land Use dataset, which is wonderful for prototyping.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage