jg10545 / patchwork

Active learning for quickly building labeled image patch datasets
MIT License

patchwork

Interactive Machine Learning for an Imperfect World

This project is an experiment in how to leverage machine learning for image classification problems when the problem may be poorly specified or evolving. I'm interested in cases where we may not know exactly what we're looking for when we start, but expect that systematically sorting through the data will help us refine our question. In particular, we want a system for sorting through images that can handle:

If you're going to try this code out, I apologize in advance for the state of the GUI; I'm not really an interface guy. This library is a car with no seatbelts.

What's inside

In the recent SimCLRv2 paper, Chen et al. lay out three steps for training a classifier without many labels: task-agnostic unsupervised pretraining of a feature extractor, task-specific supervised fine-tuning on only the labeled data, and finally task-specific semi-supervised learning on all the data. patchwork has tools for all three steps:

Pre-training a feature extractor

The patchwork.feature module has methods for pretraining convolutional networks as feature extractors.
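To give a feel for what this kind of pretraining optimizes, here's a minimal NumPy sketch of the NT-Xent ("normalized temperature-scaled cross-entropy") contrastive loss from the SimCLR papers. It's a generic illustration, not the code inside patchwork.feature, which may use different objectives; the function name `nt_xent_loss` and the temperature default are assumptions for illustration. Two augmented views of the same image should score a lower loss than unrelated pairs.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss (illustrative sketch, not patchwork's code).

    z1, z2: arrays of shape (N, D). Row i of z1 and row i of z2 are the
    embeddings of two augmented views of the same image (a positive pair);
    every other row in the batch is treated as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit rows, so dot product = cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # an embedding is never compared with itself
    # The positive for row i is row i + N, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_probs = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Mean cross-entropy of picking the positive out of the other 2N - 1 rows.
    return float(-log_probs[np.arange(2 * n), positives].mean())


rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 16))
views_b = views_a + 0.01 * rng.normal(size=(8, 16))  # near-identical second views
unrelated = rng.normal(size=(8, 16))                  # "views" sharing no content
print(nt_xent_loss(views_a, views_b), nt_xent_loss(views_a, unrelated))
```

In a real pretraining setup the embeddings would come from a convolutional network applied to two random augmentations of each image, and the network would be trained to minimize this loss.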

Training a supervised classifier

patchwork contains a graphical user interface, built with the panel library, for interactive labeling. Using a frozen pre-trained feature extractor, iteratively label images, train a fine-tuning model to classify them using your labels, then use the model to suggest which images to label next. Save out your classification model directly, or use the pandas.DataFrame of labels in your own workflow.
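The label/train/suggest loop described above can be sketched in a few lines, assuming each image has already been run through a frozen feature extractor and is now a fixed vector. This is a hypothetical stand-in, not patchwork's GUI code: `train_linear_head` fits a logistic-regression head as the fine-tuning model, and `most_uncertain` uses simple uncertainty sampling (pick the images whose predicted probability is closest to 0.5), which may differ from the sampling strategies patchwork actually offers. Random vectors play the role of embeddings, and the true labels play the role of the human labeler.

```python
import numpy as np


def sigmoid(z):
    # Clip logits so np.exp cannot overflow for very confident predictions.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_linear_head(features, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression head on frozen features by gradient descent."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(features @ w + b)
        err = p - labels  # derivative of binary cross-entropy w.r.t. each logit
        w -= lr * features.T @ err / n
        b -= lr * err.mean()
    return w, b


def most_uncertain(features, w, b, unlabeled_idx, k=5):
    """Pick the k unlabeled examples whose predicted P(positive) is closest to 0.5."""
    p = sigmoid(features[unlabeled_idx] @ w + b)
    return unlabeled_idx[np.argsort(np.abs(p - 0.5))[:k]]


# Synthetic stand-in for embeddings from a frozen feature extractor.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 8)) + np.where(y_true[:, None] == 1, 1.0, -1.0)

# Seed the label set with three examples of each class.
labeled = np.concatenate([np.flatnonzero(y_true == 0)[:3], np.flatnonzero(y_true == 1)[:3]])
for _ in range(3):
    w, b = train_linear_head(X[labeled], y_true[labeled])
    unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
    picks = most_uncertain(X, w, b, unlabeled, k=5)
    labeled = np.concatenate([labeled, picks])  # in the GUI, a human would label these

w, b = train_linear_head(X[labeled], y_true[labeled])
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y_true)
print(len(labeled), accuracy)
```

Uncertainty sampling is just the simplest choice here; it spends labeling effort on the images the current model finds most ambiguous.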

If you don't want to use my crappy GUI for training a supervised model, you're still welcome to scavenge any pieces you need; the loading functions used for the GUI can also be used with the Keras API.

Semi-supervised fine tuning

This part's pretty unimaginative: starting with the model you trained using patchwork.GUI (or your own model) as a teacher, use patchwork.Distillerator() to train a student model.
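For reference, here's a minimal sketch of the standard soft-target distillation loss (Hinton et al.): the teacher's temperature-softened predictions on unlabeled images become the targets for the student. It's a generic NumPy illustration with assumed names (`distillation_loss`, a default temperature of 2.0), not a description of what patchwork.Distillerator does internally.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened predictions against the teacher's soft targets."""
    teacher_probs = np.exp(log_softmax(teacher_logits, temperature))
    student_log_probs = log_softmax(student_logits, temperature)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())


teacher = np.array([[2.0, 0.5, -1.0], [0.0, 3.0, 1.0]])  # teacher logits on two unlabeled images
print(distillation_loss(teacher.copy(), teacher))         # student agrees with the teacher
print(distillation_loss(-teacher, teacher))               # student disagrees
```

The loss bottoms out at the entropy of the teacher's softened distribution when the student matches the teacher. Hinton et al. also multiply this term by T² when mixing it with a hard-label loss; that weighting is left out here.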

patchwork has been tested with TensorFlow 2.0.

Installation

Install with pip.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Images seen in my documentation are from the amazing UC Merced Land Use dataset, which is wonderful for prototyping.

Cookiecutter: https://github.com/audreyr/cookiecutter
audreyr/cookiecutter-pypackage: https://github.com/audreyr/cookiecutter-pypackage