Closed dsmilkov closed 2 years ago
Hi, I'd like to give this a shot!
@oreHGA Any update on this? I would really like to use this saving ability for saving the knn-classifier information returned by classifier.getClassifierDataset(). As far as I can tell it would just be saving an array of 2D tensors for which each tensor has a different shape. Anyone active on this?
@dsmilkov To save the knn-classifier information (array of 2D Tensors) could we store the array of tensors as a fake model? That would allow us to use model.save() and tf.loadModel() without having to make any changes.
I am just stuck on which type of Keras layer would allow different tensor shapes (dense layers don't work, as each layer gets its shape from the previous layer). Or would I have to make a custom model with multiple inputs, each with a different shape? (I can fully define the shape of an input.)
See my Face Detection demo, in which every classifier has the 2D tensor shape [x, 136], where x is the number of images that have been classified for that person and 136 is the amount of data generated for each face (68 data points x 2 = 136).
So my idea of making a fake model with multiple inputs to save each tensor from the knn-classifier does work.
Basically, it takes the output of classifier.getClassifierDataset(), which returns an array of tensors, then sets up a model with an input and a dense layer for each trained classifier. That model can be saved using model.save(). The saved model can then be loaded from a website and converted back into the classifier using classifier.setClassifierDataset() once the array of tensors has been extracted from the fake model.
Reply if you are interested in the code. I am still working on making the code a bit more readable. I had to first understand multiple inputs and here is a demo for that
@hpssjellis Thx!
Took me a while to find the code. Here is the knn-classifier demo. Look for the buttons "save-classifier" and "load-classifier". You can just view-source or visit the github.
The github is here
This is my simple implementation for saving and loading the KNN classifier. It works; I hope it helps anyone who needs it :)
save() {
  let dataset = this.classifier.getClassifierDataset();
  var datasetObj = {};
  Object.keys(dataset).forEach((key) => {
    let data = dataset[key].dataSync();
    // Use Array.from() so that JSON.stringify() converts the data to an array string, e.g. [0.1,-0.2...],
    // instead of an object, e.g. {0:"0.1", 1:"-0.2"...}
    datasetObj[key] = Array.from(data);
  });
  let jsonStr = JSON.stringify(datasetObj);
  // Can be changed to another storage target
  localStorage.setItem("myData", jsonStr);
}
load() {
  // Can be changed to another source
  let dataset = localStorage.getItem("myData");
  let tensorObj = JSON.parse(dataset);
  // Convert back to tensors; 1000 is the hard-coded number of values per example
  Object.keys(tensorObj).forEach((key) => {
    tensorObj[key] = tf.tensor(tensorObj[key], [tensorObj[key].length / 1000, 1000]);
  });
  this.classifier.setClassifierDataset(tensorObj);
}
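The Array.from() step in the comment above matters because dataSync() returns a typed array (a Float32Array for float tensors), and JSON.stringify serializes typed arrays as index-keyed objects. A minimal sketch in plain JavaScript, with no TensorFlow.js dependency; the sample values are made up for illustration:

```javascript
// Stand-in for the typed array that tensor.dataSync() returns (illustrative values).
const data = new Float32Array([0.5, -0.25, 1]);

// Serializing the typed array directly produces an index-keyed object...
const asObject = JSON.stringify(data);

// ...while converting to a plain array first produces a compact JSON array.
const asArray = JSON.stringify(Array.from(data));

console.log(asObject); // {"0":0.5,"1":-0.25,"2":1}
console.log(asArray);  // [0.5,-0.25,1]

// The array form parses back into a plain array of numbers,
// which can be turned into a typed array (or a tensor) again.
const restored = Float32Array.from(JSON.parse(asArray));
```

The array form is also noticeably smaller, since it does not repeat an index key for every value.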
I would find this useful as well. Based on @leung85's and @hpssjellis's examples, I've created a TypeScript and async version:
import * as knnClassifier from "@tensorflow-models/knn-classifier";
import * as tf from '@tensorflow/tfjs';

type Dataset = {
  [classId: number]: tf.Tensor<tf.Rank.R2>
};

type DatasetObjectEntry = {
  classId: number,
  data: number[],
  shape: [number, number]
};

type DatasetObject = DatasetObjectEntry[];

async function toDatasetObject(dataset: Dataset): Promise<DatasetObject> {
  const result: DatasetObject = await Promise.all(
    Object.entries(dataset).map(async ([classId, value]) => {
      const data = await value.data();
      return {
        classId: Number(classId),
        data: Array.from(data),
        shape: value.shape
      };
    })
  );
  return result;
}

function fromDatasetObject(datasetObject: DatasetObject): Dataset {
  // Key each tensor by its stored classId (numeric, as the Dataset type declares)
  // rather than by its position in the array.
  return datasetObject.reduce((result: Dataset, {classId, data, shape}) => {
    result[classId] = tf.tensor2d(data, shape);
    return result;
  }, {});
}

const storageKey = "knnClassifier";

async function saveClassifierInLocalStorage(classifier: knnClassifier.KNNClassifier) {
  const dataset = classifier.getClassifierDataset();
  const datasetObj: DatasetObject = await toDatasetObject(dataset);
  const jsonStr = JSON.stringify(datasetObj);
  // Can be changed to another storage target
  localStorage.setItem(storageKey, jsonStr);
}

function loadClassifierFromLocalStorage(): knnClassifier.KNNClassifier {
  const classifier: knnClassifier.KNNClassifier = new knnClassifier.KNNClassifier();
  const datasetJson = localStorage.getItem(storageKey);
  if (datasetJson) {
    const datasetObj = JSON.parse(datasetJson) as DatasetObject;
    const dataset = fromDatasetObject(datasetObj);
    classifier.setClassifierDataset(dataset);
  }
  return classifier;
}
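One thing to watch when restoring from an array of entries like this: Object.entries() on an array yields positional indices as keys, not the stored classId, so class ids survive only if they happen to be 0, 1, 2, ... in order. A small sketch of the difference in plain JavaScript, using bare arrays in place of tensors; the ids and values are made up:

```javascript
// Serialized entries as a save step might produce them; class ids need not start at 0.
const saved = [
  { classId: 3, data: [1, 2], shape: [1, 2] },
  { classId: 7, data: [3, 4], shape: [1, 2] },
];

// Keying by array position loses the original class ids.
const byPosition = Object.fromEntries(
  Object.entries(saved).map(([index, entry]) => [index, entry.data])
);

// Keying by the stored classId preserves them.
const byClassId = Object.fromEntries(
  saved.map((entry) => [entry.classId, entry.data])
);

console.log(Object.keys(byPosition)); // keys "0" and "1"
console.log(Object.keys(byClassId));  // keys "3" and "7"
```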
That's great @oveddan, any chance of that being sent as a PR to TFJS or is it too specific and should just be loaded as needed?
@leung85 newer versions use tensor size of 1024
Maybe it's obvious, but you can read the shape from the tensor object, so no need to hard-code in 1000 or 1024, etc. Based on leung85's example (sorry for long lines):
// Create your classifier:
let classifier = knnClassifier.create();
// Add some examples:
classifier.addExample(...);
// Save it to a string:
let str = JSON.stringify( Object.entries(classifier.getClassifierDataset()).map(([label, data])=>[label, Array.from(data.dataSync()), data.shape]) );
// Load it back into a fresh classifier:
classifier = knnClassifier.create();
classifier.setClassifierDataset( Object.fromEntries( JSON.parse(str).map(([label, data, shape])=>[label, tf.tensor(data, shape)]) ) );
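The same idea can be split into two small helpers that are easier to read and that check the stored shape against the number of values before rebuilding. This is only a sketch: serializeDataset, deserializeDataset and the makeTensor parameter are hypothetical names, and the tensors are assumed to expose dataSync() and shape as TensorFlow.js tensors do.

```javascript
// Serialize a { label: tensor } dataset, keeping each tensor's shape next to its values.
// Assumes each tensor exposes dataSync() and shape.
function serializeDataset(dataset) {
  const entries = Object.entries(dataset).map(([label, tensor]) => ({
    label,
    shape: Array.from(tensor.shape),
    values: Array.from(tensor.dataSync()),
  }));
  return JSON.stringify(entries);
}

// Rebuild the dataset. makeTensor is injected (hypothetical parameter);
// in an app it would be tf.tensor, which is called as tf.tensor(values, shape).
function deserializeDataset(json, makeTensor) {
  const dataset = {};
  for (const { label, shape, values } of JSON.parse(json)) {
    // The product of the shape dimensions must match the number of stored values.
    const expected = shape.reduce((a, b) => a * b, 1);
    if (values.length !== expected) {
      throw new Error(`Class "${label}": ${values.length} values do not fit shape [${shape}]`);
    }
    dataset[label] = makeTensor(values, shape);
  }
  return dataset;
}
```

With TensorFlow.js loaded, usage would look like classifier.setClassifierDataset(deserializeDataset(str, tf.tensor)). Passing the tensor factory in keeps the helpers free of any import, which also makes them easy to exercise with stub tensors.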
@oveddan I tried to do it as in your answer, but after fromDatasetObject my dataset has undefined in classIndex. Could you please provide any advice on how to fix it?
@VladimirHumeniuk Hi! Have you been able to fix this? I am having the same issue.
@Mxlt I just used label instead of classIndex.
I noticed that most of these answers are synchronous, which is potentially dangerous if you're expecting to unload a large dataset, so I have created a library for parsing and stringifying datasets of these types. If you are interested, take a look at tensorset. There is some documentation on how to use it; it works similarly to JSON.stringify and JSON.parse.
Here is an example of using Tensorset with the KNN-Classifier:
const fs = require('fs').promises;
const knnClassifier = require('@tensorflow-models/knn-classifier');
const Tensorset = require('tensorset');

(async () => {
  // Create a classifier, add your examples
  const originalClassifier = knnClassifier.create();
  originalClassifier.addExample(/*Some Example*/);
  // Stringify the dataset
  let dataset = Tensorset.stringify(originalClassifier.getClassifierDataset());
  // Save the dataset
  await fs.writeFile(/*File Name*/, dataset);
  // Load the dataset
  dataset = await fs.readFile(/*File Name*/);
  // Parse the dataset
  dataset = await Tensorset.parse(dataset);
  // Add to a new classifier
  const newClassifier = knnClassifier.create();
  newClassifier.setClassifierDataset(dataset);
})();
Or, if you're looking to build an image classifier from the knnClassifier, you could use my image-classifier, which implements the save and load functionality, adding images as examples, and then classifying images. Slightly easier to implement, imo.
@swimauger This is really great, thank you for sharing these packages! I was wondering about the image-classifier package: does it use MobileNet under the hood for feature extraction, with the package simply wrapping MobileNet + KnnClassifier so it can take a jpg/png image as input?
@JJwilkin precisely!
Thx. It works for me.
We should add save() and load() methods to KnnClassifier. They can take the same url/path format as model.save().
See discussion for motivation and context.
Internally we can make an empty model with non-trainable weights and use the existing model.save() infrastructure to save it.
Long term, we should have generic tf.save() and tf.load() methods that can take a dict of tensor names to tensors and save them.
cc @tafsiri @caisq in case I missed something from our discussion.
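To make the "dict of tensor names to tensors" idea concrete, here is a rough sketch of one possible storage layout: all values packed into a single binary buffer, plus a small manifest recording each tensor's name, shape and position. Nothing here is an existing TensorFlow.js API; packTensors and unpackTensors are hypothetical names, tensors are represented as plain { shape, values } objects, and only float32 data is handled.

```javascript
// Hypothetical helper: pack a { name: { shape, values: Float32Array } } dict into
// one binary buffer plus a JSON-serializable manifest describing each entry.
// Offsets and lengths are counted in float32 elements, not bytes.
function packTensors(named) {
  const manifest = [];
  let total = 0;
  for (const [name, { shape, values }] of Object.entries(named)) {
    manifest.push({ name, shape: Array.from(shape), offset: total, length: values.length });
    total += values.length;
  }
  const packed = new Float32Array(total);
  for (const { name, offset } of manifest) {
    packed.set(named[name].values, offset);
  }
  return { manifest, buffer: packed.buffer };
}

// Hypothetical inverse: copy each named entry back out of the shared buffer.
function unpackTensors({ manifest, buffer }) {
  const packed = new Float32Array(buffer);
  const named = {};
  for (const { name, shape, offset, length } of manifest) {
    named[name] = { shape, values: packed.slice(offset, offset + length) };
  }
  return named;
}
```

A binary buffer avoids the size overhead of writing every float as JSON text, while the manifest stays small and human-readable. For the KNN classifier, the names would be the class labels and each entry one class's 2D tensor.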