ml5js / ml5-library

Friendly machine learning for the web! 🤖
https://ml5js.org

How should we handle input data with Nulls/Blanks? #1471

Open salamanders opened 9 months ago

salamanders commented 9 months ago

I was loading a CSV to try the HelloWorld ml5.neuralNetwork example, and it threw a lot of errors like "the input label YourInputColumn5 does not exist at row 8687".

That was absolutely correct: the training data is littered with blanks. But that is exactly what I'm loading it to fix. I want to train a classifier and fill in those blanks.

Is there a way to flag in the options, "Yes, I know that lots of values are missing from lots of columns. That's OK. That is what we are here to fix!"?

lindapaiste commented 8 months ago

@salamanders It is a requirement that training data has a classification attached to it. The purpose of the training is for the model to build associations between the input columns and the resulting classification.

You'll want to separate your CSV into two data sets. The rows with known classifications will be used for training the model. Then you'll use the trained model to classify the rows whose classification is blank.

You'll have to make this separation yourself before providing data to the ml5 model.
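
As a rough illustration of that separation, here is a minimal sketch in plain JavaScript. It assumes the CSV has already been parsed into an array of row objects; the helper names (`isBlank`, `splitRows`) and the column names (`height`, `weight`, `species`) are hypothetical and not part of ml5. It also sets aside rows with blank input columns, on the assumption that those would trigger the same "does not exist at row" error:

```javascript
// Hypothetical helper: treats null, undefined, and whitespace-only strings as blank.
function isBlank(value) {
  return value === null || value === undefined || String(value).trim() === "";
}

// Hypothetical helper: splits parsed CSV rows into three groups.
//   training   - every input column and the label column are filled in
//   toClassify - every input column is filled in, but the label is blank
//   incomplete - at least one input column is blank
function splitRows(rows, inputColumns, labelColumn) {
  const training = [];
  const toClassify = [];
  const incomplete = [];
  for (const row of rows) {
    if (inputColumns.some((col) => isBlank(row[col]))) {
      incomplete.push(row);
    } else if (isBlank(row[labelColumn])) {
      toClassify.push(row);
    } else {
      training.push(row);
    }
  }
  return { training, toClassify, incomplete };
}

// Made-up example rows, standing in for a parsed CSV.
const rows = [
  { height: 1.2, weight: 30, species: "cat" },
  { height: 0.4, weight: 5, species: "" },
  { height: null, weight: 12, species: "dog" },
];

const { training, toClassify, incomplete } = splitRows(
  rows,
  ["height", "weight"],
  "species"
);
console.log(training.length, toClassify.length, incomplete.length); // prints: 1 1 1
```

With a split like this, the `training` rows would be the ones passed to the network (for example via `addData` followed by `train`), and the `toClassify` rows would be passed to `classify` afterwards. What to do with the `incomplete` rows is a separate decision, such as dropping them or filling in a default value before training.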