artemiusgreat closed this issue 4 years ago.
Hi @artemiusgreat.
Thanks for using SharpLearning, and thanks for opening the issue.
Currently, all learners in SharpLearning implement the IPredictor and IPredictorLearner interfaces. That is, regression models and classification models share the same interfaces. String labels would not work for a regression model, since its targets are floating-point numbers, so in order to share the interface, classification learners use doubles for targets as well.
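To make the double-target convention concrete, here is a minimal sketch in plain C# (no SharpLearning types) of how string labels could be encoded as double targets before training and decoded again after prediction. `LabelEncoding`, `Encode` and `Decode` are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of SharpLearning): encodes string class labels
// as double targets and decodes double predictions back into label names.
public static class LabelEncoding
{
    // Assigns each distinct label the values 0, 1, 2, ... in order of first appearance.
    public static (double[] Targets, Dictionary<double, string> IndexToLabel) Encode(IReadOnlyList<string> labels)
    {
        var labelToIndex = new Dictionary<string, double>();
        var targets = new double[labels.Count];
        for (int i = 0; i < labels.Count; i++)
        {
            if (!labelToIndex.TryGetValue(labels[i], out double index))
            {
                index = labelToIndex.Count;
                labelToIndex[labels[i]] = index;
            }
            targets[i] = index;
        }
        // Reverse map: class value -> label name, used to make predictions readable.
        var indexToLabel = labelToIndex.ToDictionary(kv => kv.Value, kv => kv.Key);
        return (targets, indexToLabel);
    }

    public static string Decode(double prediction, Dictionary<double, string> indexToLabel)
        => indexToLabel[prediction];
}
```

The reverse dictionary is exactly the piece of information that a trained model does not carry, which is why it has to be stored alongside the model.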
In practice, if you need to know the names of the classes the model predicts, you can store the mapping next to the model:
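using System.Collections.Generic;
using System.IO;
using System.Linq;
using SharpLearning.InputOutput.Serialization;

var labels = new[] { "Good", "Bad", "Average" /* ... */ };
// Map each class value (0.0, 1.0, 2.0, ...) to its label name.
var labelKeyToLabelName = Enumerable.Range(0, labels.Length)
    .ToDictionary(i => (double)i, i => labels[i]);
using (var memoryStream = new MemoryStream())
{
    var serializer = new GenericXmlDataContractSerializer();
    serializer.Serialize<Dictionary<double, string>>(labelKeyToLabelName, () => new StreamWriter(memoryStream));
    // MemoryStream.ToArray works even if the stream was closed by the writer.
    db.Save(memoryStream.ToArray()); // convert XML to byte[] and save as Blob to DB (db is your own storage layer)
}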
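If you would rather not depend on SharpLearning's serializer for the label map, the same round trip can be sketched with the BCL `DataContractSerializer` directly. This is an assumption-level alternative, not what the snippet above uses; `LabelMapStorage`, `ToBytes` and `FromBytes` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Sketch: store the label map as XML bytes (e.g. a database blob) and read it back,
// using System.Runtime.Serialization.DataContractSerializer.
public static class LabelMapStorage
{
    static readonly DataContractSerializer Serializer =
        new DataContractSerializer(typeof(Dictionary<double, string>));

    public static byte[] ToBytes(Dictionary<double, string> map)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.WriteObject(stream, map);
            return stream.ToArray();
        }
    }

    public static Dictionary<double, string> FromBytes(byte[] bytes)
    {
        // A fresh stream over the saved bytes always starts reading at position 0.
        using (var stream = new MemoryStream(bytes))
        {
            return (Dictionary<double, string>)Serializer.ReadObject(stream);
        }
    }
}
```

On the loading machine, `FromBytes` restores the same dictionary, so a double prediction can be looked up as a label name.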
If you want to store them together, you can also write your own type that uses string as the prediction type, and then serialize and use that type instead:
using System;
using System.Collections.Generic;
using SharpLearning.Common.Interfaces;
using SharpLearning.Containers.Matrices;

// Wraps a double-valued classification model and translates its predictions to label names.
[Serializable]
public class ClassificationPredictorModel : IPredictorModel<string>
{
    readonly IPredictorModel<double> m_model;
    readonly Dictionary<double, string> m_labelValueToLabelName;

    public ClassificationPredictorModel(IPredictorModel<double> model,
        Dictionary<double, string> labelValueToLabelName)
    {
        m_model = model ?? throw new ArgumentNullException(nameof(model));
        m_labelValueToLabelName = labelValueToLabelName ?? throw new ArgumentNullException(nameof(labelValueToLabelName));
    }

    // Variable importance does not depend on label names, so delegate to the wrapped model.
    public double[] GetRawVariableImportance()
    {
        return m_model.GetRawVariableImportance();
    }

    public Dictionary<string, double> GetVariableImportance(
        Dictionary<string, int> featureNameToIndex)
    {
        return m_model.GetVariableImportance(featureNameToIndex);
    }

    // Predict the class value with the wrapped model, then look up its label name.
    public string Predict(double[] observation)
    {
        var labelValue = m_model.Predict(observation);
        return m_labelValueToLabelName[labelValue];
    }

    // Each row of the matrix is one observation.
    public string[] Predict(F64Matrix observations)
    {
        var predictions = new string[observations.RowCount];
        var observation = new double[observations.ColumnCount];
        for (int i = 0; i < observations.RowCount; i++)
        {
            observations.Row(i, observation); // copy row i into the reusable buffer
            predictions[i] = Predict(observation);
        }
        return predictions;
    }
}
Best regards Mads
Alternatively, if you don't need the variable importance methods, you can use the IPredictor interface for a simpler implementation:
using System;
using System.Collections.Generic;
using SharpLearning.Common.Interfaces;
using SharpLearning.Containers.Matrices;

// Lighter wrapper: prediction only, no variable importance.
[Serializable]
public class ClassificationPredictor : IPredictor<string>
{
    readonly IPredictor<double> m_model;
    readonly Dictionary<double, string> m_labelValueToLabelName;

    public ClassificationPredictor(IPredictor<double> model,
        Dictionary<double, string> labelValueToLabelName)
    {
        m_model = model ?? throw new ArgumentNullException(nameof(model));
        m_labelValueToLabelName = labelValueToLabelName ?? throw new ArgumentNullException(nameof(labelValueToLabelName));
    }

    public string Predict(double[] observation)
    {
        var labelValue = m_model.Predict(observation);
        return m_labelValueToLabelName[labelValue];
    }

    // Each row of the matrix is one observation.
    public string[] Predict(F64Matrix observations)
    {
        var predictions = new string[observations.RowCount];
        var observation = new double[observations.ColumnCount];
        for (int i = 0; i < observations.RowCount; i++)
        {
            observations.Row(i, observation); // copy row i into the reusable buffer
            predictions[i] = Predict(observation);
        }
        return predictions;
    }
}
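Both wrappers follow the decorator pattern: they hold a double-valued model and translate its output through the label map. So that the idea can be run without SharpLearning, the sketch below swaps in a hypothetical `ISimplePredictor<T>` interface for `IPredictor<T>` and a `double[][]` for `F64Matrix`; `LabelDecorator` and the toy `ThresholdModel` are illustrative names only. Unlike the wrappers above, it throws a descriptive `KeyNotFoundException` for an unmapped class value, which is a design choice, not library behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IPredictor<TPrediction>, reduced to single-observation prediction.
public interface ISimplePredictor<TPrediction>
{
    TPrediction Predict(double[] observation);
}

// Decorator: wraps a model that predicts double class values and returns label names.
public sealed class LabelDecorator : ISimplePredictor<string>
{
    readonly ISimplePredictor<double> m_model;
    readonly IReadOnlyDictionary<double, string> m_labels;

    public LabelDecorator(ISimplePredictor<double> model, IReadOnlyDictionary<double, string> labels)
    {
        m_model = model ?? throw new ArgumentNullException(nameof(model));
        m_labels = labels ?? throw new ArgumentNullException(nameof(labels));
    }

    public string Predict(double[] observation)
    {
        var value = m_model.Predict(observation);
        // Fail with a clear message if the model predicts a class value with no label.
        if (!m_labels.TryGetValue(value, out var name))
            throw new KeyNotFoundException($"No label registered for class value {value}.");
        return name;
    }

    // Batch prediction over rows; double[][] stands in for F64Matrix.
    public string[] Predict(double[][] observations)
    {
        var predictions = new string[observations.Length];
        for (int i = 0; i < observations.Length; i++)
            predictions[i] = Predict(observations[i]);
        return predictions;
    }
}

// Toy model for demonstration: class 1 if the first feature is positive, else class 0.
public sealed class ThresholdModel : ISimplePredictor<double>
{
    public double Predict(double[] observation) => observation[0] > 0 ? 1.0 : 0.0;
}
```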
It works, thanks.
Final version
using System.Collections.Generic;
using SharpLearning.Common.Interfaces;
using SharpLearning.Containers.Matrices;

// Pairs a double-valued model with a map from class keys (e.g. int) to labels.
public class MapModel<TKey, TValue> : IPredictorModel<KeyValuePair<TKey, TValue>>
{
    public IDictionary<TKey, TValue> Map { get; set; }
    public IPredictorModel<double> Model { get; set; }

    public double[] GetRawVariableImportance()
    {
        return Model.GetRawVariableImportance();
    }

    public Dictionary<string, double> GetVariableImportance(Dictionary<string, int> featureNameToIndex)
    {
        return Model.GetVariableImportance(featureNameToIndex);
    }

    public KeyValuePair<TKey, TValue> Predict(double[] observation)
    {
        // ConversionManager.Value<TKey> is a helper not shown in this thread;
        // it converts the double prediction into a TKey.
        var predictionKey = ConversionManager.Value<TKey>(Model.Predict(observation));
        // Unknown keys yield default(KeyValuePair<TKey, TValue>) instead of throwing.
        return Map.TryGetValue(predictionKey, out TValue prediction)
            ? new KeyValuePair<TKey, TValue>(predictionKey, prediction)
            : default;
    }

    // Each row of the matrix is one observation.
    public KeyValuePair<TKey, TValue>[] Predict(F64Matrix observations)
    {
        var predictions = new KeyValuePair<TKey, TValue>[observations.RowCount];
        var observation = new double[observations.ColumnCount];
        for (var i = 0; i < observations.RowCount; i++)
        {
            observations.Row(i, observation); // copy row i into the reusable buffer
            predictions[i] = Predict(observation);
        }
        return predictions;
    }
}
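The thread does not show `ConversionManager.Value<TKey>`. For `TKey = int`, one possible stand-in, assuming the model's predictions are whole-numbered class values (possibly with tiny rounding error), is to round before casting, since a plain `(int)` cast truncates toward zero. `PredictionKey.ToIntKey` is a hypothetical name, not the author's helper.

```csharp
using System;

// Hypothetical stand-in for ConversionManager.Value<int>: converts a double class
// prediction to an int key. Rounding first means 1.9999999999999996 maps to 2, not 1.
public static class PredictionKey
{
    public static int ToIntKey(double prediction)
    {
        if (double.IsNaN(prediction) || double.IsInfinity(prediction))
            throw new ArgumentOutOfRangeException(nameof(prediction), "Prediction is not a finite number.");
        // checked: values outside the int range throw OverflowException instead of wrapping.
        return checked((int)Math.Round(prediction));
    }
}
```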
Example
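// Observations, Targets, serializer, memoryStream and processor come from the surrounding code (not shown).
var container = new MapModel<int, string>
{
    Map = new Dictionary<int, string> { [1] = "Good", [2] = "Bad" },
    Model = new ClassificationDecisionTreeLearner().Learn(Observations, Targets)
};
serializer.Serialize(container, () => new StreamWriter(memoryStream));
// Assumes the XML can be read from the beginning here (e.g. a new MemoryStream over the saved bytes).
var model = serializer.Deserialize<MapModel<int, string>>(() => new StreamReader(memoryStream));
var predictions = model.Predict(processor.Input.Observations);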
Side notes
I decided to make both types generic (TKey and TValue) so that TKey can be an integer type, which avoids floating-point precision issues like the one in the code below.
// Keys that look equal can differ after floating-point arithmetic.
var map = new Dictionary<double, int>();
double sourceKey = 0.19999999999999574; // value seen in the debugger for a key that was meant to be 0.2
double complexKey = 0.1;
complexKey += 0.1; // exactly 0.2
map[sourceKey] = 1;
var v1 = map.ContainsKey(sourceKey) && map.ContainsKey(complexKey);
var v2 = map.TryGetValue(sourceKey, out int a) && map.TryGetValue(complexKey, out int b);
var v3 = map[sourceKey] & map[complexKey]; // throws KeyNotFoundException, because sourceKey != complexKey
Debugger results
sourceKey => 0.19999999999999574
complexKey => 0.2
v1 => false
v2 => false
a => 1
b => 0
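A self-contained sketch of the same pitfall. Note that 0.1 + 0.1 happens to equal 0.2 exactly in IEEE 754 double arithmetic, so this version uses the classic 0.1 + 0.2 case, which evaluates to 0.30000000000000004 and therefore misses a key stored as 0.3. Integer class keys, as in `MapModel<int, string>`, do not have this problem. `FloatingPointKeys` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class FloatingPointKeys
{
    // Does a key computed as 0.1 + 0.2 find an entry stored under 0.3?
    public static bool ComputedKeyFound()
    {
        var map = new Dictionary<double, string> { [0.3] = "Average" };
        double computed = 0.1;
        computed += 0.2; // 0.30000000000000004, a different double than 0.3
        return map.ContainsKey(computed);
    }

    // Integer class ids carry no rounding error, so equal ids always match.
    public static bool IntegerKeyFound()
    {
        var map = new Dictionary<int, string> { [3] = "Average" };
        int computed = 1;
        computed += 2;
        return map.ContainsKey(computed);
    }
}
```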
@artemiusgreat Thanks for adding the additional notes and conclusion. Glad it worked!
First of all, thank you for sharing this library. Second, it would be great to make the mapping between columns and feature names more obvious. If a model is serialized and saved on one computer, and then deserialized and loaded on another, the second computer has no idea what the labels/targets mean, because the model keeps them as double values.
Save model
Load model
As a result, the loaded model has a Targets property that contains double values like 1, 2, 3, 4, and there is no way to tell that they originally meant "Good", "Bad", etc.
Question
Is there a way to save the original string labels/targets as part of the model to make prediction results human-readable?