microsoft / NimbusML

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

NimbusML trains a model that ML.NET reports as corrupted #13

GalOshri closed this issue 5 years ago

GalOshri commented 5 years ago
  1. Train a model with NimbusML 0.6 using the code below.
  2. Save the model to disk.
  3. Try to use the saved model in ML.NET 0.6.

Expected: the model can be used in ML.NET.
Actual: I see the error shown below.

Unhandled Exception: System.FormatException: Corrupt model file
   at Microsoft.ML.Runtime.Model.ModelLoadContext.LoadModel[TRes,TSig](IHostEnvironment env, TRes& result, RepositoryReader rep, String dir, Object[] extra)
   at Microsoft.ML.Runtime.Data.TransformerChain.LoadFrom(IHostEnvironment env, Stream stream)
   at nimbusmlnet.Program.Main(String[] args) in /Users/gal/Projects/NimbusML/nimbusmlnet/Program.cs:line 19

Python training code:

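# Imports needed by the snippet below (module paths as documented for NimbusML).
import numpy as np
from nimbusml import Pipeline, FileDataStream, DataSchema
from nimbusml.feature_extraction.text import NGramFeaturizer
from nimbusml.feature_extraction.text.extractor import Ngram
from nimbusml.linear_model import AveragedPerceptronBinaryClassifier
from nimbusml.preprocessing.filter import TakeFilter

train_datapath = '/Users/gal/Projects/NimbusML/Sent_Train.tsv'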
test_datapath = '/Users/gal/Projects/NimbusML/Sent_Test.tsv'
schema = DataSchema.read_schema(train_datapath, sep='\t', numeric_dtype=np.float32)
train_data = FileDataStream.read_csv(train_datapath, sep='\t', schema=schema)
test_data = FileDataStream.read_csv(test_datapath, sep='\t', schema=schema)
print(train_data.schema)

pipeline = Pipeline([
    TakeFilter(10000),
    NGramFeaturizer(word_feature_extractor=Ngram(weighting='TfIdf',
                                                 ngram_length=2),
                    char_feature_extractor=Ngram(weighting='Tf',
                                                 ngram_length=3),
                    columns={"Features": "SentimentText"}),
    AveragedPerceptronBinaryClassifier(num_iterations=10, feature="Features", label="Sentiment")
])
pipeline.fit(train_data)
pipeline.save_model("sent_model.zip")

ML.NET prediction code:

var env = new ConsoleEnvironment();
ITransformer loadedModel;
using (var file = File.OpenRead("../sent_model.zip"))
    loadedModel = TransformerChain.LoadFrom(env, file);

var predictor = loadedModel.MakePredictionFunction<SentimentData, SentimentPrediction>(env);

var prediction = predictor.Predict(new SentimentData
{
    SentimentText = "I am so happy!",
    Sentiment = 0
});
Console.WriteLine(prediction.Probability);
Console.ReadLine();

ganik commented 5 years ago

Currently, for scoring you need to use the legacy pipeline API (Legacy.LearningPipeline). This is because the model is saved in the old format. Changes are required on the ML.NET side to save the model in the new format through entrypoints.

var pipeline = new Legacy.LearningPipeline();
var loadedModel = await Legacy.PredictionModel.ReadAsync<InfertData, InfertPrediction>(modelName);
loadedModel.Predict(..)

See scoring example here: https://review.docs.microsoft.com/en-us/nimbusml/loadsavemodels?branch=smoke-test#scoring-in-mlnet
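
Since the "Corrupt model file" message here comes from a format mismatch and not from a damaged file, it can help to confirm that the archive itself is intact before trying different loaders. The following is a minimal sketch, assuming the saved model is an ordinary zip archive (as the .zip extension suggests). The helper name list_model_entries is hypothetical, and the sketch makes no claim about which entries a valid model should contain.

```python
import zipfile


def list_model_entries(model_path):
    """Return the entry names inside a saved model archive.

    Hypothetical helper: raises ValueError if the file is not a readable
    zip archive or if any entry fails its CRC check.
    """
    if not zipfile.is_zipfile(model_path):
        raise ValueError(f"{model_path} is not a zip archive")
    with zipfile.ZipFile(model_path) as archive:
        # testzip() returns the name of the first entry with a bad CRC,
        # or None when every entry reads back cleanly.
        bad_entry = archive.testzip()
        if bad_entry is not None:
            raise ValueError(f"damaged entry in archive: {bad_entry}")
        return archive.namelist()
```

Calling list_model_entries("sent_model.zip") on the file produced by the training code should return a list of entry names if the file is physically intact. In that case the exception is more likely explained by the old versus new model format than by a broken file.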

montebhoover commented 5 years ago

Will be solved by this ML.NET issue: https://github.com/dotnet/machinelearning/issues/1104

ganik commented 5 years ago

This is solved.