linkedin / dagli

Framework for defining machine learning models, including feature generation and transformations, as directed acyclic graphs (DAGs).
BSD 2-Clause "Simplified" License

Loading a trained network results in completely different probabilities on same test data #6

Closed: cyberbeat closed this issue 3 years ago

cyberbeat commented 3 years ago

I trained a "NeuralNetwork" on a GPU, serialized the prepared network, loaded it on another PC, and ran inference on the CPU with the same test data. The probabilities are completely different, as if the network were untrained.

I have no explanation for this. The only bug I found on the DL4J side that might relate to this is: https://github.com/eclipse/deeplearning4j/issues/4688

Could that bug be the cause here?

jeffpasternack commented 3 years ago

Thanks for reporting this.

It's definitely possible for the results to be (slightly) different between GPU and CPU due to differences in floating point math, but it shouldn't affect them substantively. You're right that this is almost certainly a bug in DL4J--the linked issue is old enough that they should have fixed it, but we can attempt the same workaround.
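As a quick standalone illustration (not code from this issue) of why GPU and CPU results can legitimately differ a little: floating-point addition is not associative, so summing the same values in a different order, as parallel GPU reductions typically do, gives slightly different results.

```java
// Standalone illustration (an assumption for exposition, not code from this issue):
// the same values summed in a different order typically produce slightly different floats,
// which is the kind of GPU-vs-CPU discrepancy that is expected and harmless.
public class FloatOrderDemo {
  public static void main(String[] args) {
    float[] values = new float[1_000_000];
    for (int i = 0; i < values.length; i++) {
      values[i] = i * 1e-3f;
    }

    float smallFirst = 0f;
    for (int i = 0; i < values.length; i++) {
      smallFirst += values[i];
    }

    float largeFirst = 0f;
    for (int i = values.length - 1; i >= 0; i--) {
      largeFirst += values[i];
    }

    // Mathematically identical sums; the printed floats typically differ in the low-order digits.
    System.out.println(smallFirst);
    System.out.println(largeFirst);
  }
}
```

Differences of that size show up in the low-order digits of the probabilities, not as the wholesale changes reported here.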

Could you please share, broadly, the type of architecture you used? E.g., a multilayer perceptron or an RNN? This may affect reproducibility.

cyberbeat commented 3 years ago

FastText processing as a boolean distribution, plus LSTM, categorical, and numeric features, followed by a multilayer perceptron and boolean classification.

jeffpasternack commented 3 years ago

Thanks! We'll try to reproduce in the next couple of days and will update the ticket with our findings.

jeffpasternack commented 3 years ago

Hi @cyberbeat, so far we haven't been able to reproduce this with our demo LSTM and MLP models. It may be platform-specific; can you please confirm that you're seeing this issue on Linux?

cyberbeat commented 3 years ago

Yes, Linux x86_64. Perhaps I can put together a small example (but that may take some time).

jeffpasternack commented 3 years ago

Thanks! I'd be surprised if this were tied to a specific model configuration (DL4J should be doing weight tensor I/O in a consistent fashion)--please first let me see if we can reproduce on a Linux box specifically, using our existing demo models.

jeffpasternack commented 3 years ago

Update on this: no luck reproducing with a model trained on a Win x64 GPU and run with CPU inference on Linux; testing on a CUDA-capable Linux instance (which also accommodates convenient debugging tools) will take a bit longer, but that's the next step.

cyberbeat commented 3 years ago

I have now tried it on new hardware (CUDA) with training and inference on the same machine. Same problem; perhaps I'm making a mistake somewhere?

Training (simplified):

```java
NNClassification classification = new NNClassification()
    .withFeaturesInput(denseLayers)
    .withBinaryLabelInput(p.asIsMatch());
NeuralNetwork neuralNetwork = new NeuralNetwork()....;

LabelProbabilityFromDistribution isMatch = new LabelProbabilityFromDistribution()
    .withDistributionInput(neuralNetwork.asLayerOutput(classification))
    .withLabel(true);

DAG1x1<MyPlaceholder, Double> dag = DAG.withPlaceholder(p).withOutput(isMatch);

MyPlaceholderIterable train = new MyPlaceholderIterable(train...);
MyPlaceholderIterable test = new MyPlaceholderIterable(test...);

DAG1x1.Prepared<MyPlaceholder, Double> res = dag.prepare(train);

ObjectReader predicted = res.applyAll(test);

// ...some successful output to compare against later...

try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(Path.of("model")))) {
  oos.writeObject(res);
}
```

And inference:

```java
DAG1x1.Prepared<MyPlaceholder, Double> dag;
try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(Path.of("model")))) {
  dag = (DAG1x1.Prepared<MyPlaceholder, Double>) ois.readObject();
}

ObjectReader predicted = dag.applyAll(test);

// ...output different than on the training run (useless/bad results; most items get the same low score)...
```
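A same-process round trip (serialize the prepared DAG, immediately deserialize it, and compare its predictions against the in-memory model's) would show whether serialization alone changes the outputs. A minimal sketch, assuming the `res` and `test` objects from the training snippet above and that `ObjectReader` can be iterated:

```java
// Sketch of a same-process round-trip check; reuses `res` (the prepared DAG) and `test`
// (the test data iterable) from the training snippet above. Not code from this issue.
Path modelPath = Path.of("model");
try (ObjectOutputStream oos = new ObjectOutputStream(Files.newOutputStream(modelPath))) {
  oos.writeObject(res);
}

DAG1x1.Prepared<MyPlaceholder, Double> reloaded;
try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(modelPath))) {
  reloaded = (DAG1x1.Prepared<MyPlaceholder, Double>) ois.readObject();
}

// Compare the in-memory model's predictions with the reloaded model's predictions.
Iterator<Double> fromMemory = res.applyAll(test).iterator();
Iterator<Double> fromDisk = reloaded.applyAll(test).iterator();
while (fromMemory.hasNext() && fromDisk.hasNext()) {
  double a = fromMemory.next();
  double b = fromDisk.next();
  if (Math.abs(a - b) > 1e-6) {
    System.out.println("Mismatch: " + a + " vs " + b);
  }
}
```

If this check passes but the separate inference program still produces garbage, the difference is more likely in how the inference program builds its inputs than in the model itself.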

jeffpasternack commented 3 years ago

We tried to replicate this with an MLP (CUDA-trained, CPU inference) and an RNN (both CUDA-trained and cuDNN-trained, CPU inference) on an x64 Ubuntu machine (CUDA 10.1), without success.

The only substantive difference I see between your example code and our MLP example is that you are using a binary label and we are doing multinomial classification. I'll try again with a binary label and will update the ticket with the result.

cyberbeat commented 3 years ago

Sorry for this report. I finally found that this was my own fault: I failed to load the test data the same way at inference time.
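For anyone hitting the same symptom: the fix was simply to load the test data identically in training and inference. One hypothetical way to make that hard to get wrong (the class, field layout, delimiter, and MyPlaceholder constructor below are assumptions for illustration, not code from this issue) is to keep the raw-record-to-MyPlaceholder conversion in a single shared class that both programs call:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical shared loader: both the training and the inference program call this,
// so the test data cannot be parsed "a different way" at inference time.
public final class ExampleLoader {
  private ExampleLoader() { }

  /** Parses one raw line into a placeholder; the single source of truth for both programs. */
  public static MyPlaceholder parse(String rawLine) {
    String[] fields = rawLine.split("\t", -1);  // same delimiter and field order everywhere
    return new MyPlaceholder(fields[0], fields[1], Double.parseDouble(fields[2]));
  }

  public static List<MyPlaceholder> load(Path file) throws IOException {
    try (Stream<String> lines = Files.lines(file)) {
      return lines.map(ExampleLoader::parse).collect(Collectors.toList());
    }
  }
}
```

With the parsing logic in one place, any change to the feature layout automatically applies to both the training and the inference path.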

jeffpasternack commented 3 years ago

Glad you were able to resolve it!