Inconsistent output of models created in Digits loaded into Deep-detect

rperdon commented 7 years ago

Models such as alexnet, googlenet, and vgg16 trained and exported by nvidia Digits output very different results in Deep-detect than they do within Digits or in external python scripts using the same models. This is similar to the problem of the nsfw yahoo model producing inconsistent output.

The Alexnet, Googlenet, and VGG16 models were trained in a recent digits version. According to nvidia, the differences come down to version control for supported software, and it is in effect the same as Caffe: https://github.com/NVIDIA/DIGITS/issues/145

Files were put into a folder shared into deepdetect:

This was added to deploy.prototxt:

name: "AnimeGN"
layer {
  name: "googlenet"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 32
    channels: 3
    height: 224
    width: 224
  }
}

deploy.prototxt:

deploy.prototxt.zip

Curl load:

curl -X PUT "http://127.0.0.1:9999/services/animeGN" -d "{\"mllib\":\"caffe\",\"description\":\"image anime or not\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"nclasses\":2}},\"model\":{\"repository\":\"/models/AnimeGN/\"}}"
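The same service-creation call can be issued from Python, which avoids the shell escaping in the curl command. This is a minimal sketch using only the standard library. The helper names `build_service_body` and `create_service` are hypothetical; the URL, service name, and JSON fields are copied from the curl command above.

```python
import json
import urllib.request


def build_service_body(repository, nclasses, description):
    """Return the JSON body for creating a Caffe image-classification service.

    Hypothetical helper: the field names mirror the curl command in the thread.
    """
    return {
        "mllib": "caffe",
        "description": description,
        "type": "supervised",
        "parameters": {
            "input": {"connector": "image"},
            "mllib": {"nclasses": nclasses},
        },
        "model": {"repository": repository},
    }


def create_service(base_url, name, body):
    """PUT the body to <base_url>/services/<name> and return the decoded reply."""
    request = urllib.request.Request(
        f"{base_url}/services/{name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_service_body("/models/AnimeGN/", 2, "image anime or not")
print(json.dumps(body, indent=2))
# Needs a running DeepDetect server, so the call itself is left commented out:
# print(create_service("http://127.0.0.1:9999", "animeGN", body))
```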

Deep-Detect output when loading the model.

deepdetect.log.zip

Request call to deepdetect:

{"service":"animeGN","parameters":{"input":{"width":224,"height":224},"output":{ "best":2},"mllib":{"gpu":false}},"data":["/images/index.jpg"]}

deepdetect | INFO - 13:16:56 - Thu Oct 19 13:16:56 2017 UTC - 10.4.10.51 "POST /predict" animeGN 200 1945

The image is classified, but the results are not consistent with those from an external installation of caffe using the exported digits model.
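For repeatable comparisons, the predict request above can also be built and sent from a script. The following is a minimal sketch using only the standard library. `build_predict_body` and `predict` are hypothetical helper names, and the default values are the ones from the request above. The shape of the server's reply is not shown in this thread, so the sketch returns the decoded JSON without interpreting it.

```python
import json
import urllib.request


def build_predict_body(service, image_paths, width=224, height=224, best=2, gpu=False):
    """Return the JSON body for a /predict call (hypothetical helper)."""
    return {
        "service": service,
        "parameters": {
            "input": {"width": width, "height": height},
            "output": {"best": best},
            "mllib": {"gpu": gpu},
        },
        "data": list(image_paths),
    }


def predict(base_url, body):
    """POST the body to <base_url>/predict and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_predict_body("animeGN", ["/images/index.jpg"])
print(json.dumps(body))
# Needs a running DeepDetect server, so the call itself is left commented out:
# print(predict("http://127.0.0.1:9999", body))
```

Running the same fixed list of images through this call and through the caffe script makes it easier to compare the two sets of scores side by side.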

beniz commented 7 years ago

As a 'feeling lucky' potential quick fix, you may want to try adding:

transform_param {
  scale: 0.004
}

to the MemoryData layer.
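For context, the `scale` field of a Caffe `transform_param` multiplies every input value by the given factor (after any mean subtraction), so `0.004`, roughly 1/255, maps 8-bit pixel values into approximately the 0 to 1 range. The sketch below only illustrates that arithmetic in plain Python; `apply_scale` is a hypothetical helper, not DeepDetect or Caffe code. Whether scaling is appropriate depends on whether the network was trained on scaled inputs, which the thread does not state.

```python
def apply_scale(pixels, scale):
    """Multiply each pixel value by `scale`, mimicking what the transform does."""
    return [value * scale for value in pixels]


raw = [0, 64, 128, 255]           # example 8-bit intensities
scaled = apply_scale(raw, 0.004)  # the value suggested in the thread
print(scaled)
```

If the model was trained on unscaled 0 to 255 inputs, feeding it values this small would compress every image into a narrow range near zero.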

beniz commented 7 years ago

Another related thought is that your network is very likely to come either with a mean.binaryproto file or with an array of per-channel mean values (e.g. [127,138,117]). You may want to make sure you have that as well.
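To illustrate why the mean matters: a Caffe deployment typically subtracts either a full mean image or one mean value per channel from each input before the forward pass, and leaving that step out shifts every value the network sees. Below is a minimal plain-Python sketch of the per-channel variant, using the example values [127, 138, 117] quoted above. `subtract_channel_mean` is a hypothetical helper, and the sketch assumes the image and the mean use the same channel order.

```python
def subtract_channel_mean(image, mean):
    """Subtract a per-channel mean from an image.

    `image` is a list of rows, each row a list of pixels, each pixel a
    sequence of channel values in the same order as `mean`.
    """
    return [
        [[value - m for value, m in zip(pixel, mean)] for pixel in row]
        for row in image
    ]


mean = [127, 138, 117]  # example values from the comment above
image = [
    [[127, 138, 117], [255, 255, 255]],
    [[0, 0, 0], [130, 140, 120]],
]
centered = subtract_channel_mean(image, mean)
print(centered)
```

Note that a mean stored in one channel order (for example BGR) applied to an image in another order (RGB) subtracts the wrong value from two of the three channels.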

beniz commented 7 years ago

The related DIGITS classification code appears to be located here: https://github.com/NVIDIA/DIGITS/blob/5174621bac8f4d079e0d9abe75e872b49415782e/examples/classification/example.py

You may want to try predicting with the script above and make sure you have the correct results there.

rperdon commented 7 years ago

I have tested with the classification code already. The results there are consistent with what I get from digits.

I will try adding the line provided as indicated.

I can confirm the existence of the mean.binaryproto file in the folder.

rperdon commented 7 years ago

Adding that line to the MemoryData layer caused ALL classifications to swing one way only. Still not working. I appreciate the quick response.

rperdon commented 7 years ago

Any other suggestions on possibly manipulating the input via the deploy.prototxt or something prior to submitting the image to deepdetect for classification?

beniz commented 7 years ago

> I have tested with the classification code already. The results there are consistent with what I get from digits.

Then you can use the script to find the discrepancy: DD uses OpenCV to load the images, so you can modify the load_image function in the script to use OpenCV and see what you get. This is to check whether PIL is again doing something different, as already observed in https://github.com/beniz/deepdetect/issues/325#issuecomment-310748273
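One way to run this check is to load the same file with both libraries and compare the decoded pixels before they reach the network. The sketch below is an assumption about how such a comparison could look, not the DIGITS script itself: `load_with_opencv`, `load_with_pil`, and `diff_stats` are hypothetical helpers, the 224x224 size is taken from the predict request earlier in the thread, and bilinear resizing is chosen only for illustration. OpenCV, NumPy, and Pillow are imported lazily inside the loaders so the comparison helper can be used on its own.

```python
def load_with_opencv(path, width=224, height=224):
    """Decode with OpenCV, convert BGR to RGB, resize; returns nested lists."""
    import cv2  # assumed to be installed

    image = cv2.imread(path, cv2.IMREAD_COLOR)  # uint8, BGR channel order
    if image is None:
        raise FileNotFoundError(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (width, height), interpolation=cv2.INTER_LINEAR)
    return image.tolist()


def load_with_pil(path, width=224, height=224):
    """Decode with PIL/Pillow as RGB, resize; returns nested lists."""
    import numpy as np      # assumed to be installed
    from PIL import Image   # assumed to be installed

    image = Image.open(path).convert("RGB")
    image = image.resize((width, height), Image.BILINEAR)
    return np.asarray(image).tolist()


def diff_stats(a, b):
    """Return (max_abs_diff, mean_abs_diff) for two same-shaped H x W x C images."""
    diffs = [
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
        for x, y in zip(pixel_a, pixel_b)
    ]
    return max(diffs), sum(diffs) / len(diffs)


# Tiny synthetic images so the comparison can be tried without any files.
a = [[[10, 20, 30], [40, 50, 60]]]
b = [[[10, 22, 30], [37, 50, 60]]]
max_diff, mean_diff = diff_stats(a, b)
print(max_diff, mean_diff)
```

With real files, the call would be `diff_stats(load_with_opencv(path), load_with_pil(path))`. A nonzero difference would point to decoding or resizing as a possible source of the discrepancy; identical pixels would suggest looking at mean subtraction, scaling, or channel order instead.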

rperdon commented 7 years ago

I am downloading the yahoo nsfw model and installing the required packages. My original solution was to set up an apache server to handle http requests for classification, but some colleagues pointed me to deep detect as a quicker way to import models. By writing an additional script with additional dependencies to preload/transform the submitted image, this has pushed us away from using deep-detect as a solution for ease of use. Will there be support in future to add some sort of alternate load-image function parameters for deep detect in future?

beniz commented 7 years ago

> Will there be support in future to add some sort of alternate load-image function parameters for deep detect in future?

I don't think so.

The best solution is to train with DD directly ;) Personally, I don't trust PIL too much.

Please close the issue if not relevant to you anymore. Thanks!

beniz commented 7 years ago

> By writing an additional script with additional dependencies to preload/transform the submitted image

I believe you may have misunderstood my comment https://github.com/beniz/deepdetect/issues/356#issuecomment-338020905. My recommendation is for you to modify the classify.py script in order to find the culprit, e.g. OpenCV vs PIL. Until the discrepancy in the input (or elsewhere) is found, we can't help craft a potential solution / workaround.

We can help you in your investigation when you're stuck. I recommend using https://github.com/beniz/deepdetect/issues/325 as an example, where @cchadowitz-pf and I collaborated, both there and on a gitter chat, to find the origin of the problem.

You can PM me on gitter as needed while carrying out your investigation.

rperdon commented 7 years ago

I will do some debugging this week to see if the problem works similarly to the yahoo problem.

cchadowitz commented 7 years ago

I'm also lurking here in case something turns up to either validate or disprove the result from #325 :) But if I can help let me know (I'm also in the gitter channel for DeepDetect).

beniz commented 7 years ago

@cchadowitz-pf Hi, I was in fact expecting you to close #325 :) What does it boil down to: finding how PIL differs from other image loading libraries? If so, I'll let you guys report on the differences with OpenCV, and once we understand what's going on, we'll be better informed to possibly add a compatible image loading scheme to DD's input image connector. Let me know if I've missed something.

rperdon commented 7 years ago

cchadowitz-pf - Do you have an e-mail I can reach you at?

cchadowitz commented 7 years ago

@beniz Sorry, I think I left it open simply because while it appeared we had finalized it, it wasn't as conclusive as I'd have liked. I can go ahead and close it though.

It's been a little while since I've thought about it, but yes, I used your suggestions to track how the values changed between loading them from disk into memory and through the neural net layers. As I said in #325, the way the image was loaded by the various methods/libraries seemed to account for the differences, with PIL seeming to be the outlier. If @rperdon confirms that in this case too, that would be decent evidence. I haven't looked into PIL directly to see what it may be doing differently, though.

@rperdon - I'll give you my email address in a PM on gitter.

jolibrain / deepdetect

Inconsistent output of models created in Digits loaded into Deep-detect #356