karpathy / neuraltalk

NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.

Running On Raw Images #1

YafahEdelman closed this issue 9 years ago

YafahEdelman commented 9 years ago

How exactly would I go about getting a trained model's prediction on an image (in some raw format) that I have?

karpathy commented 9 years ago

Hi Jacob,

It's a little tricky right now but I hope to bridge this gap.

You'd have to use Caffe to extract the top-layer representation from a CNN that looks at the image. If you have Matlab, that would be ideal, because I provide some skeleton code for it inside the matlab_reference folder.

If all you have is Python then right now you'd kind of have to be an expert on CNNs and Caffe. You'd have to use the Python Caffe wrapper to extract CNN features using the VGG 16-layer model and then use these as input to this code. I'll have some code up soon. Sorry about that.
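As a rough illustration of the Python route described above, the sketch below covers the numpy side of that pipeline: turning a raw image into the input blob a Caffe VGG model expects, and stacking the resulting per-image feature vectors into one matrix. It is a minimal sketch under several assumptions, not code from this repository. The image is assumed to be already resized to 224 x 224, the BGR mean values are the ones commonly published with the VGG ILSVRC-2014 models, the feature layer is assumed to be the 4096-d `fc7`, and the one-column-per-image (D x N) layout is an assumption about what this code consumes. The helper names `preprocess_for_vgg` and `stack_features` are hypothetical. The Caffe forward pass itself appears only in comments.

```python
import numpy as np

# Per-channel BGR mean commonly used with the VGG ILSVRC-2014 models
# (assumption: check the values distributed with the model you download).
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_for_vgg(img_rgb):
    """Hypothetical helper: turn an H x W x 3 RGB uint8 image (already resized
    to 224 x 224) into a 3 x 224 x 224 float32, BGR, mean-subtracted blob."""
    if img_rgb.shape != (224, 224, 3):
        raise ValueError("expected a 224 x 224 x 3 RGB image, got %r" % (img_rgb.shape,))
    img = img_rgb.astype(np.float32)[:, :, ::-1]  # RGB -> BGR channel order
    img = img - VGG_MEAN_BGR                       # subtract per-channel mean
    return img.transpose(2, 0, 1)                  # HWC -> CHW, as Caffe expects


# With pycaffe and the VGG 16-layer deploy files, the forward pass would look
# roughly like this (not executed here; file paths are placeholders):
#   net = caffe.Net(deploy_prototxt, caffemodel, caffe.TEST)
#   net.blobs['data'].reshape(1, 3, 224, 224)
#   net.blobs['data'].data[...] = preprocess_for_vgg(img)
#   net.forward()
#   fc7 = net.blobs['fc7'].data[0].copy()   # 4096-d feature for this image


def stack_features(feature_vectors):
    """Hypothetical helper: stack one feature vector per image into a
    D x N matrix with one column per image (assumed input layout)."""
    return np.stack(
        [np.asarray(f, dtype=np.float32).ravel() for f in feature_vectors], axis=1
    )
    # If the consumer reads a .mat file (assumption), the result could be saved
    # with scipy.io.savemat('vgg_feats.mat', {'feats': feats}).


if __name__ == "__main__":
    blob = preprocess_for_vgg(np.zeros((224, 224, 3), dtype=np.uint8))
    print(blob.shape)   # (3, 224, 224)
    feats = stack_features([np.ones(4096), np.zeros(4096)])
    print(feats.shape)  # (4096, 2)
```

Keeping the preprocessing in plain numpy makes it easy to check channel order and mean subtraction on their own; a mismatch there (for example, RGB fed to a model trained on BGR) silently degrades the features rather than raising an error.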

YafahEdelman commented 9 years ago

Thanks, I look forward to seeing the code!

EvanWeiner commented 9 years ago

Hi Andrej,

Sorry to ask a tangential question here, but why did you choose the VGG16 model for feature extraction rather than the BVLC_reference_model/AlexNet model that ships with Caffe core? I know VGG16 has higher classification accuracy, but for features / neural codes, aren't the two similar at the earlier layers?

(Replying by email to Andrej's comment above, posted Dec 3, 2014, at 2:02 PM.)

karpathy commented 9 years ago

Hi Evan,

The VGG model is significantly better, not just in classification but also in the features it produces (which directly support that classification). The features are probably similar in the first few layers, but the final features before the classifier (which are the ones used here) are much stronger with VGG.
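To make "the final features before the classifier" concrete, the toy numpy sketch below stands in for the last fully connected layers of a CNN, using random weights and made-up small sizes rather than VGG's real ones (in VGG16 these layers are fc6 and fc7, each 4096-d, followed by the 1000-way fc8 classifier). It returns both the penultimate activations, which play the role of the image feature, and the class scores, which a captioning model would not use. The names `forward`, `W6`, `W7` and `W8` are hypothetical, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the last layers of a CNN (random weights, small sizes).
# In VGG16 the analogous layers are fc6 -> fc7 (4096-d) -> fc8 (1000 classes).
D_IN, D_FEAT, N_CLASSES = 64, 32, 10
W6 = rng.standard_normal((D_FEAT, D_IN)) * 0.1
W7 = rng.standard_normal((D_FEAT, D_FEAT)) * 0.1
W8 = rng.standard_normal((N_CLASSES, D_FEAT)) * 0.1


def relu(x):
    return np.maximum(x, 0.0)


def forward(x):
    """Return (features, logits) for one input vector x.

    features: activations of the layer right before the classifier
              (the analogue of fc7, used as the image representation).
    logits:   class scores from the classifier layer (analogue of fc8).
    """
    h6 = relu(W6 @ x)
    h7 = relu(W7 @ h6)   # penultimate layer: kept as the image feature
    logits = W8 @ h7     # classifier output: not needed for captioning
    return h7, logits


if __name__ == "__main__":
    feat, logits = forward(rng.standard_normal(D_IN))
    print(feat.shape, logits.shape)  # (32,) (10,)
```

The point of the split is that the classifier layer is tied to the 1000 ImageNet labels, whereas the layer below it is a general-purpose description of the image that the sentence model can learn to map into words.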