torch / nngraph

Graph Computation for nn

Unexpected behavior in saving network as :float() #121

Closed mbchang closed 8 years ago

mbchang commented 8 years ago

Please see the issue posted in https://github.com/torch/torch7/issues/711. I had accidentally submitted an issue for torch/nngraph there, and I'm not sure how to remove it. Thank you!

soumith commented 8 years ago

before loading the model, just execute the line: require 'cunn'
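In context, the suggested load pattern looks roughly like this (the checkpoint file name is illustrative):

```lua
-- On a machine with a GPU: load the CUDA backend first so that
-- torch.load can deserialize the CudaTensor types baked into the file.
require 'nn'
require 'nngraph'
require 'cunn'

local net = torch.load('checkpoint.t7')
```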

mbchang commented 8 years ago

@soumith I do realize that if my computer has GPU capabilities, executing require 'cunn' lets me load the checkpoint, as mentioned in my post. However, I intend to load the checkpoint on a computer that does not have GPU capabilities, on which I can't install cunn in the first place. That point aside, I had expected that casting the network to Float type would remove any need for CUDA dependencies.
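For reference, the pattern that was expected to work is something like the following sketch (file names are illustrative):

```lua
-- Cast everything to float and save; the resulting file was expected
-- to be loadable on a CPU-only machine with just nn/nngraph.
net:float()
torch.save('checkpoint_float.t7', net)

-- Later, on the CPU-only machine:
-- require 'nn'; require 'nngraph'
-- local net = torch.load('checkpoint_float.t7')
-- This is what fails here: the saved file still references CUDA types.
```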

fmassa commented 8 years ago

@soumith There is indeed a problem here. I think the culprit is this line. When we call type, the buffers are all cleared, but references to the old forward inputs are still there. The same might apply to the backward nodes.

Two workarounds for the moment: after converting to float and before saving, either do
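The original snippets are missing from this copy of the thread, but the idea is to drop or overwrite the stale CUDA references held by the graph nodes before saving. A hedged reconstruction of what such workarounds might look like (inputSize and file names are illustrative):

```lua
-- Option 1: run a forward pass with float inputs so the node buffers
-- are repopulated with FloatTensors instead of CudaTensors.
net:float()
net:forward(torch.FloatTensor(1, inputSize):zero())
torch.save('checkpoint_float.t7', net)

-- Option 2: clear the cached intermediate state entirely before saving.
net:float()
net:clearState()
torch.save('checkpoint_float.t7', net)
```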

Another consequence is that the model is almost double the original size.

I think this should be addressed in nngraph though.

soumith commented 8 years ago

@fmassa I don't think that holds for type. What you are saying might actually affect clearState, though. To check what you said, I added this assertion to the tests, and it passes: https://github.com/torch/nngraph/commit/08d0b5db1b56bbb52dfb81a233d288d3654c07d8

soumith commented 8 years ago

@mbchang reproduced your issue. I am looking into it.

mbchang commented 8 years ago

Thanks @soumith, that would be a great help!

fmassa commented 8 years ago

@soumith The tests pass because forward is run after the model conversion. Without the forward it would fail, I think. But my workaround didn't work for networks containing cudnn modules that were converted to nn; I had to recreate the modules and copy the parameters by hand.

soumith commented 8 years ago

oohh i see, ok makes sense. looking into it.

fmassa commented 8 years ago

I just realized that the line that my comment was pointing is off. I was referring to this line, which I'll copy here to avoid other ambiguities:

child.data.input[mapindex] = x
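To illustrate why that assignment matters: nngraph nodes cache references to the inputs seen during forward, and casting the module does not rewrite those cached references, so they can keep pointing at CudaTensors. A rough sketch of the failure mode:

```lua
-- g is an nngraph gmodule that has been run on the GPU.
g:forward(cudaInput)   -- each node's data.input now references CudaTensors
g:float()              -- converts parameters and buffers to FloatTensor...
-- ...but child.data.input[mapindex] still holds the old CudaTensor
-- references, so torch.save(g) serializes CUDA types and loading the
-- checkpoint then requires cunn.
```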
soumith commented 8 years ago

@mbchang @fmassa Fixed it via #126. Reinstall nngraph and it should be fixed.