neuroailab / tfutils

Utilities for working with tensorflow
MIT License

better handling of "cfg_final" data and model equivalence checking #29

Closed: yamins81 closed this issue 7 years ago

yamins81 commented 7 years ago

The current mechanism for recording the actual final structure of the model is suboptimal. We use "cfg_final", which is simply whatever the user's model construction function returns.

The purpose of having a good "cfg_final" is to (a) have something to put in the database that allows searching on the model structure in a generic way, and (b) make it possible to compare models created at one time (e.g. during one training session) with those created at other times (e.g. for validation or a later training session) to ensure that they are the same.

The current cfg_final doesn't do either of these very well.

Along these lines, we need to improve the check_model_equivalence function, which implements the check of whether two models are the same. The difficulty is that even when two models are essentially the same, there will be small differences (e.g. the names of the nodes in validation models will have "validation/[validation_name]/" prepended), which means that a simple

     model1 == model2

check is not appropriate.
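A minimal sketch of a looser equivalence check, assuming each model is described as a list of node dicts with `name`, `op` and `input` keys (the NodeDef field names a GraphDef would have after conversion to a dict). `normalize_name`, `canonical_nodes` and `models_equivalent` are hypothetical helpers, not tfutils's actual check_model_equivalence; the only normalization applied is stripping the "validation/[validation_name]/" prefix described above.

```python
import re

# Hypothetical pattern for the scope prepended to validation-model node names.
_VALIDATION_PREFIX = re.compile(r"^validation/[^/]+/")


def normalize_name(name):
    """Strip a leading validation scope, preserving a '^' control-dependency marker."""
    control = name.startswith("^")
    bare = name[1:] if control else name
    bare = _VALIDATION_PREFIX.sub("", bare, count=1)
    return ("^" if control else "") + bare


def canonical_nodes(nodes):
    """Map each normalized node name to (op, normalized inputs in order)."""
    return {
        normalize_name(n["name"]): (
            n["op"],
            tuple(normalize_name(i) for i in n.get("input", [])),
        )
        for n in nodes
    }


def models_equivalent(nodes1, nodes2):
    """True if both graphs have the same ops wired the same way, ignoring the prefix."""
    return canonical_nodes(nodes1) == canonical_nodes(nodes2)


# Usage: a training graph and the same graph built under a validation scope.
train = [
    {"name": "images", "op": "Placeholder"},
    {"name": "conv1/weights", "op": "VariableV2"},
    {"name": "conv1/Conv2D", "op": "Conv2D", "input": ["images", "conv1/weights"]},
]
val = [
    {"name": "validation/imagenet/images", "op": "Placeholder"},
    {"name": "validation/imagenet/conv1/weights", "op": "VariableV2"},
    {"name": "validation/imagenet/conv1/Conv2D", "op": "Conv2D",
     "input": ["validation/imagenet/images", "validation/imagenet/conv1/weights"]},
]
print(train == val, models_equivalent(train, val))
```

Input order is kept because TensorFlow ops treat their inputs positionally. A real check would probably also need to compare selected attrs (shapes, dtypes), and this prefix rule would misfire if a training-time scope were itself named "validation".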

chengxuz commented 7 years ago

One solution is to remove "cfg_final" and create a JSON description of the network structure from the returned output node or the generated TensorFlow graph. Alternatively, we could use some existing code (maybe part of TensorBoard).

yamins81 commented 7 years ago

I think we'd basically want to use the GraphDef and extract the protocol buffer definition of the graph, minus the weights. That's basically JSON, I think.
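A sketch of the "GraphDef minus the weights" idea. In TensorFlow, `tf.Graph.as_graph_def()` returns the GraphDef protobuf and `google.protobuf.json_format.MessageToDict` turns it into a plain dict; to stay runnable without TensorFlow, the sketch assumes that dict as its input. Trained variable values live in checkpoints rather than in the GraphDef, but `Const` nodes (and every weight in a frozen graph) embed their values in a `value` attr holding a `tensor`, so the hypothetical `structure_json` drops any attr carrying a tensor and sorts nodes by name to get a stable, searchable string.

```python
import json


def structure_json(graph_def_dict):
    """Canonical JSON for a GraphDef-like dict, with embedded tensor values removed.

    Assumes the layout MessageToDict produces: {"node": [{"name", "op", "input", "attr"}]}.
    """
    nodes = []
    for node in graph_def_dict.get("node", []):
        # Keep dtype/shape-style attrs; drop any attr whose value is a whole tensor
        # (e.g. the "value" attr of a Const node).
        attrs = {
            key: value
            for key, value in node.get("attr", {}).items()
            if "tensor" not in value
        }
        nodes.append({
            "name": node["name"],
            "op": node["op"],
            "input": list(node.get("input", [])),
            "attr": attrs,
        })
    # Sort nodes (and dict keys) so the same structure always yields the same string.
    nodes.sort(key=lambda n: n["name"])
    return json.dumps(nodes, sort_keys=True)


# Usage with a hand-written dict shaped like MessageToDict output.
example = {
    "node": [
        {"name": "y", "op": "Mul", "input": ["x", "w"],
         "attr": {"T": {"type": "DT_FLOAT"}}},
        {"name": "x", "op": "Placeholder", "attr": {"dtype": {"type": "DT_FLOAT"}}},
        {"name": "w", "op": "Const",
         "attr": {"dtype": {"type": "DT_FLOAT"},
                  "value": {"tensor": {"dtype": "DT_FLOAT", "floatVal": [0.5]}}}},
    ]
}
print(structure_json(example))
```

Because the weight values are dropped, two graphs that differ only in their constants produce the same string, which is what both the database search and the equivalence check need.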

yamins81 commented 7 years ago

Another option would be, instead of the Saver-based storage and restore method we're currently using, to store a complete, serialized, frozen version of the model (e.g. using freeze_graph or similar) and load from that. (We'd somehow have to replace the graph's feed-dict input nodes each time a new data source is used.) We'd then never have to reconstruct the model; the loaded graph would literally be the old model. So there would be no need for equivalence checking, and also no need for revivification as in issue #28. I'm not sure this is better; it just might be possible.