Closed ugtony closed 7 years ago
Hi @ugtony,
yes, this is also possible in tfdeploy, although this feature is quite hidden as there's no such thing as a session object that can evaluate tensors simultaneously. Per `eval` invocation, the intermediate results of all dependent tensors and ops are cached. The actual signature of `eval` is:

```python
eval(feed_dict=None, _uuid=None)
```

`_uuid` is used for caching: a new one is generated when `None` is passed, and it is propagated to all dependent `eval` calls. So all you have to do is:
```python
from uuid import uuid4
...
uuid = uuid4()
result2 = y2.eval({x1: batch}, uuid)
result3 = y3.eval({x1: batch}, uuid)
```
I only tested this feature but never had to use it in production, so feedback is appreciated ;)
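The per-run caching mechanism described above can be sketched with a minimal toy op class. This is a hypothetical stand-in to illustrate the idea, not tfdeploy's actual implementation: each op remembers the uuid of the last run it was evaluated in and reuses its result whenever the same uuid comes back.

```python
from uuid import uuid4

class Op:
    """Toy op that caches its result per eval run, keyed by a uuid."""
    def __init__(self, fn, *parents):
        self.fn = fn
        self.parents = parents
        self._cache = (None, None)  # (uuid of last run, cached value)

    def eval(self, feed_dict=None, _uuid=None):
        if _uuid is None:
            _uuid = uuid4()  # fresh run: nothing may be reused
        cached_uuid, value = self._cache
        if cached_uuid == _uuid:
            return value  # reuse intermediate result within the same run
        inputs = [p.eval(feed_dict, _uuid) for p in self.parents]
        value = self.fn(feed_dict, *inputs)
        self._cache = (_uuid, value)
        return value

# shared intermediate: y1 is evaluated once when y2 and y3 share a uuid
calls = []
x = Op(lambda fd: fd["x"])
y1 = Op(lambda fd, v: (calls.append("y1"), v + 1)[1], x)
y2 = Op(lambda fd, v: v * 2, y1)
y3 = Op(lambda fd, v: v * 3, y1)

run = uuid4()
print(y2.eval({"x": 1}, run), y3.eval({"x": 1}, run), calls)  # → 4 6 ['y1']
```

Passing a fresh uuid (or `None`) invalidates the cache, so stale results from a previous feed can never leak into a new run.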
Thanks! Glad to know there is a caching feature.
In my example, is it correct to simply use the function add(...) twice to create the tfdeploy model?
```python
# setup tfdeploy (only when creating models)
...
# build your graph
...
y2 = tf.nn.softmax(tf.matmul(x1, W2) + b2, name="output2")
y3 = tf.nn.softmax(tf.matmul(x1, W3) + b3, name="output3")

# use add twice to create the tfdeploy model
model = td.Model()
model.add(y2, sess)  # 1st add
model.add(y3, sess)  # 2nd add
model.save("model.pkl")
```
I've tested the caching feature and it works.
However, it is much slower than the tensorflow CPU version. In my experiment, a fully convolutional neural network is applied to an image pyramid.
```python
for scale in scales:
    layer = image_pyramid[scale]  # layer size changes with the scale
    uuid = uuid4()
    o1 = out1.eval({input: layer}, uuid)
    o2 = out2.eval({input: layer}, uuid)
    print(layer.shape)  # shape is an attribute on numpy arrays, not a method
    print(o1.shape)
```
I guess the speed drop might be caused by 1) the frequent changes of input size, or 2) the frequent use of the caching feature.
> In my example, is it correct to simply use the function add(...) twice to create the tfdeploy model?
Yep, that's correct. Overlaps between the two graphs are found automatically via tensor instance caching, so there's no need to worry about redundant computations.
If the two tensors you add to the model are somehow related (e.g. if `y3` requires/depends on `y2`), it's also possible to only add the most general tensor (e.g. `y3`).
> I've tested the caching feature and it works. However, it is much slower than the tensorflow cpu version.
Do you mean the caching or the convolution itself?
Please ignore my guesses.
I measured the running time. The caching feature works; the second eval takes little time compared to the first (0.1s vs. 2.9s in my case).
But the operations using tfdeploy are about 10 times slower than tensorflow: tfdeploy took 3 seconds on 150x150 images while tensorflow took only 0.3 seconds. I had already turned on the scipy optimization feature while converting the model. The fully convolutional neural network I used is the P-Net described in "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks", without the facial landmark localization component.
Yep.
tensorflow has the advantage of being fully backed by a customized & optimized C++ backend that performs all heavy operations.
tfdeploy, on the other hand, essentially relies on bare numpy operations, which sometimes have to be combined to exactly resemble the behavior of tensorflow; conv and pooling ops are good examples. The drawback is that these combinations are implemented and executed in Python. And sometimes even numpy functions aren't completely backed by equivalent C++ functions but use different Python calls to achieve the desired functionality.
Concerning the tfdeploy conv and pooling ops: I have one or two ideas that might improve the performance. And maybe it's worth looking into scipy convolve, but this will also require some preprocessing, e.g. to ensure the same padding rules.
In the example, tfdeploy gets the result of y1 = W1x1 + b1 by
If I have a graph with two outputs, y2 = W2(W1x + b1) + b2 and y3 = W3(W1x + b1) + b3, in tensorflow I can use
to get y2 and y3 simultaneously while avoiding redundant computation (of y1 = W1x + b1).
Is it possible to do the same thing with tfdeploy? Or do I have to use two commands like below
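The redundancy in question can be illustrated in plain numpy (hypothetical shapes, not the tensorflow or tfdeploy API): computing y1 once and reusing it for both output heads is exactly what a shared session run, or a shared eval uuid, buys you.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # batch of 4 inputs
W1, b1 = rng.standard_normal((8, 16)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((16, 3)), rng.standard_normal(3)
W3, b3 = rng.standard_normal((16, 5)), rng.standard_normal(5)

# naive: the shared intermediate y1 is computed twice
y2_naive = (x @ W1 + b1) @ W2 + b2
y3_naive = (x @ W1 + b1) @ W3 + b3

# shared: y1 is computed once and reused by both outputs
y1 = x @ W1 + b1
y2, y3 = y1 @ W2 + b2, y1 @ W3 + b3

assert np.allclose(y2, y2_naive) and np.allclose(y3, y3_naive)
```

The results are identical either way; only the amount of work for the shared subgraph differs.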