wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API
MIT License

tsm model different from original #1572

Open knowlessthanenough opened 2 months ago

knowlessthanenough commented 2 months ago

I have recently been running TSM, and its layers seem to differ from the original. In the create_engine function of tsm_r50.py, fc1 has num_outputs = OUTPUT_SIZE, which is 400, the number of classes. Up to here everything looks normal. But then the output is reshaped to [num_segments, output_size]. I don't understand why, or even how a tensor of shape [400] can be reshaped to [8, 400]. After that, add_reduce is called with axes=1 and keep_dims=False, so the shape becomes [8]? Then softmax is applied on axis 1? The tensor is already 1-D, so how can there be an axis 1? And if it is reduced to 8 (segments), how do we know the class of the video?

I know this is an old repo, but if anyone understands the concept, I would be very grateful.

wang-xinyu commented 2 months ago

@irvingzhang0512 pls help

knowlessthanenough commented 2 months ago

I converted it to explicit-batch input and found that what it does is the same as the original. The code uses a different method, but in fact what it does is this: after pooling2 the shape is [4, 8, 2048, 1, 1]; I reshape it to [4*8, 2048], do a matrix multiplication with the fc1 weight [class_num, 2048], and finally add the fc1 bias [1, 1, class_num]. The result is [batch_size, segment, class_num]. Then I do a reduce with axes=2 (because I have a batch dimension). At first I was confused about why it reduces over the class_num dimension, but when I printed the result I found that it reduces over the segment dimension, giving an output of shape [batch_size, class_num]. I think it combines the information from the multiple segments (I am still trying to understand the original paper, and why the segment dimension is 2 rather than 1), then applies softmax to output probabilities.
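
For readers following the shapes, the head described in this comment can be sketched in NumPy. This is only an illustration under stated assumptions, not the code from tsm_r50.py: the function name tsm_head and the random weights are hypothetical, and the reduce is assumed to be a mean over segments (the comment only says "reduce").

```python
import numpy as np

def tsm_head(features, fc_weight, fc_bias, batch_size, num_segments):
    """Per-segment FC, reduce over segments, then softmax over classes."""
    # [batch_size, num_segments, 2048, 1, 1] -> [batch_size * num_segments, 2048]
    flat = features.reshape(batch_size * num_segments, -1)
    # fc_weight is [class_num, 2048], so multiply by its transpose
    logits = flat @ fc_weight.T + fc_bias
    # Restore the segment dimension: [batch_size, num_segments, class_num]
    logits = logits.reshape(batch_size, num_segments, -1)
    # Assumption: the reduce is an average over the segment dimension
    consensus = logits.mean(axis=1)  # [batch_size, class_num]
    # Numerically stable softmax over the class dimension
    shifted = consensus - consensus.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical random inputs with the shapes quoted in the comment
rng = np.random.default_rng(0)
batch_size, num_segments, channels, class_num = 4, 8, 2048, 400
features = rng.standard_normal(
    (batch_size, num_segments, channels, 1, 1)).astype(np.float32)
fc_weight = (rng.standard_normal((class_num, channels)) * 0.01).astype(np.float32)
fc_bias = np.zeros(class_num, dtype=np.float32)

probs = tsm_head(features, fc_weight, fc_bias, batch_size, num_segments)
print(probs.shape)
```

Each row of probs is one probability vector per video, so the predicted class is the argmax over the last dimension; the 8 segments no longer appear in the output because they were averaged away.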

I think the difference arises because this code works on a batch, whereas the version I was looking at is the online demo, so there are no 8 segments; it processes them one by one. (As for the reduce layer, I am still checking the TensorRT API.)
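
On the reduce layer: in the TensorRT API, the axes argument of add_reduce (and the axes property of the softmax layer) is a bitmask rather than an axis index, with bit i selecting dimension i. Assuming the network is built as described in this thread, that would explain both puzzles: with an implicit batch, axes=1 selects dimension 0 of [num_segments, output_size], which is the segment dimension; with an explicit batch, axes=2 selects dimension 1 of [batch_size, segment, class_num], again the segment dimension. The helper names below are hypothetical and only illustrate the encoding.

```python
def axes_from_bitmask(mask):
    """Return the dimension indices selected by a TensorRT-style axes bitmask."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]

def bitmask_from_axes(axes):
    """Build the bitmask that selects the given dimension indices."""
    mask = 0
    for axis in axes:
        mask |= 1 << axis
    return mask

print(axes_from_bitmask(1))    # [0]: dim 0 (segments when the batch is implicit)
print(axes_from_bitmask(2))    # [1]: dim 1 (segments when the batch is explicit)
print(bitmask_from_axes([1]))  # 2
```

The same reading applies to the softmax on "axis 1" of a 1-D tensor: the value 1 sets bit 0, which selects dimension 0, the class dimension.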

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.