steb6 / ISBFSAR

Interactive Skeleton Based Few Shot Action Recognition

I noticed that your repo was updated #1

psiydown opened 2 years ago

psiydown commented 2 years ago

Hi @StefanoBerti, I noticed that your repo was updated. I tested it, but the following errors occurred:

  1. Run "modules/hpe/metrabs_trt/utils/extract_onnxs_from_metrabs.py"

    File "D:\python\scripts\AR-main3\modules\hpe\metrabs_trt\utils\extract_onnxs_from_metrabs.py", line 122, in metr_head  *
        pred2d, pred3d = model.crop_model.heatmap_heads.conv_final(tf.cast(my_prediction_inputs, tf.float16), training=False)
    ValueError: Found zero restored functions for caller function.
  2. Run "modules/hpe/metrabs_trt/utils/from_pytorch_to_onnx.py"

    File "D:\python\scripts\AR-main3\modules\hpe\metrabs_trt\utils\from_pytorch_to_onnx.py", line 5, in <module>
    torch_out = model(x)
    TypeError: 'collections.OrderedDict' object is not callable
  3. If I create engine files from the old onnx files and run "main.py", it reports that these files cannot be found:

    
    'modules/hpe/metrabs_trt/models/numpy/heads_weight.npy'
    'modules/hpe/metrabs_trt/models/numpy/heads_bias.npy'
    'modules/ar/trx/checkpoints/debug.pth'

How do I get or create these files? Could you send them to me for testing?



4. What are the functions of the new TRX and LSTM?

5. Do you have a gtx1060 or gtx1080 graphics card? Can you test its compatibility with your program?
steb6 commented 2 years ago

Hi @psiydown, since I am working on this right now, I made some changes to the code, and minor modifications are needed to use the scripts, especially extract_onnxs_from_metrabs.py, because I now extract the weight and bias of the final convolution. If you need to run everything without modification, I need about 1-2 more weeks; otherwise you can try to understand what is going on in the code (sometimes you just need to change a file path). I am using an RTX 2080 Ti.

steb6 commented 2 years ago

Hi @psiydown, I finally made a clear guide to get everything working! I tested it: just by cloning the repo, running the scripts in modules/hpe/setup in order, and then running modules/hpe/hpe.py, I managed to run inference.

psiydown commented 2 years ago

Hi @StefanoBerti, good job, it's great work! I switched to a new RTX 3060 graphics card and tested it. It runs well and reaches 16 FPS! I have several questions:

  1. With the new RTX 3060, the old version of your program reaches 21 FPS. Although its recognition accuracy is not as high as the current version's, the current version only reaches 16 FPS. Do you know whether this is related to the following errors or to other causes?

  2. Warning when building Yolo engine:

    [!] Input tensor: input | Received unexpected dtype: int32.
    Note: Expected type: float32
  3. Messages when running "main.py" (no impact on running):

    QWindowsContext: OleInitialize() failed:  "COM error 0xffffffff80010106 RPC_E_CHANGED_MODE (Unknown error 0x080010106)"

    and [W] trt-runner-N0-03/10/22-22:56:18 | Was activated but never deactivated. This could cause a memory leak!

  4. It requires less memory than the original model, but loading is still very slow every time I start it. Do you have any way to improve the loading speed?

  5. In addition, I am also interested in the action recognition module. How is the "trx.onnx" model generated or trained? Can it compare and score continuous action segments?

You did a very good job, thank you very much!

steb6 commented 2 years ago

Thank you!

  1. I think it happens because we now do test-time augmentation with a factor of 5. I also had to reimplement the image-transformation function, which could be a bit slower than the TensorFlow one. The bottleneck now is the memcpy between CPU and GPU, but I plan to deploy a single engine that contains all the inference steps.
  2. I hadn't seen that warning, but since those are RGB values it shouldn't be a problem.
  3. I have never read it before
  4. The memory reduction comes from avoiding the PyTorch and TensorFlow dependencies, which import a lot of code. Loading should not be slow: on my PC all the engines load in 8 seconds (hpe.py). It seems like something is going wrong on your side.
  5. It is a Temporal Relational Cross Transformer trained on skeletons extracted from the NTU RGB+D 120 dataset, but this is still a work in progress. There will be news in the coming days.
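
The test-time augmentation mentioned in point 1 can be sketched as follows; `predict` is a stand-in for the real pose model and the noise jitter is purely illustrative, but it shows why a factor of 5 multiplies the per-frame inference cost:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(img):
    """Stand-in for the pose model: returns a fake (17, 3) pose."""
    return np.full((17, 3), img.mean())

def predict_tta(img, num_aug=5):
    """Run the model on several jittered copies and average the poses."""
    preds = []
    for _ in range(num_aug):
        jittered = img + rng.normal(0.0, 1.0, img.shape)  # e.g. noise/flip/crop
        preds.append(predict(jittered))
    return np.mean(preds, axis=0)

frame = np.zeros((256, 256, 3))
pose = predict_tta(frame)
print(pose.shape)  # (17, 3)
```
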
psiydown commented 2 years ago

Hi @StefanoBerti ,

  1. Can your engine use batch prediction? The MetrABS model uses batch prediction, so video processing is relatively fast. If your accelerated engine used batch prediction, it should be even faster, maybe real-time! I tried to implement it with reference to MetrABS but failed. Can you help me implement it? Thank you!

  2. Your TRX guide says "Download weight from Google drive", but I can't find the download link in the file.

  3. I hope you can send me "fix_223.pth", "yolo.engine", "bbone.engine" and "image_transformation.engine" for a comparison test. Thank you very much!

steb6 commented 2 years ago

  1. Using a batch size greater than one can surely increase the throughput of the model a lot. Anyway, my target is robotic applications, so I need a batch size of one and I am interested in latency. I don't understand what you mean when you say that batch prediction should make inference real-time.
  2. The guide was just a reminder for myself, since this is still a work in progress.
  3. You can try this checkpoint, which is my latest one and should perform very well.

psiydown commented 2 years ago

Hi @StefanoBerti, you are right: batch prediction can only improve throughput, not make inference real-time. I repeatedly predicted the human pose on the same video file and found that the prediction accuracy of the built engine is lower than that of the original MetrABS model. I want to know whether the accuracy loss comes from the ONNX conversion or from building the engine. Do you have code I could use to test the prediction accuracy of the ONNX model converted from MetrABS? Thank you!

steb6 commented 2 years ago

@psiydown Really? That's interesting! Weeks ago I ran some tests between the model in TensorFlow, ONNX and TensorRT, but I didn't find any significant difference. I think the difference in accuracy comes from the preprocessing and postprocessing steps. In particular, I didn't add the implausible-pose suppression, because I don't have the file that is used during inference in MetrABS (https://github.com/isarandi/metrabs/issues/34), I don't know the real values used in is_within_fov, I don't know which test-time augmentation factor MetrABS uses, etc. Do you see big differences between the original model and the engine?
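
One way to make such a comparison concrete is a mean per-joint position error between the poses produced by two backends; this is only a sketch with made-up arrays (the `mpjpe` helper and the 17-joint shape are assumptions, not code from the repo):

```python
import numpy as np

def mpjpe(a, b):
    """Mean per-joint position error between two (J, 3) pose arrays."""
    return float(np.linalg.norm(a - b, axis=-1).mean())

# Toy data: reference pose at the origin, prediction offset by a 3-4-5 triangle
ref = np.zeros((17, 3))
pred = ref + np.array([3.0, 0.0, 4.0])  # every joint is off by 5 units
print(mpjpe(ref, pred))  # 5.0
```

Running the same frames through the TensorFlow model, the ONNX model and the engine, and comparing the resulting arrays with such a metric, would localize where the accuracy drop is introduced.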

psiydown commented 2 years ago

Hi @StefanoBerti, download this video and use the engine to predict on it. The left hand swings erratically at frame 57, the left foot jumps at frame 791, and throughout the video the predicted feet slide and the pose is often unstable. The original MetrABS model makes none of these errors and the results are very good, but it loads slowly and occupies a lot of memory, and its prediction speed is not as fast as your engine's. If your engine could reach the accuracy of the original model, it would be perfect!

psiydown commented 2 years ago

I tested the original MetrABS model with the number of test-time augmentations reduced to num_aug=1 and bone-length suppression turned off with suppress_implausible_poses=False, but it didn't make the same mistakes as the engine, so postprocessing is not the main factor affecting the accuracy. If I could test prediction with the ONNX model, it might be possible to find the reason.

steb6 commented 2 years ago

@psiydown It could be that you are making the same mistake I made a while ago. The video has resolution 1920x1080, which means an aspect ratio of 1920/1080 = 16:9. If you resize an image of that shape to 640x480, you change the aspect ratio from 16:9 to 640/480 = 4:3. The correct way to preprocess such a video is img = img[:, 240:-240, :]; img = cv2.resize(img, (640, 480)). Could this be your problem?
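
The crop step above can be written generically; a minimal numpy sketch (`center_crop_to_aspect` is an illustrative helper, not code from the repo, and a real pipeline would follow it with `cv2.resize(cropped, (640, 480))`):

```python
import numpy as np

def center_crop_to_aspect(img, target_w=640, target_h=480):
    """Center-crop an HxWxC image to the target aspect ratio (no resize).
    A subsequent resize to (target_w, target_h) then preserves proportions."""
    h, w = img.shape[:2]
    desired_w = h * target_w // target_h  # width matching the target aspect
    offset = (w - desired_w) // 2
    return img[:, offset:w - offset, :]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
cropped = center_crop_to_aspect(frame)
# 1080 * 640 // 480 = 1440, so 240 px are trimmed from each side,
# matching the img[:, 240:-240, :] slice above
print(cropped.shape)  # (1080, 1440, 3)
```
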

psiydown commented 2 years ago

@StefanoBerti No, I didn't make that mistake; I pay great attention to precision and detail. I scaled the video keeping the same proportions as the original and filled the excess border with black.
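
The letterboxing described here (scale with preserved aspect ratio, pad with black) can be sketched as follows; `letterbox` is an illustrative helper, and the nearest-neighbour indexing stands in for a proper `cv2.resize`:

```python
import numpy as np

def letterbox(img, out_w=640, out_h=480):
    """Scale keeping the aspect ratio, pad the remainder with black pixels."""
    h, w = img.shape[:2]
    scale = min(out_w / w, out_h / h)
    new_w, new_h = round(w * scale), round(h * scale)
    # nearest-neighbour resize via index arithmetic keeps this sketch
    # dependency-free; a real pipeline would call cv2.resize instead
    ys = (np.arange(new_h) / scale).astype(int)
    xs = (np.arange(new_w) / scale).astype(int)
    resized = img[ys][:, xs]
    canvas = np.zeros((out_h, out_w, img.shape[2]), dtype=img.dtype)
    top, left = (out_h - new_h) // 2, (out_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

frame = np.full((1080, 1920, 3), 255, dtype=np.uint8)
boxed = letterbox(frame)  # 1920x1080 scales to 640x360 plus black bars
print(boxed.shape)  # (480, 640, 3)
```

Note that letterboxing and center-cropping feed the model different pixel geometries, so the two preprocessing choices can legitimately produce different poses.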

steb6 commented 2 years ago

@psiydown Maybe you are calling reconstruct_absolute? Anyway, to test the ONNX model you just need to replace the TensorRT runner with an ONNX runner, such as ONNX Runtime or the one from Polygraphy.

psiydown commented 2 years ago

@StefanoBerti Following your tips, I successfully used the OnnxrtRunner of Polygraphy to predict with the ONNX model. However, the results are the same as the engine's, which shows that building the engine does not reduce the accuracy. Either the accuracy was already lost when converting to ONNX or, as you said, it is caused by preprocessing or postprocessing. Can you test the video I sent to find the reason for, and a solution to, the accuracy drop? Thank you!

Because I need the absolute pose, I use reconstruct_absolute.

steb6 commented 2 years ago

@psiydown What do you mean by "testing the video"? I tried it and it works quite well imo. Anyway, I don't use the reconstruct_absolute function.

I think the fact that you use reconstruct_absolute is the problem: I was using it previously, but not anymore, and I no longer maintain it. You can try commenting out the line pred3d = reconstruct_absolute(pred2d, pred3d, new_K, is_predicted_to_be_in_fov, weak_perspective=False), which should be somewhere near line 152 in hpe.py, and see if it is the problem. The values in is_within_fov may not be correct.

psiydown commented 2 years ago

@StefanoBerti This is not the reason. I tested deleting the reconstruct_absolute line, but it has no impact on the accuracy; deleting it only fixes the body in the middle of the scene so that it does not move. I recorded a test video to better illustrate the problem: at the beginning, the person's hand is held still behind the back, but the predicted green character swings the hand behind the back twice. Please pay attention to the position pointed to by the mouse at the beginning.

steb6 commented 2 years ago

@psiydown Ok, thank you, now I get what you mean! Well, that pose is very hard to estimate, btw; I don't know how the original model handles it. Since you have already tested reconstruct_absolute and the bone-length suppression, some things still to consider are:

psiydown commented 2 years ago

@StefanoBerti I read the source code of MetrABS, set the parameter antialias_factor=1 to turn off antialiasing, and applied gamma encoding to increase the brightness of the picture, but changing these does not affect the accuracy.

I have no other ideas. I care more about model accuracy and loading speed than prediction speed. Do you have any way to use the original model while reducing its memory usage and improving its loading speed? I hope you can give me specific tips. Thank you very much!

steb6 commented 2 years ago

@psiydown If these accuracy differences are fundamental for your application, then yes, I think using the original model works better for you. I have no clue what else is missing or wrong here.

You can use TensorFlow's XLA optimization to bring the frame rate to 4.5 FPS; in my case it worked.
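
Enabling XLA in TensorFlow is a one-liner, either globally or per function; a minimal sketch (the `forward` function here is illustrative, not the repo's model call):

```python
import tensorflow as tf

# Enable XLA JIT compilation globally for TensorFlow graphs
tf.config.optimizer.set_jit(True)

# Or compile a single function with XLA
@tf.function(jit_compile=True)
def forward(x):
    # stand-in computation; in practice this would wrap the model call
    return tf.nn.relu(x) * 2.0

y = forward(tf.constant([[-1.0, 2.0]]))
print(y.numpy().tolist())  # [[0.0, 4.0]]
```
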

Let me know if you manage to discover how to improve the accuracy!

psiydown commented 2 years ago

@StefanoBerti If I find out how to improve the accuracy, I'll let you know right away. Trying XLA does indeed improve the prediction speed, but the original model still loads very slowly: each load takes a few minutes and occupies 11 GB of memory. I noticed that the MetrABS model extracted by your program is relatively small. Can the extracted original model predict directly, without being converted to ONNX? How can I do that?

steb6 commented 2 years ago

@psiydown Sorry, but I didn't understand your question. There is a lot of preprocessing and postprocessing that I tried to replicate accurately to avoid using TensorFlow, but if you want to use the original model, you have to accept using TensorFlow. Anyway, you can look up how to reduce the GPU memory used by TensorFlow.
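
Regarding the GPU memory, TensorFlow can be told not to pre-allocate the whole card; this configuration sketch must run before the first GPU operation:

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory incrementally instead of
# grabbing everything at start-up
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

# Alternatively, cap the usable memory (here 4 GB) on the first GPU:
# tf.config.set_logical_device_configuration(
#     tf.config.list_physical_devices("GPU")[0],
#     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
```

This only limits TensorFlow's GPU allocation; it does not speed up loading the model weights themselves.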