kenziyuliu / MS-G3D

[CVPR 2020 Oral] PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition"
https://arxiv.org/abs/2003.14111
MIT License

Recognizing activities using your library #41

Closed fspegni closed 3 years ago

fspegni commented 3 years ago

Hi,

I'm trying to use your code (or some of its functions/classes) as a "black box" to classify a dataset captured with an Intel RealSense camera (video + skeleton information + extra data obtained by post-processing with Unity and MATLAB). At the moment it's not clear how I should "encode" my dataset and pass it to your scripts so that your code labels it using the pre-trained models. I'm fluent in Python, but this is the first time I've used PyTorch. I've read your paper and README, but it seems to me this information is not given (perhaps it is obvious to your readers, but I'm not very familiar with AI and action recognition).

BTW, I'm aware of issue #18, which seems very close to what I want to achieve, and I don't understand why you said it was not possible. Perhaps I'm missing something that is obvious to you but not to me. Could you please give me a hint or elaborate on why it's not possible to do that using your code? Thanks in advance for any information you can provide.

kenziyuliu commented 3 years ago

Hi @fspegni,

Thanks a lot for your interest!

Re: using pre-trained models for labeling your custom data

This is certainly possible. A few pointers (using NTU dataset as an example):

Note also that the pre-trained models are trained on a few research datasets (NTU / Kinetics) which have predefined action classes. If your custom data have actions that aren't covered by these datasets, the generated labels probably won't be very useful.
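As a rough illustration of the point above, the sketch below turns a vector of raw class scores into ranked labels with softmax probabilities. It uses NumPy only; `top_k_predictions`, the class names, and the score values are hypothetical placeholders, not part of this repo, and the real label list depends on which dataset the checkpoint was trained on.

```python
import numpy as np

def top_k_predictions(logits, class_names, k=3):
    """Return the k most likely (label, probability) pairs for one sample.

    `logits` is assumed to be a 1-D vector of raw class scores, one per
    entry in `class_names` (both are illustrative assumptions).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Indices of the k largest probabilities, highest first.
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]

# Hypothetical label subset and scores, for illustration only.
class_names = ["drink water", "sit down", "stand up", "clapping"]
logits = [2.0, 0.5, 0.1, -1.0]

for name, prob in top_k_predictions(logits, class_names):
    print(f"{name}: {prob:.2f}")
```

Whatever the input, the output is always one of the predefined classes. A flat probability distribution can hint that an action is not covered, but a confident prediction on an unseen action is not evidence that the label is right.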

Re: Issue #18

I believe the OP was referring to "action detection", which is a different task from "action classification". This work focuses on classifying the action given an input skeleton sequence (2D/3D human skeleton keypoints across time). Think of the difference between "image classification" and "object detection within an image".

Other general comments

This repo is intended to be a "research repo" in the sense that it's written for running experiments on a few pre-defined research datasets (NTU and Kinetics) and reporting metrics on them, so it does not directly support your use case. I think the whole thing will be easier to understand with some more familiarity with PyTorch as well as the overall data pipeline (i.e. how the skeleton data gets converted into tensors, the preprocessing steps, etc.).
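To give a feel for what "converting skeleton data into tensors" can look like, here is a minimal NumPy sketch that packs one clip of per-frame joint coordinates into a fixed-size array. The `(C, T, V, M)` layout (channels, frames, joints, persons), the 300-frame cap, the 2-person cap, and the helper name `pack_skeleton_sequence` are assumptions made for illustration; check the repo's data generation scripts and feeders for the exact shapes and preprocessing it expects.

```python
import numpy as np

def pack_skeleton_sequence(frames, max_frames=300, max_persons=2):
    """Pack a clip into a zero-padded array of shape (C, max_frames, V, max_persons).

    `frames` is assumed to have shape (T, M, V, C):
    T frames, M tracked persons, V joints, C coordinates per joint.
    Longer clips are truncated; shorter ones are padded with zeros.
    """
    frames = np.asarray(frames, dtype=np.float32)
    if frames.ndim != 4:
        raise ValueError("expected an array of shape (T, M, V, C)")
    num_frames, num_persons, num_joints, num_coords = frames.shape
    t_keep = min(num_frames, max_frames)
    m_keep = min(num_persons, max_persons)

    packed = np.zeros(
        (num_coords, max_frames, num_joints, max_persons), dtype=np.float32
    )
    # Reorder axes from (T, M, V, C) to (C, T, V, M) before copying.
    packed[:, :t_keep, :, :m_keep] = frames[:t_keep, :m_keep].transpose(3, 0, 2, 1)
    return packed

# Synthetic clip: 120 frames, 1 person, 25 joints (as in NTU), 3-D coordinates.
rng = np.random.default_rng(0)
clip = rng.normal(size=(120, 1, 25, 3))

sample = pack_skeleton_sequence(clip)
batch = sample[np.newaxis]  # add a leading batch dimension
print(batch.shape)  # (1, 3, 300, 25, 2)
```

A custom camera will usually produce a different joint set and coordinate frame than the research datasets, so the joints would also need to be mapped to the skeleton definition the pre-trained model was trained on before an array like this is meaningful.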

Hope this helps!

kenziyuliu commented 3 years ago

Closing due to inactivity; feel free to re-open if the issue is unresolved.