xincoder / GRIP


GRIP for 3D #13

Open poojitharamachandra opened 3 years ago

poojitharamachandra commented 3 years ago

Hi,

I enjoyed reading through your paper. I am working on trajectory prediction of objects in a confined space. Do you think GRIP can be applied in 3D space instead of 2D space? Do you already have some work on this?

xincoder commented 3 years ago

Hi @poojitharamachandra, thank you for your interest. GRIP works in 3D space: if you look at the input dimensions, there is no design specific to 2D data. Thus, we can simply replace the 2D space data with 3D space data before feeding it into the model. We did not try the model on a 3D space trajectory dataset, but we did try it on Human3.6M for human pose (3D coordinates) prediction and achieved better results than state-of-the-art solutions.
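To illustrate the point above, here is a minimal sketch (shapes and names are our own assumptions, not taken from the repo): a GRIP-style input tensor is shaped (batch, channels, time, objects), so moving from 2D to 3D only changes the channel dimension.

```python
import numpy as np

# Hypothetical illustration: a GRIP-style input tensor has shape
# (batch, channels, time, num_objects). Going from 2D to 3D only
# changes the channel dimension; the rest of the network is agnostic.
batch, time_steps, num_objects = 8, 6, 120

x_2d = np.random.randn(batch, 2, time_steps, num_objects)  # (x, y) per object
x_3d = np.random.randn(batch, 3, time_steps, num_objects)  # (x, y, z) per object

# A first layer that convolves over (time, objects) only needs its
# input-channel count changed from 2 to 3; later layers are unchanged.
print(x_2d.shape, x_3d.shape)
```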

poojitharamachandra commented 3 years ago

thanks for the info. Could you please share your work on Human3.6, if possible?

I have a couple of questions:

  1. What is the significance of max_hop?
  2. Could you explain in brief the convolution and einsum operations of the graph convolution? Especially, what does each dimension specify?
  3. I am trying to apply your concept to a system of particle trajectories confined in a definite space with periodic boundary conditions, i.e., if a particle flies out from one side of an imaginary box, it returns on the other side of the box. Do you think this is feasible with your model (maybe with some modifications)?

xincoder commented 3 years ago
  1. The "max_hop" is passed into Class Graph() in layers/graph.py. It is used in GRIP/layers/graph.py [line 21], np.linalg.matrix_power(A, d). In graph theory, the d-th power of the adjacency matrix, (A^d)[i][j], gives the number of walks of length d from node i to node j. Here, we use it to determine whether two nodes are connected within "max_hop" steps.
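A minimal sketch of this idea (the graph and variable names here are our own; only the np.linalg.matrix_power call mirrors layers/graph.py): powers of the adjacency matrix reveal which nodes are reachable within max_hop steps.

```python
import numpy as np

# Path graph 0-1-2-3: node 2 is two hops from node 0, node 3 is three hops.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

max_hop = 2
# (A^d)[i, j] counts walks of length d from i to j; it is nonzero
# exactly when j can be reached from i in d steps.
transfer = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
reachable = sum(transfer)  # nonzero entry => connected within max_hop steps

print(reachable[0, 2] > 0)  # two hops: within max_hop
print(reachable[0, 3] > 0)  # three hops: beyond max_hop
```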

  2. In the forward() function in layers/graph_operation_layer.py [line16-34], the adjacency matrix A has a size of (n, k, v, w), where "n" is the batch size, "k" is the number of matrices (we have multiple matrices: connection to the node itself, connections to other nodes, etc.), and "v" and "w" index the nodes. In this function, we reshape the input feature "x" to match the dimensions of A, and then do matrix multiplication. The einsum allows us to do matrix multiplication along specific dimensions. (Please refer to the official documentation for more details.)
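A small numpy sketch of this multiplication (the einsum subscripts follow the convention described above; the feature layout (n, k, c, t, v) with "c" channels and "t" time steps is our assumption about the reshaped input): the contraction sums over the partition index k and the source-node index v.

```python
import numpy as np

# n: batch, k: number of adjacency matrices, c: channels,
# t: time steps, v: source nodes, w: target nodes (same node set).
n, k, c, t, v = 2, 3, 4, 6, 5
w = v

x = np.random.randn(n, k, c, t, v)  # reshaped features
A = np.random.randn(n, k, v, w)     # stacked adjacency matrices

# Sum over k (partitions) and v (source nodes), keep (n, c, t, w).
out = np.einsum('nkctv,nkvw->nctw', x, A)
print(out.shape)
```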

  3. Yes, GRIP should be able to handle the case you described. You can use the same network architecture, but the dataloader should be modified to handle your own data. Meanwhile, please refer to "TABLE III: Changes in performance (in terms of WSADE) while adjusting the model" in our GRIP++ paper (https://arxiv.org/pdf/1907.07792.pdf).

In addition, I reported the results on Human3.6M in this thesis (https://core.ac.uk/download/pdf/304204351.pdf), starting on page 61.

poojitharamachandra commented 3 years ago

Thanks for the detailed information :)

I see that you are using 6 history frames to predict 6 future frames, but the neighbor matrix does not contain any time dimension. Could you please let me know the reason for this? (I understand that you are using only the neighbor matrix of the start frame.)

Also, I don't see Graph.get_adjacency used anywhere (maybe I am mistaken?).

xincoder commented 3 years ago

@poojitharamachandra, we only take objects that appear at the last observed time step into account (data_process.py line48), because objects that do not appear at that time step are unlikely to appear in the future. In addition, objects may be occluded (disappear) in some frames and then come back during our observation. If we generated an adjacency matrix for every single frame, the relationships between objects could change quickly and inconsistently, which has a negative impact on feature extraction: since features are multiplied with the adjacency matrix, some features would be lost on or before those occlusion frames. Thus, we choose only the last observed frame to calculate the adjacency matrix.
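As a hedged sketch of this idea (function name, distance threshold, and the thresholding rule are our own illustration, not the repo's code): the adjacency matrix is built from object positions at the last observed frame only.

```python
import numpy as np

def adjacency_from_last_frame(positions, threshold):
    """positions: (T, V, D) observed history; returns a (V, V) 0/1 adjacency.

    Hypothetical rule: link two objects if they are closer than
    `threshold` at the LAST observed frame, ignoring earlier frames.
    """
    last = positions[-1]                        # (V, D) last observed frame
    diff = last[:, None, :] - last[None, :, :]  # pairwise displacement
    dist = np.linalg.norm(diff, axis=-1)        # pairwise distances
    A = (dist < threshold).astype(float)
    np.fill_diagonal(A, 0.0)                    # no self-loops here
    return A

pos = np.zeros((6, 3, 2))                       # 6 frames, 3 objects, 2D
pos[-1] = [[0, 0], [1, 0], [10, 0]]             # last frame: 0 and 1 are close
A = adjacency_from_last_frame(pos, threshold=2.0)
print(A)
```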

The Graph.get_adjacency is called in xin_feeder_baidu.py (line 83).

poojitharamachandra commented 3 years ago

@xincoder thank you very much. How do you ensure that the predicted vehicle trajectories do not overlap? (I have this problem in my use case.)

xincoder commented 3 years ago

> @xincoder thank you very much. How do you ensure that the predicted vehicle trajectories do not overlap? (I have this problem in my use case.)

@poojitharamachandra It is a very good question. In our work, we do not add any specific design to avoid predicting overlapping trajectories. In real life, should we force the model not to predict overlapping trajectories? (If overlapping trajectories are predicted, does that not mean two objects may collide in the future?)

poojitharamachandra commented 3 years ago

I am trying to use your paper as inspiration for a different (physics) problem. In my use case, it is important that the molecules don't collide with each other. (Also, I am predicting positions, not velocities.)

xincoder commented 3 years ago

@poojitharamachandra Thank you for adapting our work to solve other problems. Back to your question: we do not use any specific design to avoid collisions. There are two simple approaches you may want to try: (1) during training, add a large loss penalty whenever there is a collision; (2) add some post-processing to modify the prediction if a collision is predicted. (Hope these two simple ideas inspire you to come up with more efficient and effective schemes. ;-))
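Option (1) above could be sketched as follows (a hypothetical hinge-style penalty of our own, not part of GRIP): penalize predicted frames where any two objects come closer than a minimum radius.

```python
import numpy as np

def collision_penalty(pred, r_min):
    """pred: (T, V, D) predicted positions; returns a scalar penalty.

    Hypothetical hinge penalty: for each frame, sum how far inside
    the minimum separation r_min each object pair intrudes.
    """
    T, V, _ = pred.shape
    penalty = 0.0
    for t in range(T):
        diff = pred[t][:, None, :] - pred[t][None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        iu = np.triu_indices(V, k=1)                 # each unordered pair once
        overlap = np.clip(r_min - dist[iu], 0.0, None)
        penalty += overlap.sum()
    return penalty

pred = np.array([[[0.0, 0.0], [0.5, 0.0]],   # frame 0: 0.5 apart (too close)
                 [[0.0, 0.0], [3.0, 0.0]]])  # frame 1: well separated
print(collision_penalty(pred, r_min=1.0))    # only frame 0 contributes
```

In a training loop this term would be added (with a weight) to the usual prediction loss; option (2) would instead run a similar check after inference and push overlapping objects apart.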

poojitharamachandra commented 3 years ago

thanks!

poojitharamachandra commented 3 years ago

Hi,

  1. Could you please tell me how you are augmenting the data?
  2. Is the representation learned by the graph convolution independent of the absolute coordinates (x, y) given as input?
  3. You mention that the graph convolution model learns a 'good representation'. Is there any analysis of what kind of representation it learns? Thanks

xincoder commented 3 years ago

Hi @poojitharamachandra, thank you for your interest in our work.

  1. Please refer to GRIP/xin_feeder_baidu.py (function getitem, line59) for the details of our data augmentation. During training, we mainly do random rotation augmentation (line65-81).
  2. What do you mean by "independent of the absolute coordinates"? As we described in our paper GRIP++ (Section IV. A. 1 Input representation), "we calculate velocities before feeding the data into our model", so the model operates on relative displacements rather than absolute positions.
  3. We did no special analysis of the learned representations (nor did we try to explain what representation the model learned). However, the improved final predictions show that the model learned a good representation.
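The random rotation augmentation mentioned in point 1 can be sketched as follows (a minimal 2D version of our own, not the repo's exact code): every point of a training sample is rotated by one random angle, which preserves all pairwise geometry.

```python
import numpy as np

def random_rotate(traj, rng):
    """traj: (..., 2) xy coordinates; returns a rotated copy.

    One random angle per call, applied to every point, so the
    relative geometry of the scene is unchanged.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return traj @ R.T

rng = np.random.default_rng(0)
traj = np.random.randn(6, 10, 2)   # 6 frames, 10 objects, 2D
rot = random_rotate(traj, rng)

# Rotation is an isometry: distances from the origin are preserved.
print(np.allclose(np.linalg.norm(traj, axis=-1),
                  np.linalg.norm(rot, axis=-1)))
```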

Hope this answers your questions. Thanks.

poojitharamachandra commented 3 years ago

Thanks. Could you please point to the code where the velocities are calculated? (I am guessing the augmentation is done before calculating the velocities.)

xincoder commented 3 years ago

@poojitharamachandra The velocities are calculated in main.py [line 99]. Thanks.
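For readers following along, the computation referred to above amounts to taking frame-to-frame position differences; here is a minimal sketch (our own function, not the code in main.py), with the first frame's velocity set to zero since it has no predecessor.

```python
import numpy as np

def positions_to_velocities(pos):
    """pos: (T, V, D) absolute coordinates -> (T, V, D) velocities.

    Velocity at frame t is the displacement from frame t-1; the
    first frame has no predecessor, so its velocity is zero.
    """
    vel = np.zeros_like(pos)
    vel[1:] = pos[1:] - pos[:-1]
    return vel

pos = np.array([[[0.0, 0.0]], [[1.0, 0.0]], [[3.0, 0.0]]])  # 3 frames, 1 object
vel = positions_to_velocities(pos)
print(vel[:, 0, 0])  # per-frame x-displacements
```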