tedhuang96 / gst

[RA-L + ICRA22] Learning Sparse Interaction Graphs of Partially Detected Pedestrians for Trajectory Prediction
MIT License
38 stars 3 forks

Question about batch_size #1

Closed lsc12318 closed 1 year ago

lsc12318 commented 1 year ago

Hey, nice work! But something is confusing me. In the code I see that the batch_size is set to 1, and during the model's forward computation only the zeroth dimension of the input x is used, e.g. x_embedding = self.node_embedding(x)[0]. If the batch_size is set to another value such as 256, it seems that st_model cannot handle the rest of the batch. Could you share more details on your reasoning? Thanks!!!

tedhuang96 commented 1 year ago

Thank you for your question! You are right that we always set the batch_size to 1 for the data loaders here. https://github.com/tedhuang96/gst/blob/ac300d34e17fa2d6639c1df329ac1e8f80bccaec/src/mgnn/utils.py#L84-L88

This is because each sample is allowed to have a different number of agents. With the number of agents varying across samples, there is no easy way to stack multiple samples into one batch tensor. Imagine two trajectory samples with dimensions (T, N1, 2) and (T, N2, 2), where the first has N1 pedestrians, the second has N2 pedestrians, and both span T time steps. We cannot directly create a batch of shape (2, T, N, 2) by stacking the two tensors, because N1 is not equal to N2. That is why st_model always assumes it processes a single sample per batch.
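To make the shape mismatch concrete, here is a minimal sketch (with illustrative dimensions, not values from the repo) showing that stacking two samples with different agent counts fails:

```python
import numpy as np

# Hypothetical dimensions: T time steps, N1/N2 pedestrians, 2 coordinates (x, y).
T, N1, N2 = 8, 3, 5
sample_a = np.zeros((T, N1, 2))  # shape (8, 3, 2)
sample_b = np.zeros((T, N2, 2))  # shape (8, 5, 2)

try:
    # Stacking would need a common shape (2, T, N, 2), but N1 != N2.
    batch = np.stack([sample_a, sample_b])
except ValueError as e:
    print("cannot stack:", e)
```

Workarounds like zero-padding to max(N1, N2) with a validity mask exist, but they complicate the interaction-graph computation, which is presumably why the repo keeps one sample per batch instead.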

To still allow mini-batch training, we run back-propagation on each single-sample batch and accumulate the gradients until the batch count reaches the batch size we set (256 in your example), and only then update the network. https://github.com/tedhuang96/gst/blob/ac300d34e17fa2d6639c1df329ac1e8f80bccaec/scripts/train.py#L99-L104
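The gradient-accumulation idea can be sketched with a toy model (all names and the model itself are illustrative, not from the repo): gradients from single samples are summed, and a parameter update happens only once per batch_size samples.

```python
import numpy as np

# Toy sketch of gradient accumulation: fit scalar w to y = 2*x
# with squared loss, updating once every batch_size samples.
rng = np.random.default_rng(0)
xs = rng.normal(size=32)
ys = 2.0 * xs

w = 0.0
lr = 0.1
batch_size = 8
grad_accum = 0.0

for step, (x, y) in enumerate(zip(xs, ys), start=1):
    # "Backward" for one sample: d/dw of 0.5 * (w*x - y)^2.
    grad_accum += (w * x - y) * x
    if step % batch_size == 0:
        # Update with the averaged gradient, then reset the accumulator.
        w -= lr * grad_accum / batch_size
        grad_accum = 0.0

print(w)  # moves from 0 toward the true value 2
```

In PyTorch the same effect falls out naturally: calling loss.backward() without optimizer.zero_grad() accumulates gradients in each parameter's .grad, and optimizer.step() is deferred until the accumulated count reaches the target batch size.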

This setup is essentially inherited from the previous work Social-STGCNN.

lsc12318 commented 1 year ago

Oh, I got it! Sorry for not seeing your reply sooner. But I have another question.

Regarding the metrics in your paper (AOE/FOE), I think they are defined consistently with the ADE/FDE used in previous papers (e.g. Trajectron++). But some numbers you report, e.g. the TABLE I row "Trajectron++ (P) 1.40±0.56 0.27±0.17 0.50±0.23 0.50±0.23 0.33±0.16 0.60±0.27", I cannot seem to find in the original Trajectron++ paper. What I would like to ask is whether you reproduced this algorithm on your own hardware, or whether my understanding of AOE/FOE is off.

P.S. Average Displacement Error (ADE): mean L2 distance between the ground-truth and predicted trajectories over all prediction time steps. Final Displacement Error (FDE): L2 distance between the predicted final position and the ground-truth final position at the prediction horizon T.

tedhuang96 commented 1 year ago

Thank you for your question. We found that our model does not perform well under the leave-one-out setup for ETH-UCY, because we use a fixed sparsity hyperparameter in our algorithm. The performance of a given sparsity configuration depends on scene properties such as the average number of partially detected pedestrians, so a fixed hyperparameter constrains generalization across scenes. Thus, in our benchmark, we split each scene into train/val sets and train the model on each scene independently. We re-trained the existing algorithms ourselves under this same setup and report those results. Using learnable sparsity to handle scenes with varying crowd densities would be a future direction to address this issue.

lsc12318 commented 1 year ago

Got it!!! Thanks!