tudelft / ssl_e2vid

MIT License
68 stars 11 forks

Flow Evaluation on MVSEC #1

Open adarshkosta opened 3 years ago

adarshkosta commented 3 years ago

I was trying to evaluate the provided EV-FlowNet and FireFlowNet models for estimating optical flow on the MVSEC dataset and had some issues replicating the AEE results.

The provided models seem to perform well visually on the ECD dataset, but there is no error metric to quantify the predictions due to the lack of ground-truth data.

The same models do not seem to perform as well on MVSEC as on ECD, and I could not find any code that computes AEE for evaluation on MVSEC. According to Section 4.1 in the paper, the flow predictions were generated at each grayscale frame timestamp and scaled to the time duration between consecutive grayscale frames during evaluation. As for the binning and event-window scheme, `configs/train_flow.yaml` uses `window: 5000` and `num_bins: 5`. How are these changed when evaluating on MVSEC between grayscale frames? I understand that the window length can be changed to span the duration between consecutive grayscale frames, but `num_bins` cannot be changed, as it determines the number of input channels. Correct me if I am missing something.
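For concreteness, here is roughly how I understand the fixed-bin voxel grid to work: the window decides *which* events are selected, while `num_bins` only fixes the temporal resolution of the representation, so it stays constant. This is just my sketch, not the repo's actual implementation (function and variable names are mine):

```python
import numpy as np

def events_to_voxel_grid(ts, xs, ys, ps, num_bins, height, width):
    """Accumulate a window of events into a (num_bins, H, W) voxel grid.

    num_bins is an architectural constant (number of input channels);
    the event window can span any duration, e.g. the time between two
    consecutive grayscale frames.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(ts) == 0:
        return voxel
    # Normalize timestamps of the selected window to [0, num_bins - 1]
    t_norm = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (num_bins - 1)
    # Bilinearly split each event's polarity between its two nearest bins
    t0 = np.floor(t_norm).astype(int)
    w1 = t_norm - t0
    np.add.at(voxel, (t0, ys, xs), ps * (1.0 - w1))
    np.add.at(voxel, (np.minimum(t0 + 1, num_bins - 1), ys, xs), ps * w1)
    return voxel
```

However long the window, the grid always has `num_bins` channels, which is why I assume only the window length changes at evaluation time.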

It would be great if you could provide a bit more explanation on this and better yet update the repository with the evaluation code.

Thanks in advance!

fedepare commented 3 years ago

Hi Adarsh,

Firstly, sorry for the late reply. I was taking a week off and couldn't access my laptop.

Regarding the evaluation on the MVSEC dataset:

1) We generate ground-truth optical flow at each grayscale frame timestamp using the code provided by Zhu et al., RSS'18. This code interpolates the ground-truth data provided by MVSEC (which comes at a different frequency, as in Zhu et al., RAL'18), synchronizes it with the grayscale frames, and makes sure that it represents per-pixel displacement between successive frames.

2) Although we trained our networks using a fixed number of input events, for this evaluation we generated an optical flow prediction at each grayscale frame timestamp. This means that, in this case, we varied the number of input events during inference to synchronize the optical flow predictions with the ground-truth data.

3) Regarding the number of bins of the spatiotemporal voxel grid that we use as event representation, we didn't vary it for this evaluation. As you correctly pointed out, this is an architectural hyperparameter that cannot be modified at inference time.
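Once prediction and ground truth are both expressed as per-pixel displacement between consecutive grayscale frames, the AEE itself is straightforward. A minimal sketch (function and argument names are illustrative, not from our repository; the event mask restricts the error to pixels that received events, as is common in MVSEC evaluations):

```python
import numpy as np

def average_endpoint_error(flow_pred, flow_gt, event_mask=None):
    """AEE between predicted and ground-truth flow.

    flow_pred, flow_gt: (2, H, W) arrays of per-pixel displacement
    between consecutive grayscale frames. If the network outputs flow
    in other units, scale it to the frame interval beforehand.
    event_mask: optional (H, W) boolean array selecting pixels with events.
    """
    # Per-pixel Euclidean distance between the two flow vectors
    err = np.linalg.norm(flow_pred - flow_gt, axis=0)  # (H, W)
    if event_mask is not None:
        err = err[event_mask]  # evaluate only where events occurred
    return float(err.mean())
```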

As for the evaluation code, we will (hopefully) soon release a separate repository with the entire training/evaluation pipeline for event-based optical flow estimation with neural networks. So please stay tuned!

Again, sorry for my lateness. Let me know if I can be of any other help.

Cheers, Fede.