princeton-vl / RAFT

Confused about pre-trained weights #124

NicolasHug commented 2 years ago

Hi, and thanks a lot for RAFT.

I'm sorry for opening yet another question about how the weights were trained, but I'm a little confused and it seems like there are a few conflicting definitions.

According to https://github.com/princeton-vl/RAFT/issues/10#issuecomment-634092354, raft-sintel.pth does not make use of KITTI data:

> The model sintel.pth is the chairs+things.pth model finetuned on Sintel (C+T+S).

But https://github.com/princeton-vl/RAFT/issues/67#issuecomment-751524440 suggests that raft-sintel.pth does make use of KITTI data:

> raft-sintel: trained on FlyingChairs + FlyingThings + Sintel + KITTI (this is the model which corresponds to our submission on the Sintel leaderboard)
> raft-kitti: raft-sintel finetuned on only KITTI

Also, I don't understand what "raft-sintel finetuned on only KITTI" means. Does that mean raft-kitti is just C + T + K?

Meanwhile, https://github.com/princeton-vl/RAFT/issues/37#issuecomment-692763367 suggests that both sintel.pth and kitti.pth make use of both Sintel and KITTI data:

> raft-kitti.pth and raft-sintel.pth follow the C+T+S+K+H training in the paper

The paper says that C+T+S+K+H combines KITTI, HD1K, and Sintel data when finetuning on Sintel - but then it's not clear how raft-kitti.pth differs from raft-sintel.pth if they both use C+T+S+K+H.
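
To make the notation concrete, my mental model of "C+T+S+K+H finetuning" is a single Sintel-centric stage that samples from a weighted mixture of datasets, rather than separate sequential stages. A rough sketch of such a mixture using PyTorch's ConcatDataset (the class, dataset sizes, and multipliers below are made up for illustration and are not necessarily what this repo uses):

```python
from torch.utils.data import ConcatDataset, Dataset

class ToyFlowDataset(Dataset):
    """Stand-in for a flow dataset; real ones return (img1, img2, flow, valid)."""
    def __init__(self, name, size):
        self.name, self.size = name, size

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        return self.name, i

# Hypothetical "C+T+S+K+H" finetuning mixture: the small datasets are repeated
# so the sampler sees them often enough relative to the large ones.
sintel_clean = ToyFlowDataset("sintel-clean", 1041)
sintel_final = ToyFlowDataset("sintel-final", 1041)
kitti = ToyFlowDataset("kitti", 200)
hd1k = ToyFlowDataset("hd1k", 1000)
things = ToyFlowDataset("things", 40000)

train_dataset = ConcatDataset(
    [sintel_clean] * 100 + [sintel_final] * 100 + [kitti] * 200 + [hd1k] * 5 + [things]
)
print(len(train_dataset))  # one epoch draws from all five sources
```

If that reading is right, "raft-sintel finetuned on only KITTI" would then mean one more stage that restores the raft-sintel checkpoint and trains on KITTI data alone.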

Finally, when evaluating kitti.pth on Sintel, I'm getting a pretty bad EPE, significantly worse than the 1.43 (clean) / 2.71 (final) that can be achieved by C + T:

```
(raft) ➜  RAFT git:(master) ✗ python evaluate.py --model ../downloads/models/raft-kitti.pth --dataset sintel
Validation (clean) EPE: 4.545920, 1px: 0.799649, 3px: 0.892883, 5px: 0.916896
Validation (final) EPE: 6.158094, 1px: 0.745571, 3px: 0.849711, 5px: 0.879833
```
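
For anyone comparing numbers: as I understand evaluate.py's output, EPE is the mean end-point error over all pixels, and 1px/3px/5px are the fractions of pixels whose error is below those thresholds (higher is better). A minimal sketch of the standard definitions (not the repo's code):

```python
import torch

def flow_metrics(flow_pred, flow_gt):
    """flow_pred, flow_gt: [2, H, W] tensors of (u, v) flow vectors."""
    # Per-pixel end-point error: Euclidean distance between flow vectors.
    epe = ((flow_pred - flow_gt) ** 2).sum(dim=0).sqrt()
    return {
        "epe": epe.mean().item(),
        "1px": (epe < 1).float().mean().item(),
        "3px": (epe < 3).float().mean().item(),
        "5px": (epe < 5).float().mean().item(),
    }

# Example with random tensors sized like a Sintel frame:
print(flow_metrics(torch.randn(2, 436, 1024), torch.randn(2, 436, 1024)))
```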

From my current understanding (but I might be very wrong), raft-sintel.pth is C+T finetuned on a Sintel-centric mixture that includes KITTI and HD1K, and raft-kitti.pth is that model further finetuned on KITTI alone.

Again, sorry for another post like this. I would greatly appreciate any clarification, as I'm a bit confused ATM. Thanks a lot!

zachteed commented 2 years ago

Hi, sorry for the late response.

All the comments you referenced were correct at the time they were written. In the initial release of our paper / code (March 2020), the sintel.pth model was finetuned only on Sintel (not KITTI or HD1K); it also had no learned upsampling module, and it got 3.39 final EPE (see v1 of our paper).

In August 2020, we published a new version of our method and code. We added upsampling and also added KITTI + HD1K data for finetuning (in some experiments; the paper also has results where only Sintel data was used for finetuning). The current sintel.pth model was trained with these settings and scores 2.86 final EPE on Sintel.
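
Roughly, the upsampling we added predicts a mask and takes a convex combination of each coarse pixel's 3x3 neighborhood to go from 1/8 resolution to full resolution. A simplified sketch of the idea (see the repo for the actual implementation):

```python
import torch
import torch.nn.functional as F

def convex_upsample(flow, mask):
    """Upsample [N, 2, H, W] flow to [N, 2, 8H, 8W] as a convex combination
    of each pixel's 3x3 coarse neighborhood.

    mask: [N, 64*9, H, W] logits, a 9-way weighting for each of the 8x8
    fine pixels inside a coarse cell.
    """
    N, _, H, W = flow.shape
    mask = mask.view(N, 1, 9, 8, 8, H, W)
    mask = torch.softmax(mask, dim=2)  # convex weights over the 9 neighbors

    # Gather 3x3 neighborhoods; flow is scaled by 8 because displacements
    # are measured in pixels and the grid becomes 8x finer.
    up_flow = F.unfold(8 * flow, [3, 3], padding=1)
    up_flow = up_flow.view(N, 2, 9, 1, 1, H, W)

    up_flow = torch.sum(mask * up_flow, dim=2)   # weighted combination
    up_flow = up_flow.permute(0, 1, 4, 2, 5, 3)  # interleave the 8x8 blocks
    return up_flow.reshape(N, 2, 8 * H, 8 * W)

# Example: 1/8-resolution flow plus random mask logits.
flow = torch.randn(1, 2, 55, 128)
mask = torch.randn(1, 64 * 9, 55, 128)
print(convex_upsample(flow, mask).shape)  # torch.Size([1, 2, 440, 1024])
```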

Regarding kitti.pth performing poorly on Sintel: I think this is expected (and I imagine this would be the case for any flow network trained on KITTI), since KITTI has a very restricted set of motion patterns.