Training on FlyingThings3D

Hello, thanks for your great work and for providing the source code! I am currently trying to reproduce the training of your network. However, some details are not completely clear from your paper and my training code currently achieves only significantly less accurate results. Could you clarify some points regarding your training and object if some of my assumptions differ from yours?

Datasets

I use the FlyingThings3D dataset: Train split, left+right, into_future+into_past, clean+final --> 161208 training samples
Augmentation: I used the code in augmentation.py with crop_size [320,720]
Following the code in evaluation.py, I scaled both depths by 0.2 and called normalize_image on both images

Training Schedule
Learning rate starts at .0001 and decays linearly; using torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.0001, total_steps=200100, pct_start=0.0, cycle_momentum=False, anneal_strategy='linear')
200K iterations with batch size 4

Model
In order to reproduce the results at the bottom of Table 1, I used 12 iterations during training and the model raft3d.raft3d_bilaplacian

Your context encoder automatically initializes with the ResNet weights. I continued updating those - did you keep them fixed during training?

Loss

I changed these lines as follows:

        if train_mode:
            flow2d_rev = target.permute(0,2,3,1)[...,:2] - coords0
            flow2d_rev = se3_field.cvx_upsample(8 * flow2d_rev, mask)

            flow3d_rev = target.permute(0,2,3,1)[...,:2] - coords0
            depth_rev = target.permute(0,2,3,1)[...,2] - zinv
            flow3d_rev = torch.cat((flow3d_rev, depth_rev[...,None]), dim=-1)
            flow3d_rev = se3_field.cvx_upsample(8 * flow3d_rev, mask)

            Ts_up = se3_field.upsample_se3(Ts, mask)
            flow2d_est, flow3d_est, valid = pops.induced_flow(Ts_up, depth1, intrinsics)

            flow_est_list.append(flow3d_est)
            flow_rev_list.append(flow3d_rev)

Am I right to change the returned list to 3D flow estimation?
Did I calculate the inverse_depth revisions correctly (only 2D revisions where provided)? Is it correct to convex upsample them?
The loss function is then implemented as given in the paper; for the revisions I used the exact same loss summation (L1-norm, gamma weighting) and added the revision loss' sum to the epe sum with 0.2 weight.

Hardware setup
I used 4 A100 GPUS, so 1 batch per GPU (of course this setup might be different from yours)

Can you see any difference to your training? Do you have any additional hint?

Thanks a lot in advance!

princeton-vl / RAFT-3D

Training on FlyingThings3D #3

Datasets

Training Schedule

Model

Loss

Hardware setup