Clarification about TestTimeRefinement

Hello, I am reading this paper and want to clarify the fundamentals for myself; I hope you can help. As I understand it, for each video whose monocular depth I would like to refine, I fine-tune the core network using the inferred depth (e.g., from CADepth) and the output of COLMAP (i.e., SfM). If that is correct, could you please advise how long fine-tuning takes for a single video, in addition to running CADepth's original inference and COLMAP on the frames?

For example, what additional time would be required to refine depth for a video that is about 30 seconds long (e.g., at ~30 fps, 900 frames in total)? Thanks!

serizba / SfM-TTR

Clarification about TestTimeRefinement #2