heyoeyo / muggled_dpt

Muggled DPT: Depth estimation without the magic

new model #6

Open IneedhelpRr opened 3 weeks ago

IneedhelpRr commented 3 weeks ago

Hello, I would like to know if I can train a model of a scene with my own dataset, but I don't know how to do it.

heyoeyo commented 3 weeks ago

I haven't tried any training myself, so there isn't any training code in this repo unfortunately. The only training code that I'm aware of is for the depth-anything (v1 and v2) metric-depth models, as well as the ZoeDepth models.

I haven't seen any examples of training the relative depth models, though there are requests for this, which might be worth keeping an eye on.

If you want to train a relative depth model, the v2 (metric) depth-anything training script is probably the best place to start, since it's the easiest to understand. For example, the main training loop code is this small block; everything else is mostly setup/logging. To make this work for a custom (relative) depth model, you'd need to change a few important things:

  1. Setting up your dataset to load into the script. The easiest way would probably be to mimic the existing Hypersim format. You can see how this loader works by looking at the Hypersim class code. It seems to expect a .txt file that lists the path to each training image and its corresponding depth image line-by-line, with a space separating the two paths (there's a rough sketch of this at the end of this comment).
  2. Loading the model you want to train (instead of the metric depth-anything v2 model)
  3. Updating the loss function to account for relative depth. This is probably the most difficult part. All of the relative depth models seem to be based on the original MiDaS paper, where the loss function is explained on pages 5 & 6, under the section Training on Diverse Data (a sketch of this is also included below).

So those are probably the main steps/changes needed to make the training script work for relative depth. The loss function math is a bit scary looking and there are lots of little tricks in their implementation, which may explain why there isn't a lot of code available already, but if you get that part working then it should be possible to train with custom data.
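
To make step 1 a bit more concrete, here's a minimal sketch of what a Hypersim-style file list and loader could look like. The file names and the FileListDepthDataset class are made up for illustration, and the real Hypersim class also handles resizing, normalization and building a valid-pixel mask:

    # train_list.txt (one sample per line, space separated):
    #   data/images/scene01_0001.png data/depths/scene01_0001.png
    #   data/images/scene01_0002.png data/depths/scene01_0002.png

    import cv2
    import torch
    from torch.utils.data import Dataset

    class FileListDepthDataset(Dataset):

        def __init__(self, filelist_path):
            with open(filelist_path, 'r') as f:
                self.pairs = [line.split() for line in f if line.strip()]

        def __len__(self):
            return len(self.pairs)

        def __getitem__(self, index):
            image_path, depth_path = self.pairs[index]
            image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
            depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype('float32')
            # A real loader would also resize to a fixed size (needed for
            # batching) and apply the model's normalization here
            return {
                'image': torch.from_numpy(image).permute(2, 0, 1).float() / 255.0,
                'depth': torch.from_numpy(depth),
            }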
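For step 3, here's a rough sketch of the scale-and-shift-invariant part of the MiDaS loss, based on my reading of the paper rather than any official code. It solves the closed-form least-squares alignment per image, and it leaves out the trimming and the multi-scale gradient-matching term that the paper adds on top:

    import torch

    def ssi_loss(prediction, target, mask):
        # prediction/target: (B, H, W) maps in disparity space
        # mask: (B, H, W) flags for valid ground-truth pixels
        mask = mask.float()

        # Closed-form least-squares scale s and shift t that minimize
        # sum(mask * (s*prediction + t - target)^2) for each image
        a_00 = torch.sum(mask * prediction * prediction, dim=(1, 2))
        a_01 = torch.sum(mask * prediction, dim=(1, 2))
        a_11 = torch.sum(mask, dim=(1, 2))
        b_0 = torch.sum(mask * prediction * target, dim=(1, 2))
        b_1 = torch.sum(mask * target, dim=(1, 2))

        det = (a_00 * a_11 - a_01 * a_01).clamp(min=1e-6)
        scale = (a_11 * b_0 - a_01 * b_1) / det
        shift = (-a_01 * b_0 + a_00 * b_1) / det

        aligned = scale.view(-1, 1, 1) * prediction + shift.view(-1, 1, 1)

        # Plain mean absolute error over valid pixels; the paper instead trims
        # the largest residuals and adds a gradient-matching term on top
        abs_error = torch.sum(mask * torch.abs(aligned - target), dim=(1, 2))
        return (abs_error / a_11.clamp(min=1)).mean()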
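Putting those pieces together, the overall training loop could end up looking something like the sketch below. This re-uses the made-up FileListDepthDataset and ssi_loss names from above, and assumes 'model' is whatever model you set up in step 2; their actual train.py adds schedulers, logging, validation, etc.:

    from torch.utils.data import DataLoader

    loader = DataLoader(FileListDepthDataset('train_list.txt'), batch_size=4, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

    num_epochs = 10  # assumed value
    for epoch in range(num_epochs):
        for sample in loader:
            optimizer.zero_grad()
            image = sample['image'].to('cuda')
            depth = sample['depth'].to('cuda')
            # NOTE: MiDaS-style losses work in disparity space, so depending
            # on your data you may need to convert depth to disparity first
            prediction = model(image)
            loss = ssi_loss(prediction, depth, mask=(depth > 0))
            loss.backward()
            optimizer.step()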

IneedhelpRr commented 3 weeks ago

Thanks for your reply, that's exactly what I wanted to ask. But I don't really understand step 2 (loading the model you want to train, instead of the metric depth-anything v2 model), since I don't have a model to load.

IneedhelpRr commented 3 weeks ago

    class DepthAnythingV2(nn.Module):

        def __init__(
            self,
            encoder='vitl',
            features=256,
            out_channels=[256, 512, 1024, 1024],
            use_bn=False,
            use_clstoken=False,
            max_depth=20.0
        ):
            super(DepthAnythingV2, self).__init__()

            # Which transformer blocks each encoder size taps for multi-scale features
            self.intermediate_layer_idx = {
                'vits': [2, 5, 8, 11],
                'vitb': [2, 5, 8, 11],
                'vitl': [4, 11, 17, 23],
                'vitg': [9, 19, 29, 39]
            }

            self.max_depth = max_depth

            self.encoder = encoder
            self.pretrained = DINOv2(model_name=encoder)

            self.depth_head = DPTHead(self.pretrained.embed_dim, features, use_bn, out_channels=out_channels, use_clstoken=use_clstoken)

Is it the model here? Do I need to redefine a model n ...

heyoeyo commented 3 weeks ago

Is it the model here? Do I need to redefine a model n ...

Yes, you could use the existing depth-anything v2 model as a starting point if you wanted to (or depth-anything v1, or one of the MiDaS models, etc.); that's probably the best approach if you don't have a specific model that you'd prefer.

If you want to train a relative depth model, you'd need to use the relative depth version of depth-anything, which is in the depth_anything_v2/dpt.py script, as opposed to the metric depth model (which is in metric_depth/depth_anything_v2/dpt.py and looks similar, but isn't identical).

You can load it the same way as they describe in their Use our models section (except you don't want to include the .eval() part, which switches the model into inference mode and disables training behaviors). Their example code also includes the step for loading existing weights (e.g. model.load_state_dict(...)), which would be a good idea if you just want to 'fine tune' an existing model with your own data. If you don't start with the existing weights, you'd be training the model from scratch, which is possible but requires a lot of data and would take a long time.
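
As a rough sketch of that, loading the relative depth model for fine tuning might look something like the code below. The model_configs values are copied from their readme, the checkpoint path is just an assumed example, and .train() replaces the .eval() used in their inference example:

    import torch
    from depth_anything_v2.dpt import DepthAnythingV2  # relative version, not metric_depth/...

    # Per-encoder settings, copied from the depth-anything v2 readme
    model_configs = {
        'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
        'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
        'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    }

    encoder = 'vits'  # smaller models should be easier/faster to fine tune
    model = DepthAnythingV2(**model_configs[encoder])

    # Start from the existing weights if you're fine tuning (path is an assumed example)
    model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))

    # .train() instead of the .eval() used in their inference example
    model = model.to('cuda').train()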