sayands / sgaligner

[ICCV 2023] SGAligner : 3D Scene Alignment with Scene Graphs
https://sayands.github.io/sgaligner/
MIT License

The provided checkpoint is inconsistent with the config of the code #9

Open jayceeShi opened 4 months ago

jayceeShi commented 4 months ago

Hello, I am using the checkpoint (gat-point-rel-attr-epoch-50.pth.tar) provided in the README, together with geotransformer-3dmatch.pth.tar from https://github.com/qinzheng93/GeoTransformer/releases/tag/1.0.0, to run inference, but the dimensions in the checkpoint and the code seem to be inconsistent. The command I ran is

python inference/sgaligner/inference_align_reg.py --config ../configs/scan3r/scan3r_ground_truth.yaml --snapshot ../checkpoint/gat-point-rel-attr-epoch-50.pth.tar --reg_snapshot ../checkpoint/geotransformer-3dmatch.pth.tar

And the error message is

RuntimeError: Error(s) in loading state_dict for MultiModalEncoder: size mismatch for meta_embedding_rel.weight: copying a param with shape torch.Size([100, 41]) from checkpoint, the shape in current model is torch.Size([100, 9]).

I read the code, and it seems the rel_dim defined in scan3r_ground_truth.yaml is inconsistent with default.py (it is set to 9 in the yaml, while the default in default.py is 41). I therefore tried setting this parameter directly to 41, but then hit an error in aligner/sg_aligner.py:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (150x9 and 41x100)

The corresponding code is at line 119 of aligner/sg_aligner.py, in forward: emb = self.meta_embedding_rel(tot_bow_vec_object_edge_feats)
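
For reference, the shape stored in the checkpoint can be inspected directly; a minimal sketch (the dict layout and the unprefixed key name are my assumptions):

    import torch

    ckpt = torch.load('../checkpoint/gat-point-rel-attr-epoch-50.pth.tar',
                      map_location='cpu')
    state_dict = ckpt.get('model', ckpt)  # assumed checkpoint layout
    print(state_dict['meta_embedding_rel.weight'].shape)  # torch.Size([100, 41])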

Therefore, I am wondering whether the provided checkpoint matches the open-sourced config file. It would be a great help if you could update the model. Thank you very much.

sayands commented 3 months ago

Hi, thanks for your interest in our work.

For simplicity, we only release one model, in which the 3RScan GT relationships (41 categories) are mapped to the relationships predicted by SceneGraphFusion (9 categories). This means that during preprocessing you have to ignore the relationships that are not in scannet8_relationships.txt. The performance will differ slightly from what we reported in the paper, but we wanted the released model to be coherent for GT/predicted data.
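
A rough sketch of that filtering step (the file format, the relationship names, and gt_edges are assumptions, not the repo's exact preprocessing code):

    # Relationship categories kept by the released model.
    with open('scannet8_relationships.txt') as f:
        kept = {line.strip() for line in f if line.strip()}

    # gt_edges: hypothetical (source_id, relationship, target_id) triples.
    gt_edges = [(0, 'attached to', 1), (1, 'standing on', 2)]
    filtered = [(s, r, t) for (s, r, t) in gt_edges if r in kept]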

However, if you absolutely need the model trained with GT relationships, we can release that as well (in ~1.5 weeks). Hope that helps!

glennliu commented 3 months ago

Hi Sayan

Are the relationship labels necessary for evaluating your work? We want to evaluate it on scene graphs reconstructed by a visual SLAM system, which does not provide any relationship labels. Do you think we can set all the relationship features to zero?

Thanks

jayceeShi commented 3 months ago

Thank you very much for your prompt reply. If it's convenient, I still hope you can open-source the corresponding checkpoint. Thank you again for your kindness!

sayands commented 3 months ago

Hi,

  1. Yes, you can use SGAligner without the relationship labels (ref: Table 1 of the paper). You don't even need to set the relationship features to zero; just change the config modules to [pct, gat] and load only those modules' weights from the provided checkpoint (see the sketch after this list). I'm assuming you also don't have attributes, since your output is from SLAM.

  2. I’ll upload the GT checkpoint in ~1.5-2 weeks.
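
A minimal sketch of that partial loading (the checkpoint layout and the module key prefixes are assumptions; check the actual key names in the released checkpoint):

    import torch

    ckpt = torch.load('gat-point-rel-attr-epoch-50.pth.tar', map_location='cpu')
    state_dict = ckpt.get('model', ckpt)  # assumed checkpoint layout

    # Assumed key prefixes for the point (pct) and structure (gat) encoders.
    keep = ('object_encoder', 'structure_encoder')
    partial = {k: v for k, v in state_dict.items() if k.startswith(keep)}

    # model: a MultiModalEncoder built with modules [pct, gat].
    model.load_state_dict(partial, strict=False)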

glennliu commented 3 months ago

Hi,

In preprocess.py, I see that you process the relationships and store them in data_dict['edges_cat']. If I want to run without relationships, do I need to adjust this part?

Thanks

sayands commented 3 months ago

No, if you don't have relationship labels, this part is not used. But for the structure encoder you still need the edges (defined by source and target nodes) from the scene graphs, stored in edges.
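
For illustration, a minimal sketch of what the structure encoder consumes (the variable name and dtype are assumptions, not the repo's exact format):

    import numpy as np

    # Each row is one edge of the scene graph: (source node index, target
    # node index). No relationship category is needed, only the connectivity.
    edges = np.array([[0, 1],
                      [1, 2],
                      [2, 0]])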

glennliu commented 3 months ago

Hi,

Can you provide your train and val splits for 3RScan? I assume your split is different from the original 3RScan split, which has fewer than 400 scenes for training, while you trained with more than 1000 scans.

Thanks

glennliu commented 3 months ago

Hi

These are the scene graph pairs I processed during data generation, using the train and val split provided by 3RScan. Is this consistent with your training set?

[screenshot: processed scene graph pair counts]

Thanks

glennliu commented 3 months ago

Hi

We have tried two methods.

  1. We evaluated your checkpoint as you suggested, with the config modules changed to [pct, gat]. But in sgaligner.py there is a module setting,
        self.inner_view_num = len(self.modules) # Point Net + Structure Encoder + Meta Encoder
        self.fusion = MultiModalFusion(modal_num=self.inner_view_num, with_weight=1)

     Removing the optional modules makes the model layers inconsistent with the checkpoint, and the following error occurred:

     [screenshot: state_dict loading error]

     Because of the fusion step, the model cannot run without the relationship and attribute modules. Can you provide a checkpoint that has been evaluated with only the two necessary modules?

  2. We also tried training your model. The loss looks like it is converging, but at epoch 40 it suddenly produces a NaN loss. Do you know the reason?

Throughout training and evaluation, we kept all of your original parameters.

sayands commented 3 months ago

Hi,

  1. 3RScan has ~300 reference scans, which represent different environments, but each environment is scanned multiple times as well (the original dataset was intended for relocalisation). We train on the entire set (rescans + reference scans), which equals ~1000 scans.

  2. Sorry, I didn't consider the fusion. I can provide you with a separate checkpoint in ~2 weeks; meanwhile, try this: when you load the model and do the fusion, just use the first 2 values, which are for the [pct, gat] modules respectively (a sketch follows this list).

  3. Nice that the loss converges, but I'm not sure why it would suddenly become NaN; this didn't happen with public datasets like 3RScan and ScanNet. I'd suggest training one module at a time to figure out which one causes the issue.
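
One reading of the workaround in point 2, as a minimal sketch (the embedding names, their order, and the fusion call are assumptions, not the repo's exact code):

    import torch

    # Dummy per-module embeddings in the assumed order [pct, gat, rel, attr].
    pct_emb, gat_emb = torch.randn(150, 100), torch.randn(150, 100)
    rel_emb, attr_emb = torch.randn(150, 100), torch.randn(150, 100)

    embs = [pct_emb, gat_emb, rel_emb, attr_emb]
    # model.fusion: the fusion module, not defined in this sketch.
    fused = model.fusion(embs[:2])  # use only the first two values (pct, gat)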

glennliu commented 3 months ago

Hi,

  1. I understand the split of 3RScan. I mean, do you follow their split files? So you trained on the ~300 scenes, and all the reference scans and rescans of those ~300 scenes were included in training. Is that right?

  2. We have tried loading only the [pct, gat] modules. The fusion step still fails.

sayands commented 3 months ago

Hi,

  1. Yes, I use all scans (~300 reference scans + the remaining rescans) for training.

  2. That sounds a bit odd. Did you change the fusion module to take just 2 values? Changing the config params alone isn't enough for this; you also have to change the implementation of the fusion part, e.g. along the lines below.
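
A hedged sketch of that change (the MultiModalFusion signature is taken from the snippet quoted earlier in this thread; pretrained_fusion_state and the per-modality weight layout are assumptions):

    # Rebuild the fusion head for two views instead of four.
    model.fusion = MultiModalFusion(modal_num=2, with_weight=1)

    # pretrained_fusion_state: hypothetical dict holding the fusion weights
    # extracted from the released checkpoint. If the layer keeps one learnable
    # weight per modality, keep only the [pct, gat] entries (assumed first two).
    fusion_sd = {k: (v[:2] if v.shape[0] == 4 else v)
                 for k, v in pretrained_fusion_state.items()}
    model.fusion.load_state_dict(fusion_sd, strict=False)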