microsoft / ContextualSP

Multiple paper open-source codes of the Microsoft Research Asia DKI group
MIT License

Questions about RUN model #10

Closed chuckhope closed 2 years ago

chuckhope commented 3 years ago

Dear Dr. Liu Qian,

I appreciate your work; combining semantic segmentation from computer vision with NLP is a fantastic idea. I have run the code and have some questions. I hope you can help me. Thank you very much. My questions are below:

  1. Since the feature map is built with a similarity function, the pixel values for "ellipsis" and "coreference" will be close. How can semantic segmentation tell them apart at prediction time?
  2. Related to the first question: since the feature map is built with a similarity function, the pixel values of replicas in the context utterance will be even closer, so they are easily predicted as the same class, which leads to replicated operations in the result. This may be due to the invariance that CNNs exhibit on images.
  3. Do you have any other tricks? I still cannot reproduce your results after retraining with "train_multi.sh" several times.

Best Regards, Yong

SivilTaram commented 2 years ago

@chuckhope Thanks for your interest in our work! Sorry for the late response. My answers are below:

  1. This is a very good question! As for why similarities can work under this scenario, there is a related discussion in the paper:

    "For coreference, the similarity function is suitable for identifying whether two spans refer to the same entity. For ellipsis, the similarity function is an effective indicator to find matching anchors, which indicate the possible insertion positions."

    As for how the model can distinguish coreference from ellipsis, I suspect that their similarity patterns differ slightly. BTW, I have also tried concatenating the contextual representations of each word pair and feeding them to the classification head, but it did not show any improvement. I think this is worth studying in depth if you're interested.

  2. I think the hierarchical structure of UNet may be more important, since coreference and ellipsis require a "global" view to make the final decision. A CNN may not be necessary if another architecture can replace UNet for semantic segmentation.
  3. No other tricks in training. Could you report the metrics you obtained and your detailed reproduction steps? I will try my best to have a look.
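
For readers following points 1 and 2, the two ways of building the word-pair feature map discussed above can be sketched roughly as follows. This is a minimal, pure-Python illustration and not the code in this repository: the function names (`cosine`, `similarity_feature_map`, `concat_feature_map`) and the toy 2-d vectors are assumptions made up for the example, and cosine similarity stands in for whatever similarity function the model actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_feature_map(context, utterance):
    """M x N map with one similarity 'pixel' per (context word, utterance word) pair."""
    return [[cosine(c, u) for u in utterance] for c in context]


def concat_feature_map(context, utterance):
    """M x N map whose 'pixel' is the concatenated pair [c ; u] instead of a similarity."""
    return [[c + u for u in utterance] for c in context]


# Hypothetical toy vectors standing in for contextual word representations.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # 3 context words; words 0 and 2 are replicas
utterance = [[1.0, 0.0], [1.0, 1.0]]            # 2 utterance words

sim_map = similarity_feature_map(context, utterance)  # 3 x 2 grid of scalars
cat_map = concat_feature_map(context, utterance)      # 3 x 2 grid of 4-d vectors
```

In this toy setup the rows for the two replica words are identical only because their input vectors are identical; a contextual encoder would generally give repeated words different representations, so the corresponding rows need not match exactly.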

SivilTaram commented 2 years ago

Closed since there is no more activity.