Problems with the "Investigation of MolCLR Representation" part

yuyangw / MolCLR

Implementation of MolCLR: "Molecular Contrastive Learning of Representations via Graph Neural Networks" in PyG.

MIT License

Problems with the "Investigation of MolCLR Representation" part #21

Closed: zqwz909 closed this issue 1 year ago

zqwz909 commented 1 year ago

Hello, I tried to reproduce the experiment in the "Investigation of MolCLR Representation" part of your paper, but the code does not provide a way to generate the molecular representation vector. For the encoder, I used the representation vector produced by the pre-trained GCN, and the cosine similarity results looked clearly wrong. How did you use MolCLR to generate the molecular representations in this part of the experiment?

yuyangw commented 1 year ago

Hi, thanks for your interest in our work. If I understand your question correctly, you want to extract the representation from the GNN. The representations for the input molecules are returned as h (https://github.com/yuyangw/MolCLR/blob/master/models/gcn_molclr.py#L158). Hope this helps.

Best, Yuyang

zqwz909 commented 1 year ago

Thank you for pointing that out. I have another question. Is the feature vector obtained from the GNN used directly to compute the cosine distance, or does it need to be normalized first?

yuyangw commented 1 year ago

It should be normalized.

zqwz909 commented 1 year ago

OK, can you tell me which normalization method you used? This part of the experiment in the paper doesn't seem to mention it. Thank you!

yuyangw commented 1 year ago

You can refer to the cosine similarity function in PyTorch: https://pytorch.org/docs/stable/generated/torch.nn.functional.cosine_similarity.html, which already implements the normalization.
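
For readers following along, here is a minimal sketch of why no separate normalization step is needed: cosine similarity divides the dot product by the two L2 norms, so it gives the same value as a dot product of pre-normalized vectors and is unaffected by rescaling either input. The code uses plain Python so that it runs without PyTorch. The vectors `h_a` and `h_b` are made-up stand-ins for encoder outputs, and the `eps` guard is an assumption of this sketch, not a statement about PyTorch's exact implementation.

```python
import math


def l2_normalize(vec, eps=1e-8):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cosine_similarity(a, b, eps=1e-8):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (max(norm_a, eps) * max(norm_b, eps))


# Hypothetical representation vectors standing in for the encoder output h.
h_a = [0.5, -1.2, 3.0, 0.0]
h_b = [1.0, 0.4, 2.5, -0.7]

# Cosine similarity on the raw vectors.
raw = cosine_similarity(h_a, h_b)

# Plain dot product after normalizing each vector to unit length.
prenormalized = sum(
    x * y for x, y in zip(l2_normalize(h_a), l2_normalize(h_b))
)

# Cosine similarity after multiplying one vector by 10.
rescaled = cosine_similarity([10 * x for x in h_a], h_b)

print(raw, prenormalized, rescaled)
```

All three printed values agree up to floating-point error, which is the sense in which the similarity function "already implements" normalization.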

RichardLrx commented 1 year ago

Hello, I used the pre-trained model you provided (pretrained_gin/checkpoints) and achieved good results in my work. I would therefore like to know which data augmentation technique you used to obtain the pretrained_gin model. Was it atom masking, bond deletion, or subgraph removal?

yuyangw commented 1 year ago

The checkpoint is trained with subgraph removal since it achieves the best overall performance as reported in our paper.
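
For readers unfamiliar with the augmentation, here is a rough sketch of the idea behind subgraph removal. It assumes the usual description: start from a random origin atom, mask its neighbors breadth-first until a target fraction of atoms is masked, then delete the bonds between masked atoms. This is not the repository's code. The function name `subgraph_removal`, the `ratio` and `seed` parameters, and the edge-list input format are all hypothetical choices made for illustration.

```python
import random
from collections import deque


def subgraph_removal(num_atoms, bonds, ratio=0.25, seed=None):
    """Pick a connected set of atoms to mask and drop the bonds inside it.

    num_atoms: number of atoms, indexed 0..num_atoms-1
    bonds:     list of (u, v) index pairs, one per undirected bond
    ratio:     hypothetical target fraction of atoms to mask
    Returns (sorted list of masked atoms, list of remaining bonds).
    """
    rng = random.Random(seed)

    # Build an adjacency list from the bond list.
    neighbors = {i: [] for i in range(num_atoms)}
    for u, v in bonds:
        neighbors[u].append(v)
        neighbors[v].append(u)

    target = max(1, int(ratio * num_atoms))
    origin = rng.randrange(num_atoms)

    # Breadth-first expansion from the origin until enough atoms are masked.
    masked = {origin}
    queue = deque([origin])
    while queue and len(masked) < target:
        atom = queue.popleft()
        for nb in neighbors[atom]:
            if nb not in masked and len(masked) < target:
                masked.add(nb)
                queue.append(nb)

    # Delete only the bonds whose two endpoints are both masked.
    kept_bonds = [
        (u, v) for u, v in bonds if not (u in masked and v in masked)
    ]
    return sorted(masked), kept_bonds


# Usage on a toy six-atom chain 0-1-2-3-4-5.
chain_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
masked_atoms, remaining_bonds = subgraph_removal(
    6, chain_bonds, ratio=0.5, seed=0
)
print(masked_atoms, remaining_bonds)
```

In an actual pipeline the masked atoms would also have their node features replaced by a mask token; the sketch only covers the selection of atoms and the removal of bonds.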

RichardLrx commented 1 year ago

Thank you!