ml4bio / Dense-Homolog-Retrieval

Nature Biotechnology: Ultra-fast, sensitive detection of protein remote homologs using deep dense retrieval
BSD 3-Clause "New" or "Revised" License

Inquiry about protein structure prediction with colabfold #18

Closed haifangong closed 1 week ago

haifangong commented 2 weeks ago

Thank you for your excellent work. I have successfully retrieved the TSV files containing the homolog sequences of my desired protein. The files are formatted as follows:

UniRef50_A0A7V3D5G3     MVVRILTGHRRAVRTVVYSPTDPKILASGGEDSLIRLWDLATGTVLQELTTHTSGVTCLTFAD
UniRef50_A0A9Q0MY60     MKTFPLQRCTLETVRWH
UniRef50_A0A0A8ZTL2     MHKLYYYINKRVKCTVGP
UniRef50_A0A0A9SGX9     MQFSLLPRSVI

Could you provide detailed instructions or a guide on how to use these predicted MSAs from DHR for protein structure prediction with colabfold? Any additional insights or resources you could share would be immensely helpful.

Thank you for your support. I look forward to your guidance.

daisykuma22 commented 1 week ago

Hi, have you successfully performed embeddings using the UniRef50 database? If so, could you please provide some guidance on how to achieve this?

heathcliff233 commented 1 week ago

@haifangong Hello, thank you for your interest. Downstream MSA building is not currently provided, since it is not our main focus. You can build the MSA with your own tools, such as JackHMMER (for a large number of sequences, which requires post-processing) or kalign (for a relatively small number). For an example of JackHMMER usage, please refer to https://github.com/ml4bio/Dense-Homolog-Retrieval/blob/4305a6031d757fe624c58e837b923fd80fb7a4f6/do_retrieval.py#L47
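
Alignment tools generally expect FASTA input rather than the two-column TSV shown in the question, so a conversion step is likely needed first. The following is a minimal sketch of that step only, assuming each TSV line holds a sequence ID and a sequence separated by whitespace, as in the sample above. The function name `tsv_to_fasta` and the file names are hypothetical and not part of this repository.

```python
import sys
from typing import Optional


def tsv_to_fasta(tsv_text: str,
                 query_id: Optional[str] = None,
                 query_seq: Optional[str] = None) -> str:
    """Convert 'ID<whitespace>SEQUENCE' lines into FASTA text.

    Assumption: every valid line has exactly two whitespace-separated
    fields. Blank or malformed lines are skipped.
    """
    records = []
    # Optionally place the query first, since many MSA workflows
    # expect the query sequence as the first record.
    if query_id and query_seq:
        records.append((query_id, query_seq))
    for line in tsv_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        records.append((parts[0], parts[1]))
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python tsv_to_fasta.py homologs.tsv homologs.fasta
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(tsv_to_fasta(src.read()))
```

The resulting FASTA file can then be passed to whichever aligner you choose. The exact aligner invocation is not shown here because it is not covered in this thread.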

haifangong commented 1 week ago

> Hi, have you successfully performed embeddings using the UniRef50 database? If so, could you please provide some guidance on how to achieve this?

Sorry, I have tried, but the results suggest that this work is not suitable for my task, so I have stopped trying to fix these issues.