kiharalab / Domain-PFP

Domain-PFP is a self-supervised method to predict protein functions from the domains
GNU General Public License v3.0

Multiple protein sequences #1

Open hnlixuanji opened 12 months ago

hnlixuanji commented 12 months ago

Dear Author,

Thank you for your contribution to protein function prediction. According to your paper, Domain-PFP performs well against many of the latest tools. I am considering using your tool to assign functions to multiple protein sequences (tens of thousands) in a FASTA file, selecting the highest-confidence MF, BP, and CC terms for each sequence. I was wondering whether you have already developed a script for this, so that I don't have to write it again :-)

Best, XJ

nibtehaz commented 12 months ago

Hi XJ

Thank you for your interest in our project.

We plan to release a web server. That's why our GitHub sample code and Google Colab version handle one protein sequence at a time; batch processing will be performed on the server later. For the time being, for multiple proteins, we suggest writing a bash script that runs over the FASTA files sequentially.

We have various scripts for batch processing that we used during our experiments (they were not released because they have not been properly cleaned and refactored). However, if you can specify what your input looks like and how you would like the output to be formatted, I can share some scripts accordingly.
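As a rough illustration of the sequential approach suggested above, here is a minimal Python sketch (not one of the project's experiment scripts) that splits a multi-record FASTA file into one file per sequence, so that a single-sequence pipeline can be run on each file in turn. The function names `read_fasta` and `split_fasta` and the output file naming scheme are assumptions made for illustration only.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if name is not None:
        yield name, "".join(chunks)


def split_fasta(fasta_path: str, out_dir: str) -> List[Path]:
    """Write each record of fasta_path to its own file inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with open(fasta_path) as handle:
        for index, (name, seq) in enumerate(read_fasta(handle)):
            # Replace characters that are awkward in file names; the index
            # prefix keeps names unique even if two headers sanitize alike.
            safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
            path = out / f"{index:06d}_{safe}.fasta"
            path.write_text(f">{name}\n{seq}\n")
            written.append(path)
    return written
```

The returned list of paths can then be passed one by one to whatever single-sequence command is being used; the records are read lazily, so a file with tens of thousands of sequences does not need to fit in memory at once.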

hnlixuanji commented 12 months ago

Dear nibtehaz, thank you very much for your reply. Our input is a file containing a catalog of all non-redundant genes (each record starts with ">gene name"). I need to convert all genes to protein sequences before using your tool. I would like the output to be a CSV file with one row per gene and the column names "gene_name", "GO_MF", "MF_definination", "Go term", "Confidence", "GO_BP", "BP_definination", "Go term", "Confidence", "GO_CC", "CC_definination", "Go term", "Confidence". Or perhaps you have better ideas or scripts for presenting all the genes.
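To make the requested output concrete, here is a minimal Python sketch of the "highest-confidence term per category" CSV. It does not call Domain-PFP: the per-gene prediction structure (a list of `(GO ID, definition, confidence)` tuples for each of MF, BP and CC) is an assumption, as are the names `best_prediction` and `write_summary`. The column names are a simplified, non-repeating variant of the ones listed above, since a CSV with duplicated headers is awkward to load.

```python
import csv
from typing import Dict, Iterable, List, Optional, TextIO, Tuple

# Hypothetical prediction record: (GO ID, definition, confidence score).
Prediction = Tuple[str, str, float]
ASPECTS = ("MF", "BP", "CC")


def best_prediction(preds: List[Prediction]) -> Optional[Prediction]:
    """Return the prediction with the highest confidence, or None if empty."""
    return max(preds, key=lambda p: p[2]) if preds else None


def write_summary(
    results: Iterable[Tuple[str, Dict[str, List[Prediction]]]],
    handle: TextIO,
) -> None:
    """Write one CSV row per gene with the top term for each GO aspect."""
    writer = csv.writer(handle)
    header = ["gene_name"]
    for aspect in ASPECTS:
        header += [f"GO_{aspect}", f"{aspect}_definition", f"{aspect}_confidence"]
    writer.writerow(header)
    for gene, per_aspect in results:
        row: List[object] = [gene]
        for aspect in ASPECTS:
            best = best_prediction(per_aspect.get(aspect, []))
            # Leave the three cells empty when an aspect has no prediction.
            row += list(best) if best else ["", "", ""]
        writer.writerow(row)
```

A caller would open the target file with `open("summary.csv", "w", newline="")` and pass the handle to `write_summary`, feeding it one `(gene_name, predictions)` pair per sequence as the predictions become available.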

BTW, I have an open question :-) Have you tried, or do you plan, to integrate AlphaFold into the function prediction?

Best, XJ

nibtehaz commented 11 months ago

Hi XJ

Sure, I can prepare a script like that. The input will be a large FASTA file, right?

We do have plans to use structures from AlphaFold in protein function prediction, but at the moment we are not actively pursuing that.

hnlixuanji commented 11 months ago

Yes, it is a large FASTA file. Thanks a lot!