sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

No info in output.csv: LazyAF colab AlphaFold2-Multimer #587

Open pauldambio opened 6 months ago

pauldambio commented 6 months ago

Expected Behavior

An output.csv table listing each predicted interaction

Current Behavior

An output.csv table that contains neither the protein names nor the pTM/ipTM scores

Steps to Reproduce (for bugs)

input_dir: /content/drive/MyDrive/input
result_dir: /content/drive/MyDrive/input_fasta
input_file: protsequence.txt
protsequence.txt

bait_protein_sequence: MARSKTAQPKHSLRKIAVVVATAVSGMSVYAQAAVEPKEDTITVTAAPAPQESAWGPAAT IAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTY DHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGG LLNMVSKRPTTEPLKEVQFKAGTDSLFQTGFDFSDSLDDDGVYSYRLTGLARSANAQQKG SEEQRYAIAPAFTWRPDDKTNFTFLSYFQNEPETGYYGWLPKEGTVEPLPNGKRLPTDFN EGAKNNTYSRNEKMVGYSFDHEFNDTFTVRQNLRFAENKTSQNSVYGYGVCSDPANAYSK QCAALAPADKGHYLARKYVVDDEKLQNFSVDTQLQSKFATGDIDHTLLTGVDFMRMRNDI NAWFGYDDSVPLLNLYNPVNTDFDFNAKDPANSGPYRILNKQKQTGVYVQDQAQWDKVLV TLGGRYDWADQESLNRVAGTTDKRDDKQFTWRGGVNYLFDNGVTPYFSYSESFEPSSQVG KDGNIFAPSKGKQYEVGVKYVPEDRPIVVTGAVYNLTKTNNLMADPEGSFFSVEGGEIRA RGVEIEAKAALSASVNVVGSYTYTDAEYTTDTTYKGNTPAQVPKHMASLWADYTFFDGPL SGLTLGTGGRYTGSSYGDPANSFKVGSYTVVDALVRYDLARVGMAGSNVALHVNNLFDRE YVASCFNTYGCFWGAERQVVATATFRF bait_protein_name: FhuA

ColabFold Output (for bugs)

Files attached in zip format: drive-download-20240316T110814Z-001.zip

Context

I want to test whether the multimer model (run through the LazyAF colab) can predict a known interaction between an E. coli membrane protein (FhuA) and a T5 phage receptor binding protein (RBP5BPT5). I curated my FASTA file to contain just two protein sequences (one of them the receptor binding protein) and ran all three parts of the pipeline without errors. However, the output of part 3, the output.csv table, contains no information about the PPI. Checking the log file in the result folder, I see the following error on line 4: Could not generate input features AAS77195.1-FhuA: Invalid character in the sequence:
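The "Invalid character in the sequence" message can be reproduced in spirit with a small check. The sketch below is hypothetical (it is not ColabFold's own validator) and assumes the accepted alphabet is the 20 standard amino acids plus X; it reports every other character, such as the spaces that appear every 60 residues in the bait sequence quoted above.

```python
# Assumption: 20 standard amino acids plus X (unknown residue). The exact
# alphabet ColabFold accepts is not stated in this issue.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def find_invalid_chars(sequence: str) -> dict[int, str]:
    """Hypothetical helper: map each offending position to its character."""
    return {i: ch for i, ch in enumerate(sequence) if ch.upper() not in VALID_RESIDUES}


# A wrapped sequence whose line breaks were flattened into spaces:
seq = "MARSKTAQPK HSLRKIAVVV"
print(find_invalid_chars(seq))  # {10: ' '}
```

An empty dictionary means the sequence passes this (assumed) alphabet check.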

Your Environment

I run the colab in Google Chrome Version 122.0.6261.129 (Official Build) (64-bit).
Device specifications:
Processor: 11th Gen Intel(R) Core(TM) i5-11400H @ 2.70GHz 2.69 GHz
Installed RAM: 16.0 GB (15.8 GB usable)
System type: 64-bit operating system, x64-based processor
OS: Windows 11 Home

pauldambio commented 6 months ago

After searching for similar cases, I found that this is a bug in the first part of the pipeline. When it creates the individual FASTA files from the multi-FASTA file and the bait sequence, it adds spaces to the bait sequence, so MMseqs2 cannot search using those files.
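Until the notebook is fixed, one possible workaround is to strip all whitespace from every sequence before the per-pair FASTA files are written. The sketch below is a hypothetical helper, not LazyAF code: it assumes each pair is consumed by ColabFold as a single FASTA record whose chains are joined with ':' (ColabFold's convention for complexes), and the `<prey>-<bait>` job name only mirrors the `AAS77195.1-FhuA` name seen in the log. How the notebook actually lays out its files is not shown in this issue.

```python
import re
import tempfile
from pathlib import Path


def clean_sequence(sequence: str) -> str:
    """Remove all whitespace (spaces, tabs, newlines) and upper-case the residues."""
    return re.sub(r"\s+", "", sequence).upper()


def write_complex_fasta(out_dir: Path, prey_name: str, prey_seq: str,
                        bait_name: str, bait_seq: str) -> Path:
    """Hypothetical helper: write one prey/bait complex as a single FASTA record.

    Both chains are cleaned, then joined with ':' on one sequence line.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    job_name = f"{prey_name}-{bait_name}"  # assumed naming scheme
    path = out_dir / f"{job_name}.fasta"
    path.write_text(
        f">{job_name}\n{clean_sequence(prey_seq)}:{clean_sequence(bait_seq)}\n"
    )
    return path


if __name__ == "__main__":
    # Usage with placeholder sequences; the bait contains stray spaces.
    with tempfile.TemporaryDirectory() as tmp:
        fasta = write_complex_fasta(Path(tmp), "preyA", "MKV LLA", "FhuA", "MARSK TAQPK")
        print(fasta.read_text())
```

Cleaning at write time keeps the fix in one place, so any multi-FASTA input, wrapped or not, produces files without the characters that trigger the error.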