Open pauldambio opened 6 months ago
After searching for similar cases, I found that this is a bug in the first part of the pipeline. When it creates the individual FASTA files from the multi-FASTA file and the bait sequence, it adds spaces to the bait sequence, so MMseqs2 cannot search using those files.
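A possible workaround is to strip all whitespace from the bait sequence before the FASTA files are written. The snippet below is only a minimal sketch of that idea: `clean_sequence` and `fasta_record` are hypothetical helper names, not functions from LazyAF or ColabFold, and I have not checked how the pipeline actually writes its files.

```python
def clean_sequence(raw: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from a sequence."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops all of it.
    return "".join(raw.split())


def fasta_record(name: str, raw_sequence: str) -> str:
    """Build a single FASTA record from a name and a possibly space-separated sequence."""
    # Hypothetical helper: the real pipeline may format headers differently.
    return f">{name}\n{clean_sequence(raw_sequence)}\n"


if __name__ == "__main__":
    # Short made-up fragment, pasted in blocks separated by spaces.
    print(fasta_record("FhuA", "MARSKTAQPK HSLRKIAVVV ATAVSGMSVY"))
```

Cleaning the sequence in protsequence.txt by hand before running part 1 should have the same effect.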
Expected Behavior
An output.csv table listing each interaction
Current Behavior
An output.csv table with neither protein names nor pTM/ipTM scores
Steps to Reproduce (for bugs)
input_dir: /content/drive/MyDrive/input
result_dir: /content/drive/MyDrive/input_fasta
input_file: protsequence.txt
protsequence.txt
bait_protein_sequence: MARSKTAQPKHSLRKIAVVVATAVSGMSVYAQAAVEPKEDTITVTAAPAPQESAWGPAAT IAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTY DHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGG LLNMVSKRPTTEPLKEVQFKAGTDSLFQTGFDFSDSLDDDGVYSYRLTGLARSANAQQKG SEEQRYAIAPAFTWRPDDKTNFTFLSYFQNEPETGYYGWLPKEGTVEPLPNGKRLPTDFN EGAKNNTYSRNEKMVGYSFDHEFNDTFTVRQNLRFAENKTSQNSVYGYGVCSDPANAYSK QCAALAPADKGHYLARKYVVDDEKLQNFSVDTQLQSKFATGDIDHTLLTGVDFMRMRNDI NAWFGYDDSVPLLNLYNPVNTDFDFNAKDPANSGPYRILNKQKQTGVYVQDQAQWDKVLV TLGGRYDWADQESLNRVAGTTDKRDDKQFTWRGGVNYLFDNGVTPYFSYSESFEPSSQVG KDGNIFAPSKGKQYEVGVKYVPEDRPIVVTGAVYNLTKTNNLMADPEGSFFSVEGGEIRA RGVEIEAKAALSASVNVVGSYTYTDAEYTTDTTYKGNTPAQVPKHMASLWADYTFFDGPL SGLTLGTGGRYTGSSYGDPANSFKVGSYTVVDALVRYDLARVGMAGSNVALHVNNLFDRE YVASCFNTYGCFWGAERQVVATATFRF bait_protein_name: FhuA
ColabFold Output (for bugs)
Files attached in zip format: drive-download-20240316T110814Z-001.zip
Context
I want to test whether the multimer model (through the LazyAF Colab) can predict a known interaction between a membrane protein (FhuA) from E. coli and a receptor binding protein (RBP5BPT5) from T5 phage. I curated my FASTA file to contain just 2 protein sequences (one is the receptor binding protein) and ran the 3 parts of the pipeline with no errors. The output of part 3 is the output.csv table, which contains no information about the PPI. Checking the log file in the result folder, I see the following issue in line 4: Could not generate input features AAS77195.1-FhuA: Invalid character in the sequence:
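To see which characters trigger the "Invalid character in the sequence" message, the sequence can be checked before it goes into the pipeline. This is a rough sketch that assumes only the 20 standard one-letter amino acid codes are valid; the actual check in the pipeline may accept other letters (for example X), and `invalid_characters` is a hypothetical helper, not part of any library.

```python
# Assumption: only the 20 standard amino acid one-letter codes are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_characters(sequence: str) -> list[tuple[int, str]]:
    """Return (position, character) pairs for every character that is not a standard residue."""
    return [
        (index, char)
        for index, char in enumerate(sequence)
        if char.upper() not in VALID_RESIDUES
    ]


if __name__ == "__main__":
    # Short made-up fragment with a space between two blocks.
    for index, char in invalid_characters("MARSKTAQPK HSLRKIAVVV"):
        print(f"position {index}: {char!r}")
```

An empty result means the sequence contains only standard residues; any space shows up with its position in the string.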
Your Environment
I run the Colab notebook in Google Chrome, Version 122.0.6261.129 (Official Build) (64-bit).
Device specifications:
Processor: 11th Gen Intel(R) Core(TM) i5-11400H @ 2.70GHz 2.69 GHz
Installed RAM: 16,0 GB (15,8 GB usable)
System type: 64-bit operating system, x64-based processor
Windows 11 Home