labstructbioinf / pLM-BLAST

Detection of remote homology by comparison of protein language model representations
https://toolkit.tuebingen.mpg.de/tools/plmblast
MIT License

Redundant alignment shift #30

Open KYQiu21 opened 11 months ago

KYQiu21 commented 11 months ago

Dear developers,

Thank you very much for this excellent software. After postprocessing I get redundant local alignments (see the figure). These alignments are shifted by a few residues relative to one another, and I wonder how to deal with this.

This case is run using the same code as the "advanced" example, and this sequence is: AVEPKEDTITVTAAPAPQESAWGPAATIAARQSATGTKTDTPIQKVPQSISVVTAEEMALHQPKSVKEALSYTPGVSVGTRGASNTYDHLIIRGFAAEGQSQNNYLNGLKLQGNFYNDAVIDPYMLERAEIMRGPVSVLYGKSSPGGLLNMVSKRPTTEPLKEVQFKAGTDSLFQTGFDFSDSLDDDGVYSYRLTGLARSANAQQKGSEEQRYAIAPAFTWRPDDKTNFTFLSYFQNEPETGYYGWLPKEGTVEPLPNGKRLPTDFNEGAKNNTYSRNEKMVGYSFDHEFNDTFTVRQNLRFAENKTSQNSVYGYGVCSDPANAYSKQCAALAPADKGHYLARKYVVDDEKLQNFSVDTQLQSKFATGDIDHTLLTGVDFMRMRNDINAWFGYDDSVPLLNLYNPSSHHHHHHGSSVNTDFDFNAKDPANSGPYRILNKQKQTGVYVQDQAQWDKVLVTLGGRYDWADQESLNRVAGTTDKRDDKQFTWRGGVNYLFDNGVTPYFSYSESFEPSSQVGKDGNIFAPSKGKQYEVGVKYVPEDRPIVVTGAVYNLTKTNNLMADPEGSFFSVEGGEIRARGVEIEAKAALSASVNVVGSYTYTDAEYTTDTTYKGNTPAQVPKHMASLWADYTFFDGPLSGLTLGTGGRYTGSSYGDPANSFKVGSYTVVDALVRYDLARVGMAGSNVALHVNNLFDREYVASCFNTYGCFWGAERQVVATATFRF

image

Argusmocny commented 9 months ago

plmblast.py now supports an additional argument, bfactor X, where X is an integer > 0. The larger the bfactor, the sparser the search, so there should be fewer highly overlapping hits.
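If some shifted duplicates remain, they can also be filtered after the search. The following is a minimal sketch of such a filter, not part of pLM-BLAST: the `Hit` record, its field names, the `drop_shifted_hits` helper, and the 0.8 overlap threshold are all assumptions made for illustration, and the real output columns may differ. It keeps the best-scoring hit and drops any lower-scoring hit that overlaps an already kept hit on both the query and the target.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record; field names are illustrative, not pLM-BLAST columns.
    qstart: int
    qend: int
    tstart: int
    tend: int
    score: float


def interval_overlap(a_start, a_end, b_start, b_end):
    """Fraction of the shorter interval covered by the intersection.

    Coordinates are treated as inclusive.
    """
    inter = min(a_end, b_end) - max(a_start, b_start) + 1
    if inter <= 0:
        return 0.0
    shorter = min(a_end - a_start, b_end - b_start) + 1
    return inter / shorter


def drop_shifted_hits(hits, max_overlap=0.8):
    """Greedy filter: visit hits from best to worst score and keep a hit only
    if it does not overlap a kept hit by more than max_overlap on BOTH axes."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        redundant = any(
            interval_overlap(hit.qstart, hit.qend, k.qstart, k.qend) > max_overlap
            and interval_overlap(hit.tstart, hit.tend, k.tstart, k.tend) > max_overlap
            for k in kept
        )
        if not redundant:
            kept.append(hit)
    return kept


# Made-up example coordinates, only to exercise the filter.
hits = [
    Hit(10, 60, 100, 150, 0.92),
    Hit(13, 63, 103, 153, 0.90),  # same alignment shifted by 3 residues
    Hit(200, 250, 20, 70, 0.85),  # distinct region
]
filtered = drop_shifted_hits(hits)
```

Requiring overlap on both query and target is a deliberate choice here: two hits that cover the same query region but map to different target regions are kept as separate alignments.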

papelypluma commented 9 months ago

@Argusmocny Is there an advisable scale for the bfactor? As a ballpark estimate, how large should it be to minimize overlapping hits or to speed up the alignment search? Thank you!