steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

Suggestion about specific pairwise structure comparison #382

Closed Pooryamb closed 1 week ago

Pooryamb commented 2 weeks ago

Hi Foldseek developers,

I have two databases, db1 and db2. I want to compute the structural similarity for specific pairs of proteins such as ("pr1", "pr2"), where pr1 is from db1 and pr2 is from db2. I could exhaustively search db1 against db2 and then select the pairs I am interested in, but that is computationally heavy. On the other hand, running Foldseek once per pair on the PDB files of the proteins would incur too much overhead. I think it would be great to add a feature that computes the structural similarity only for the specific pairs we are interested in. I checked the Foldseek issues on GitHub and saw that others have needed the same functionality. Specifically, someone was interested in an all-against-all structural comparison of a database against itself. An exhaustive search compares "pr1" and "pr2" twice, but that user needed to compare the two structures only once to save computation.

Thanks,

milot-mirdita commented 1 week ago

You can make your own prefiltering database to tell the structurealign module which pairs to align. We don't have any workflow support for this, but you can put together a simple workflow yourself:

# assuming you have a query and target database
foldseek createdb inputs1/ db1
foldseek createdb inputs2/ db2

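# make a mapping of the accessions that you want to align (accessions are in the 2nd column of the dbN.lookup files)
# printf is used instead of echo -e for portability; it also avoids a trailing empty line
printf 'd1asha_\td1b0ba_\nd1asha_\td1cg5a_\n' > to_align.tsv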

# convert this into the internal numeric database keys
awk 'FNR == 1 { findex++; } \
     findex == 1 { f1[$2] = $1; next; } \
     findex == 2 { f2[$2] = $1; next; } \
     $1 in f1 && $2 in f2 { print f1[$1]"\t"f2[$2]; }' \
        db1.lookup db2.lookup <(sort -s -k1,1n to_align.tsv ) > keys.tsv

# make a fake prefiltering database
foldseek tsv2db keys.tsv pref --output-dbtype 7

# foldseek alignment
foldseek structurealign db1 db2 pref aln

# m8 human readable output
foldseek convertalis db1 db2 aln aln.m8
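One thing to watch in the key-conversion step above: a pair whose accession is not found in the corresponding lookup file is dropped without any warning. A minimal sketch of a sanity check follows, assuming the lookup files are whitespace-separated with the accession in the 2nd column, as in the workflow. The file names `sample1.lookup`, `sample2.lookup`, `sample_pairs.tsv` and `unmapped.tsv` are hypothetical stand-ins so the snippet runs on its own; substitute `db1.lookup`, `db2.lookup` and `to_align.tsv` in practice.

```shell
# Hypothetical sample inputs standing in for db1.lookup, db2.lookup and to_align.tsv
printf '0\td1asha_\t0\n' > sample1.lookup
printf '0\td1b0ba_\t0\n1\td1cg5a_\t1\n' > sample2.lookup
printf 'd1asha_\td1b0ba_\nd1asha_\tmissing_\n' > sample_pairs.tsv

# Collect the accessions of both lookup files, then print every requested
# pair for which the query or the target accession is unknown
awk 'FNR == 1 { findex++ }
     findex == 1 { f1[$2] = 1; next }
     findex == 2 { f2[$2] = 1; next }
     NF >= 2 && !($1 in f1 && $2 in f2) { print }' \
    sample1.lookup sample2.lookup sample_pairs.tsv > unmapped.tsv

cat unmapped.tsv
```

With the sample inputs, only the pair containing `missing_` is reported. An empty `unmapped.tsv` would mean that every requested pair has a key in both databases.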

milot-mirdita commented 1 week ago

I added this to the wiki
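For the all-against-all case mentioned in the question, where each pair should be compared only once, the pair list can be generated from a single lookup file. This is a rough sketch rather than a supported workflow: it assumes a tab-separated lookup file with the accession in the 2nd column, and that aligning each pair in one direction only is acceptable. The names `example.lookup` and `unique_pairs.tsv` and the accessions `prA`, `prB`, `prC` are made up for illustration.

```shell
# Hypothetical sample lookup file: key, accession, file number (tab-separated)
printf '0\tprA\t0\n1\tprB\t1\n2\tprC\t2\n' > example.lookup

# Emit each unordered pair of accessions exactly once (i < j), skipping self-pairs
awk -F'\t' '{ acc[NR] = $2 }
     END { for (i = 1; i <= NR; i++)
               for (j = i + 1; j <= NR; j++)
                   print acc[i] "\t" acc[j] }' example.lookup > unique_pairs.tsv

cat unique_pairs.tsv
```

For three entries this yields three pairs (prA/prB, prA/prC, prB/prC) instead of the nine comparisons of a full search. The resulting file has the same two-column layout as to_align.tsv, so it could be passed through the same key-conversion step with the same database used as both query and target.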

Pooryamb commented 1 week ago

Thanks!