using ps_scan.pl to scan prosite patterns

gsn7 commented 4 years ago

What is the best way to scan PROSITE patterns with pftools3? Looking at how ps_scan.pl deals with PROSITE patterns, and without knowing much Perl, it seems to me that it creates a file for each profile and a file for each input sequence, then scans the sequences against the profile, generating a third file for the results. That is a lot of I/O. We run PROSITE patterns against UniParc, so we would be happy to hear whether there is a different way of scanning against PROSITE patterns with less I/O overhead.

smoretti commented 3 years ago

We will investigate this. @euphemizm, any ideas? I think a sub-command and a file are created for each input sequence, not for each profile.

To avoid a lot of I/O, use pfscanV3 directly.

smoretti commented 3 years ago

More details (thanks @beatrice79)

pfsearch: 1 motif - many sequences
pfscan: many motifs - 1 sequence

E.g. to query 10 sequences against the whole of PROSITE:

pfsearch: as many sub-commands and files as in prosite.dat
pfscan: as many sub-commands and files as in our set of sequences (=> 10)

For cases with many sequences vs many motifs:

Use pfscan when the number of motifs > the number of sequences
Use pfsearch when the number of sequences > the number of motifs
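
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the counting argument, not code from ps_scan.pl: the function names are hypothetical, the motif count in the usage note is a made-up figure, and the tie case (equal counts) is not covered by the rule, so the sketch arbitrarily falls back to pfsearch.

```python
def subcommands_needed(tool: str, n_motifs: int, n_sequences: int) -> int:
    """Number of sub-commands (and temporary files) a run would need.

    pfsearch handles 1 motif against many sequences, so it needs one
    sub-command per motif; pfscan handles many motifs against 1 sequence,
    so it needs one sub-command per sequence.
    """
    if tool == "pfsearch":
        return n_motifs
    if tool == "pfscan":
        return n_sequences
    raise ValueError(f"unknown tool: {tool!r}")


def choose_tool(n_motifs: int, n_sequences: int) -> str:
    """Pick the tool that needs fewer sub-commands.

    Assumption: on a tie the rule of thumb says nothing, so this sketch
    arbitrarily returns pfsearch.
    """
    return "pfscan" if n_motifs > n_sequences else "pfsearch"


if __name__ == "__main__":
    # Hypothetical sizes: 10 query sequences against 2000 motifs.
    tool = choose_tool(n_motifs=2000, n_sequences=10)
    print(tool, subcommands_needed(tool, n_motifs=2000, n_sequences=10))
```

With these hypothetical sizes the sketch picks pfscan (10 sub-commands instead of 2000); with a sequence set much larger than the motif set, as in the UniParc case, it picks pfsearch.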

ps_scan.pl uses pfscan by default. To query against UniParc, it may be better to switch to pfsearch with ps_scan.pl -w pfsearch

But as I said before, the pftools v3 programs are better optimized for such cases, and they can now be used from ps_scan.pl, if I remember correctly (@euphemizm, correct me if I'm wrong): ps_scan.pl -w pfsearchV3 or ps_scan.pl --pfscan $PATH/pfscanV3

sib-swiss / pftools3

using ps_scan.pl to scan prosite patterns #16