sib-swiss / pftools3

A suite of tools to build and search generalized profiles
GNU General Public License v2.0

pfscanV3 cannot use motif files that contain patterns #4

Closed euphemizm closed 4 years ago

euphemizm commented 4 years ago

If pfscanV3 (>=3.2.0) is used against e.g. ftp://ftp.expasy.org/databases/prosite/prosite.dat

e.g. pfscanV3 -o3 --matrix-only -L-1 prosite.dat seq.tmp

It will fail with the messages "Complementary option may not be used on pattern profile", "Error complementing alphabet" and "Error found reading profile". (It worked fine with v3.1.)

It looks like a new check is performed on all motifs, when it should be done only on matrices (generalized profiles). This matrix-specific test fails if the motif file contains patterns, and it is performed before patterns are even considered for skipping with --matrix-only.

I can see that during INPUT ANALYSIS the ReadProfile method (src/C/utils/io.c) is called to load/check/count motifs. It loops over the motifs, calling the internalReadProfile method for each one. internalReadProfile systematically checks motifs via the ComplementAlphabet method. It should do so only for matrices (and it looks like the first motif is tested over and over again instead of the current one).

Trying the following change:

FROM io.c 1533
if (ComplementAlphabet(prf)!= 0) {

TO io.c 1533
if ( newPrf->Type == PF_MATRIX && ComplementAlphabet(newPrf)!= 0 ) {

fixes the problem (and the tests pass).
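To make the effect of that one-line change easier to see, here is a minimal, self-contained sketch. It is not the pftools code. Only the names Type, PF_MATRIX, prf, newPrf and ComplementAlphabet come from the lines quoted above. The struct layout, the PF_PATTERN constant, the next pointer, the stand-in body of ComplementAlphabet and the two check_all_* helpers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the pftools types.
 * PF_PATTERN and the linked-list layout are assumptions. */
enum ProfileType { PF_MATRIX, PF_PATTERN };

struct Profile {
    enum ProfileType Type;
    struct Profile *next;   /* assumed: motifs form a singly linked chain */
};

/* Stand-in for the real check: returns non-zero when given a pattern,
 * mimicking "Complementary option may not be used on pattern profile". */
int ComplementAlphabet(const struct Profile *prf)
{
    assert(prf != NULL);
    return prf->Type == PF_PATTERN ? 1 : 0;
}

/* Behaviour as described in the issue: on every iteration the head of the
 * chain (prf) is checked, regardless of the current motif or its type. */
int check_all_unguarded(const struct Profile *prf)
{
    for (const struct Profile *newPrf = prf; newPrf != NULL; newPrf = newPrf->next) {
        if (ComplementAlphabet(prf) != 0) {
            return -1;  /* "Error complementing alphabet" */
        }
    }
    return 0;
}

/* Proposed behaviour: check the current motif (newPrf), and only when it
 * is a matrix, so patterns no longer abort the loading step. */
int check_all_guarded(const struct Profile *prf)
{
    for (const struct Profile *newPrf = prf; newPrf != NULL; newPrf = newPrf->next) {
        if (newPrf->Type == PF_MATRIX && ComplementAlphabet(newPrf) != 0) {
            return -1;
        }
    }
    return 0;
}
```

In this model, a chain that starts with a pattern makes check_all_unguarded fail immediately, while check_all_guarded accepts the same chain and leaves the decision to skip patterns to a later stage such as --matrix-only.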

I will check more (side effects etc.), update the test suite, update the release number (3.2.2?), then submit a pull request.

P.S. CMake options used: -DC_ONLY=ON -DUSE_32BIT_INTEGER=ON -DUSE_PCRE=OFF

smoretti commented 4 years ago

Merged!

We have added you as a "writer" in the repository. In the future, could you make changes or open pull requests on the develop branch?