jeffdaily / parasail

Pairwise Sequence Alignment Library

segfaults when using (some) profile functions #99

Closed brendanf closed 1 year ago

brendanf commented 1 year ago

I'm using version 2.6.1 (commit 8bcb4f802b18937798149490cab347c17dd5dce3) on RHEL 8, and getting segfaults when trying to use some of the stats_*_profile functions. I originally found this when trying to use the library in my own code, but it also happens with parasail_aligner.

e.g.:

$ parasail_aligner -f seq1.fasta -q seq2.fasta -a nw_stats_striped_profile_8 -v -m dnafull -t1
    parasail version: 2.6.1
            funcname: nw_stats_striped_profile_8
              cutoff: 7
      case sensitive: no
    alphabet aliases: <no aliases>
          use filter: yes
          gap_extend: 1
            gap_open: 10
              matrix: dnafull
                 AOL: 80
                 SIM: 40
                  OS: 30
                file: seq1.fasta
               query: seq2.fasta
              output: parasail.csv
          batch_size: 0
       memory_budget: 7.9743 GB
  read and pack time: 0.0002 seconds
            sentinal: $
end of packed buffer: 1014
 number of sequences: 2
   number of queries: 1
   number of db seqs: 1
     induced SA time: 0.0003 seconds
      naive BWT time: 0.0000 seconds
      clamp LCP time: 0.0000 seconds
            ESA time: 0.0002 seconds
      possible pairs: 1
     generated pairs: 28
        unique pairs: 1
     omp num threads: 1
    openmp prep time: 0.0000 seconds
        profile init: 0.0000 seconds
    profile creation: 0.0001 seconds
Caught SIGSEGV: Segmentation Fault

$ parasail_aligner -f seq1.fasta -q seq2.fasta -a nw_stats_striped_profile_16 -v -m dnafull -t1
    parasail version: 2.6.1
            funcname: nw_stats_striped_profile_16
              cutoff: 7
      case sensitive: no
    alphabet aliases: <no aliases>
          use filter: yes
          gap_extend: 1
            gap_open: 10
              matrix: dnafull
                 AOL: 80
                 SIM: 40
                  OS: 30
                file: seq1.fasta
               query: seq2.fasta
              output: parasail.csv
          batch_size: 0
       memory_budget: 7.9743 GB
  read and pack time: 0.0002 seconds
            sentinal: $
end of packed buffer: 1014
 number of sequences: 2
   number of queries: 1
   number of db seqs: 1
     induced SA time: 0.0003 seconds
      naive BWT time: 0.0000 seconds
      clamp LCP time: 0.0000 seconds
            ESA time: 0.0002 seconds
      possible pairs: 1
     generated pairs: 28
        unique pairs: 1
     omp num threads: 1
    openmp prep time: 0.0000 seconds
        profile init: 0.0000 seconds
    profile creation: 0.0001 seconds
Caught SIGSEGV: Segmentation Fault

I have not tested all possible variations, but I have found the following:

sw_stats_striped_profile_16: works
nw_stats_striped_profile_16: segfault
sg_stats_striped_profile_16: segfault
sw_stats_scan_profile_16: segfault
nw_stats_scan_profile_16: segfault
sg_stats_scan_profile_16: segfault
sw_stats_striped_profile_8: works
sw_stats_striped_profile_32: works
nw_stats_striped_profile_8: segfault
nw_stats_striped_profile_32: segfault
sw_striped_profile_16: works
nw_striped_profile_16: works
sg_striped_profile_16: works
sw_scan_profile_16: works
nw_scan_profile_16: works
sg_scan_profile_16: works
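The function names in the list above follow a regular pattern, so the whole test matrix can be generated instead of typed by hand. The C sketch below assumes the `-a` names have the shape `{sw,nw,sg}[_stats]_{striped,scan}_profile[_ISA]_{8,16,32}`, inferred from the names used in this thread. The helpers `build_funcname` and `print_test_matrix` are hypothetical and are not part of parasail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a parasail_aligner -a name such as
 * "nw_stats_striped_profile_16" (isa == NULL, i.e. the dispatcher name) or
 * "nw_stats_striped_profile_sse41_128_16" (explicit ISA; assumed format).
 * Returns the number of characters written, or -1 if buf is too small. */
int build_funcname(char *buf, size_t n, const char *alg, int stats,
                   const char *vec, const char *isa, int width)
{
    int len;
    assert(buf != NULL && alg != NULL && vec != NULL);
    if (isa != NULL)
        len = snprintf(buf, n, "%s%s_%s_profile_%s_%d",
                       alg, stats ? "_stats" : "", vec, isa, width);
    else
        len = snprintf(buf, n, "%s%s_%s_profile_%d",
                       alg, stats ? "_stats" : "", vec, width);
    return (len < 0 || (size_t)len >= n) ? -1 : len;
}

/* Print every dispatcher-level name for the matrix exercised in this issue:
 * 3 algorithms x {plain, stats} x {striped, scan} x {8, 16, 32} bits.
 * Returns how many names were printed. */
int print_test_matrix(FILE *out)
{
    static const char *algs[] = {"sw", "nw", "sg"};
    static const char *vecs[] = {"striped", "scan"};
    static const int widths[] = {8, 16, 32};
    char name[64];
    int count = 0;
    for (size_t a = 0; a < 3; ++a)
        for (int stats = 0; stats <= 1; ++stats)
            for (size_t v = 0; v < 2; ++v)
                for (size_t w = 0; w < 3; ++w) {
                    if (build_funcname(name, sizeof name, algs[a], stats,
                                       vecs[v], NULL, widths[w]) < 0)
                        continue; /* name did not fit; skip it */
                    fprintf(out, "%s\n", name);
                    ++count;
                }
    return count;
}
```

Each printed name could then go to `parasail_aligner -a` in a shell loop, using the same `-f`, `-q`, `-m dnafull` and `-t1` arguments as the runs above.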

My input files are seq1.fasta:

>0
AAAAGGAAAAAAGGGATCTACCACCGGGATGTTCATAACCCTTTGTTGTCCGACTCTGTTGCCTCCGGGGCGACCCTGCC
TTCGGGCGGGGGCTCCGGGTGGACACTTCAAACTCTTGCGTAACTTTGCAGTCTGAGTAAACTTAATTAATAAATTAAAA
CTTTTAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTC
AGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCCTGGTATTCCGGGGGGCATGCCTGTTCGAGCGTCATTTCACCA
CTCAAGCCTCGCTTGGTATTGGGCATCGCGGTCCGCCGCGTGCCTCAAATCGACCGGCTGGGTCTTCTGTCCCCTAAGCG
TTGTGGAAACTATTCGCTAAAGGGTGTTCGGGAGGCTACGCCGTAAACAACCCCATTTCTAAAGTTGACCTCGGATCAGG
TAGGGATACCCGCTGAACTTAAGCATATCAATAAGCGGGAGGAAAA

seq2.fasta:

>1
ACAAGGTTTCCGTAGGTGAACCTGCGGAGGGATCATTACAAGTTGACCCCGGCCCTCGGGCCGGGATGTTCACAACCCTT
TGTTGTCCGACTCTGTTGCCTCCGGGGCGACCCTGCCTCCGGGCGGGGGCCCCGGGTGGACACTTCAAACTCTTGCGTAA
CTTTGCAGTCTGAGTAAATTTAATTAATAAATTAAAACTTTCAACAACGGATCTATTGGTTCTGGCATCGATGAAGAACG
CAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCCCTGGTA
TTCCGGGGGGCATGCCTGTTCGAGCGTCATTTCACCACTCAAGCCTCGCTTGGTATTGGGCGACGCGGTCCGCCGCGCGC
CTCAAATCGACCGGCTGGGTCTTCCGTCCCCTCAGCGTTGTGGAAACTATTCGCTAAAGGGTGCCGCGGGAGGTCACGCC
GCAAAA
brendanf commented 1 year ago

I've played around with specifying the SIMD architecture by passing architecture-specific function names to the -a flag. All variants I have tested using sse2_128, sse41_128, and avx2_256 have worked. The variants using altivec_128 and neon_128 fail with the error "Specified [profile ]function not found".

Looking back at the autoconf output, I find:

    Altivec : ............................. auto (no)
    ALTIVEC_CFLAGS : ...................... not supported

    ARM NEON : ............................ auto (no)
    NEON_CFLAGS : ......................... not supported
    EXTRA_NEON_CFLAGS : ................... -fopenmp-simd -DSIMDE_ENABLE_OPENMP

so it is no mystery that the functions aren't present. But maybe the SIMD architecture dispatch code is somehow expecting all of the functions to be present, and this is leading to the segfault?
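For context on this hypothesis: one common way to implement runtime SIMD dispatch is to point a function pointer at a "dispatcher" first. On its first call, the dispatcher checks CPU support, rebinds the pointer to the best implementation that was actually compiled in, and forwards the call. The C sketch below shows that general pattern only. All names (`align_fn`, `align`, `dispatcher`, `impl_sse2`, `impl_sse41`, `cpu_has_sse2`, `cpu_has_sse41`) are hypothetical, and the sketch does not reproduce parasail's dispatch code. Its key property is that an ISA whose code was not built, such as Altivec and NEON in the configure output above, has a NULL entry and is never selected.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*align_fn)(const char *s1, const char *s2);

/* Hypothetical stand-ins for per-ISA kernels; each returns a tag so the
 * caller can tell which implementation ran. */
static int impl_sse2(const char *s1, const char *s2)  { (void)s1; (void)s2; return 2; }
static int impl_sse41(const char *s1, const char *s2) { (void)s1; (void)s2; return 41; }

/* Hypothetical CPU feature probes. A real build would query the CPU
 * (e.g. via cpuid); here they are fixed values for illustration. */
static int cpu_has_sse41(void) { return 1; }
static int cpu_has_sse2(void)  { return 1; }

struct candidate {
    align_fn fn;            /* NULL when this ISA was not compiled in */
    int (*supported)(void); /* runtime check; unused when fn is NULL */
};

/* Ordered best-first. The first entry models an ISA that the build
 * skipped, like Altivec/NEON reported as "not supported" above. */
static const struct candidate candidates[] = {
    { NULL,       NULL },
    { impl_sse41, cpu_has_sse41 },
    { impl_sse2,  cpu_has_sse2 },
};

static int dispatcher(const char *s1, const char *s2);

/* Public entry point: starts at the dispatcher, rebinds on first call. */
align_fn align = dispatcher;

static int dispatcher(const char *s1, const char *s2)
{
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        if (candidates[i].fn != NULL && candidates[i].supported()) {
            align = candidates[i].fn; /* later calls bypass dispatch */
            return align(s1, s2);
        }
    }
    assert(!"no usable implementation compiled in");
    return -1;
}
```

Skipping NULL entries is what keeps a missing ISA harmless in this pattern. As the rest of the thread shows, the actual bug turned out to be elsewhere.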

jeffdaily commented 1 year ago

Thanks for the debugging. You're likely correct that the dispatching is making some incorrect assumptions. I'll take a look.

jeffdaily commented 1 year ago

I was able to reproduce.

jeffdaily commented 1 year ago

Can you please try the tip of the develop branch, which includes this commit: https://github.com/jeffdaily/parasail/commit/98abee79f24d54849ec3c987430c28092bf01fe3

brendanf commented 1 year ago

Thanks for the quick attention! Unfortunately I am still getting the same errors from parasail_aligner.

jeffdaily commented 1 year ago

:-( I cannot repro after I make that fix. Sorry to be skeptical, but did you try the latest tip of the develop branch?

brendanf commented 1 year ago

I was also skeptical, and thought I must have not gotten the new version. However I tried again just now:

$ git log -1
commit 98abee79f24d54849ec3c987430c28092bf01fe3 (HEAD -> develop, origin/develop)
Author: Jeff Daily <jeffrey.daily@gmail.com>
Date:   Wed Mar 1 15:16:38 2023 -0700

    Fixes #99. stats dispatcher pointed to non-stats functions.
$ make clean
# output omitted
$ ls apps
README.md  meson.build  parasail.csv  parasail_aligner.cpp  parasail_stats.c
$ make
# output omitted
$ ./apps/parasail_aligner -f ~/projects/optimotu/data-raw/seq1.fasta -q ~/projects/optimotu/data-raw/seq2.fasta -a sw_stats_scan_profile_16 -v -m dnafull -t1
    parasail version: 2.6.1
            funcname: sw_stats_scan_profile_16
              cutoff: 7
      case sensitive: no
    alphabet aliases: <no aliases>
          use filter: yes
          gap_extend: 1
            gap_open: 10
              matrix: dnafull
                 AOL: 80
                 SIM: 40
                  OS: 30
                file: /home/brfurnea/projects/optimotu/data-raw/seq1.fasta
               query: /home/brfurnea/projects/optimotu/data-raw/seq2.fasta
              output: parasail.csv
          batch_size: 0
       memory_budget: 7.9743 GB
  read and pack time: 0.0010 seconds
            sentinal: $
end of packed buffer: 1014
 number of sequences: 2
   number of queries: 1
   number of db seqs: 1
     induced SA time: 0.0003 seconds
      naive BWT time: 0.0000 seconds
      clamp LCP time: 0.0000 seconds
            ESA time: 0.0002 seconds
      possible pairs: 1
     generated pairs: 28
        unique pairs: 1
     omp num threads: 1
    openmp prep time: 0.0000 seconds
        profile init: 0.0000 seconds
    profile creation: 0.0001 seconds
Caught SIGSEGV: Segmentation Fault

I had originally done make install, but have subsequently followed up with make uninstall and verified that the library is not present anymore. Can you think of anything else I might have missed?

jeffdaily commented 1 year ago

Thank you for sticking with me on this. I just pushed another commit. I didn't catch all of the cases where stats functions weren't getting called by the dispatcher. Please try the latest tip of develop.

b0f4141ab60c90f381c29910a286564711e2b6a3
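The first fix's commit message ("stats dispatcher pointed to non-stats functions") describes a class of bug that a small self-check can catch. The check is that any name containing `_stats_` must resolve to a function whose result actually carries statistics. The C sketch below applies that check to a dispatch table. The `result` and `stats` structs, the kernels, and `find_stats_mismatch` are all hypothetical. They model one plausible failure mode: a non-stats result has no stats block, so code that later reads match counts from it would dereference NULL. This is an illustration and may not be how parasail's crash actually arose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: stats stays NULL unless a stats kernel fills it. */
struct stats { int matches, similar, length; };
struct result { int score; struct stats *stats; };

typedef struct result *(*kernel_fn)(void);

static struct result *plain_kernel(void)
{
    struct result *r = calloc(1, sizeof *r);
    if (r) r->score = 10; /* no stats block allocated */
    return r;
}

static struct result *stats_kernel(void)
{
    struct result *r = plain_kernel();
    if (r) {
        r->stats = calloc(1, sizeof *r->stats);
        if (r->stats) r->stats->matches = 7;
    }
    return r;
}

static void result_free(struct result *r)
{
    if (r) { free(r->stats); free(r); }
}

struct entry { const char *name; kernel_fn fn; };

/* Returns the index of the first "_stats_" entry whose kernel yields a
 * result without stats (or fails to allocate), or -1 if all are consistent. */
int find_stats_mismatch(const struct entry *table, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (strstr(table[i].name, "_stats_") == NULL)
            continue; /* non-stats names may return plain results */
        struct result *r = table[i].fn();
        int ok = (r != NULL && r->stats != NULL);
        result_free(r);
        if (!ok)
            return (int)i;
    }
    return -1;
}

/* A table with the kind of wiring error the commit message describes,
 * and a corrected one. */
const struct entry buggy_table[] = {
    { "sw_stats_striped_profile_16", stats_kernel },
    { "nw_stats_striped_profile_16", plain_kernel }, /* wrong target */
};
const struct entry fixed_table[] = {
    { "sw_stats_striped_profile_16", stats_kernel },
    { "nw_stats_striped_profile_16", stats_kernel },
};
```

Running a check like this over every dispatcher entry would flag each miswired stats name at once. That matters here because the second commit was needed after the first one missed some cases.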

brendanf commented 1 year ago

This now works for me in all of the cases which were previously failing. Thanks!