zeeev / bevel

Working toward a probabilistic MSA tool
MIT License

segmentation fault when `-d` was specified #1

Open KamilSJaron opened 7 years ago

KamilSJaron commented 7 years ago

Hello,

I compiled bevel with Clang (v 703.0.31) on OS X. I tried to use the -d parameter to specify the name of a database to write, and I got a segmentation fault.

$bevel -d Tge_ref_db Tge_ref_filtered.fa 1_Tps_ref_filtered.fa
INFO: Sketching minimizers from file: Tge_genome_db
INFO: Sketched 0 sequence
INFO: Sorting minimizers
Segmentation fault: 11

When I deleted the parameter, it ran. (600M vs 300M of sequences in less than a minute - really awesome!)

zeeev commented 7 years ago

@KamilSJaron "-d" doesn't take an argument. However, I'll add some code to prevent the segmentation fault. Thanks for bringing this to my attention.
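To illustrate why the extra word ends up being treated as an input file: when a flag takes no argument, the token after it is parsed as a positional argument. The sketch below uses Python's argparse purely for illustration. It is not bevel's actual parser, and the option layout (`-d` as a switch followed by FASTA paths) is an assumption based on the commands in this thread.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for bevel's command line; not the real parser.
    parser = argparse.ArgumentParser(prog="bevel-sketch")
    # -d is a boolean switch: it consumes no value from the command line.
    parser.add_argument("-d", action="store_true", help="switch that takes no value")
    parser.add_argument("fastas", nargs="+", help="input FASTA files")
    return parser


args = build_parser().parse_args(
    ["-d", "Tge_ref_db", "Tge_ref_filtered.fa", "1_Tps_ref_filtered.fa"]
)
# The would-be database name is parsed as the first input file.
print(args.d, args.fastas)
```

With this layout, a tool that does not check that every positional argument exists would go on to treat `Tge_ref_db` as a sequence file.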

KamilSJaron commented 7 years ago

Thanks. I thought that I had to specify the name of the database. I hope my mistake helps anyway; a segmentation fault is always a bad thing.

zeeev commented 7 years ago

@KamilSJaron let me know if the most recent push fixes the problem. After a bunch of changes here is what works for me:

bin/bevel -n 3 -k 12 -w 4 -d nothing.txt fads.query.fa fads.target.fa

where nothing.txt doesn't exist.

KamilSJaron commented 7 years ago

Hi, I just pulled and gave it another try. I tried several things to test the interface a bit.

The folder contains valid FASTA files foo.fa and bar.fa. The file foo_db does not exist.

  1. bevel -d foo_db foo.fa bar.fa 1> out 2> err produces foo.midx, bar.midx and foo_db.midx files, but prints nothing to the output.
  2. If you now rerun bevel on the valid FASTA files using the index files from the previous step (bevel -d foo.fa bar.fa 1> out 2> err), it produces a big output (3.3G).
  3. If you delete everything and rerun bevel -d foo.fa bar.fa 1> out 2> err, new indexes are created and there is an output, but a much smaller one (0.4G).
  4. If you delete bar.midx and rerun with only one index file present, you get Segmentation fault: 11. The last lines of the err log are:
INFO: Wrote 6603161 minimizers
INFO: Reading minimizers index: 1_Tps_genome_100k_filtered.fa.midx
INFO: Read 8177918 minimizers: 1_Tps_genome_100k_filtered.fa.midx 
INFO: Searching 6603161 by 8177918 minimizers

Behaviour I would expect:

  1. Fail fast when an input file does not exist.
  2. This is OK; the problem comes from the index having been created for the non-existing file.
  3. OK.
  4. Recompute the missing index file and produce the output.

I hope my comments will be helpful.
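The expected behaviour in points 1 and 4 can be sketched as a pre-flight check. This is a minimal Python illustration, not bevel's code: the function name `plan_run` is hypothetical, and the `<fasta>.midx` index naming is an assumption taken from the log lines in this thread.

```python
import os


def plan_run(fasta_paths, index_suffix=".midx"):
    """Return the inputs whose index must be built; raise if an input is missing.

    Hypothetical helper: index files are assumed to be named <fasta> + suffix.
    """
    missing = [p for p in fasta_paths if not os.path.isfile(p)]
    if missing:
        # Expected behaviour 1: stop before any index is written.
        raise FileNotFoundError("input file(s) not found: " + ", ".join(missing))
    # Expected behaviour 4: an absent index is rebuilt rather than assumed present.
    return [p for p in fasta_paths if not os.path.isfile(p + index_suffix)]
```

Called with foo.fa and bar.fa when only foo.fa.midx exists, it returns just bar.fa, so only that index is recomputed; called with a non-existing foo_db, it raises before anything is written.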

zeeev commented 7 years ago

Thanks for updating me on this. I'll keep digging. I really appreciate it!

(Sent by email in reply to Kamil S Jaroň's comment of Feb 3, 2017, 9:00 AM.)

zeeev commented 7 years ago

@KamilSJaron Okay, I've improved the interface for -d, e.g.:

bin/bevel -d test.txt  fads.target.fa fads.query.fa  000162F_quiver.fa

FATAL: One fasta file does not exist.
       -d does not take an argument

I'm much more concerned about the discrepancy you generated with the number of output lines. Can you replicate that strange behavior with the newest version?

KamilSJaron commented 7 years ago

The strange behaviour occurred only when three indexes were computed. If the program no longer allows such a strange index to be created, then the behaviour won't be reproducible with the current version (I will give it a try anyway).

KamilSJaron commented 7 years ago

Update on the new version:

  1. Running with the -d flag and a non-existing file reports the non-existing file.
  2. -d flag + existing files: runs and produces the expected outputs.
  3. Running bevel with indexes computed using the previous version (point 1 in the post 3 days ago) leads to a crash (previously it computed the huge output):
INFO: Reading minimizers index: foo.fa.midx
INFO: Read 8177918 minimizers: foo.fa.midx 
INFO: Reading minimizers index: bar.fa.midx
INFO: Read 6603161 minimizers: bar.fa.midx 
INFO: Searching 8177918 by 6603161 minimizers
Assertion failed: (t < target->namelen), function search, file ./src/search.h, line 101.
Abort trap: 6

However, I believe you do not really have to worry about that, since it is impossible to create these weird indexes with this version.
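For indexes left over from an older version, a check at load time could turn the assertion failure into a readable error. The Python sketch below is only an illustration under a guess about the assertion text: it assumes `t` is a sequence id and `namelen` the number of sequence names. The function `check_index` and the `(hash, seq_id)` record layout are hypothetical, not bevel's data structures.

```python
def check_index(minimizers, names):
    """Raise ValueError if any minimizer points at a sequence id with no name.

    Hypothetical layout: each minimizer is a (hash_value, seq_id) pair.
    """
    for position, (hash_value, seq_id) in enumerate(minimizers):
        if not 0 <= seq_id < len(names):
            raise ValueError(
                f"index entry {position} refers to sequence {seq_id}, "
                f"but only {len(names)} names are present; rebuild the index"
            )
```

Running such a check right after reading a .midx file would let a stale index be reported or rebuilt before the search starts.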

One more very small comment: bevel -v does not return the version (and it would be really awesome if it did).