kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License

Empty files produced by kevlar split cause crash of kevlar assemble #374

Open serosko opened 4 years ago

serosko commented 4 years ago

I am trying to run the tutorial workflow with kevlar v0.7+15.gebabd62. The Snakemake workflow stops at the "assemble" rule with the following error message:

Traceback (most recent call last):
  File "/fast/users/roskoscs_c/work/miniconda/envs/kevlar-env/bin/kevlar", line 8, in <module>
    sys.exit(main())
  File "/fast/users/roskoscs_c/work/miniconda/envs/kevlar-env/lib/python3.6/site-packages/kevlar/__main__.py", line 30, in main
    mainmethod(args)
  File "/fast/users/roskoscs_c/work/miniconda/envs/kevlar-env/lib/python3.6/site-packages/kevlar/assemble.py", line 58, in main
    for partid, contig in assembler:
  File "/fast/users/roskoscs_c/work/miniconda/envs/kevlar-env/lib/python3.6/site-packages/kevlar/assemble.py", line 30, in assemble
    for partid, partition in partstream:
  File "/fast/users/roskoscs_c/work/miniconda/envs/kevlar-env/lib/python3.6/site-packages/kevlar/seqio.py", line 72, in parse_partitioned_reads
    name = read.name if hasattr(read, 'name') else read.defline
AttributeError: 'NoneType' object has no attribute 'defline'

I inspected the 16 possible input files for the rule (partitioned.{i}.augfastx.gz, i = 0...15) and noticed that only partitioned.0.augfastx.gz through partitioned.4.augfastx.gz contain reads; the remaining files contain none. Only the assemble jobs that run on the empty files fail.
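
For anyone wanting to check which partition files are empty without opening each one by hand, something like the following might work. It is only a sketch: it assumes all partition files sit in one directory and follow the partitioned.{i}.augfastx.gz naming above, and the helper names `is_empty_gzip` and `find_empty_partitions` are made up for illustration, not part of kevlar.

```python
import gzip
from pathlib import Path


def is_empty_gzip(path):
    """Return True if the gzip file at `path` decompresses to zero bytes."""
    with gzip.open(path, "rb") as handle:
        return handle.read(1) == b""


def find_empty_partitions(directory, numsplit=16):
    """List partition files in `directory` that are missing or hold no data.

    The file name pattern is taken from the report above; `numsplit` is the
    number of partitions the workflow was configured to produce.
    """
    empty = []
    for i in range(numsplit):
        path = Path(directory) / f"partitioned.{i}.augfastx.gz"
        # A missing file, a zero-byte file, and a valid gzip stream with no
        # content are all treated as "empty" here.
        if not path.exists() or path.stat().st_size == 0 or is_empty_gzip(path):
            empty.append(path.name)
    return empty
```

Calling `find_empty_partitions(".", numsplit=16)` from the workflow's output directory would return the names of the partitions with no reads, which should match the assemble jobs that crash.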

standage commented 4 years ago

Thanks for the report. Did you use the default "numsplit": 16 in your config file? As you've probably guessed, this is used to split the data and compute some later steps of the workflow in parallel. For 30x human WGS data, this will provide a decent speedup. For a mock 25 Mb genome with 10 de novo variants I don't expect it to make any practical difference. Try running it with "numsplit": 2 (or even 1) and see if that fixes the problem. If so, I'll add a clarifying note to the tutorial.

serosko commented 4 years ago

Yes, I found the option after writing the issue and tried it with "numsplit": 4. Now that I have the results, I can confirm that this fixed the issue.
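
For reference, the change comes down to one key in the workflow config. Below is a minimal sketch of making that edit from a script. It assumes the config is a JSON file, which the quoted "numsplit": 16 syntax suggests but which this thread does not confirm; the `config.json` path and the `set_numsplit` helper are hypothetical.

```python
import json


def set_numsplit(config_path, numsplit):
    """Rewrite the "numsplit" key of a JSON config file, keeping other keys."""
    with open(config_path) as handle:
        config = json.load(handle)
    config["numsplit"] = numsplit
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=4)
    return config


# Hypothetical usage; adjust the path to wherever the workflow config lives:
# set_numsplit("config.json", 4)
```

Editing the file by hand achieves the same thing; the point is only that no other setting needs to change.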

standage commented 4 years ago

Roger that. I'll close this thread once the documentation has been clarified.