karel-brinda / rnftools

RNF framework for NGS: simulation of reads, evaluation of mappers, conversion of RNF-compliant data.
http://karel-brinda.github.io/rnftools
MIT License

DWGSim hangs, then deletes files when job is killed #70

Closed RSherman15 closed 6 years ago

RSherman15 commented 6 years ago

I'm currently trying to get simulated data from two files into one simulation, as per one of the tutorial examples. I'm using DWGSim and my Snakefile looks like this (though I've replaced files with placeholder names here):

import rnftools

rnftools.mishmash.sample("test", reads_in_tuple=2)

file1 = "file1.fa"
file2 = "file2.fa"

rnftools.mishmash.DwgSim(
        fasta=file1,
        coverage=40,
        number_of_read_tuples=0,
        read_length_1=150,
        read_length_2=150,
        distance=500,
        distance_deviation=50
)

rnftools.mishmash.DwgSim(
        fasta=file2,
        coverage=40,
        number_of_read_tuples=0,
        read_length_1=150,
        read_length_2=150,
        distance=500,
        distance_deviation=50
)

# Including Snakemake rules created by RNFtools and defining the main
#    Snakemake rule (declaring which files are requested)
include: rnftools.include()
rule: input: rnftools.input()

DWGSim appeared to simulate all the reads from the first file:

[dwgsim_core] 195 sequences, total length: 3099922541
[dwgsim_core] Currently on: 
419228096

But then it hung there (I left it for days and just went back to check it; it was still hanging, and all files were last modified several days ago). It never seemed to get to the second file. I decided to kill it, thinking I could then at least use the ~140G FASTQ files that had been produced in the output directory, even if it hadn't simulated the second, smaller file, and simulate the second separately. However, when I killed it (Ctrl-C), it deleted the FASTQ files.

So in addition to this issue of the simulation hanging, I'm wondering if this removal of the files when a job is killed is expected behavior?

karel-brinda commented 6 years ago

Thank you for reporting, I will look at it. With your parameters, what's the expected size of the output FASTQ file? Did you run the simulation locally or on a cluster?
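
For reference, the expected output size can be roughly estimated from the parameters in the Snakefile above. The sketch below is only a back-of-the-envelope calculation, not something RNFtools or DWGSim provides: `estimate_fastq_bytes` is a hypothetical helper, the 60-byte read-name length is an assumed placeholder (RNF read names vary in length), and it assumes that the coverage counts the bases of both mates together. The genome length is taken from the `dwgsim_core` log line above.

```python
def estimate_fastq_bytes(genome_length, coverage, read_length, header_bytes=60):
    """Rough uncompressed FASTQ size, in bytes, summed over all reads.

    header_bytes is an assumed average read-name length (hypothetical value).
    """
    total_bases = genome_length * coverage
    n_reads = total_bases // read_length
    # One FASTQ record is four newline-terminated lines:
    # header, sequence, '+' separator, quality string.
    record_bytes = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * record_bytes


genome_length = 3_099_922_541  # "total length" reported by dwgsim_core for the first file
size = estimate_fastq_bytes(genome_length, coverage=40, read_length=150)
print(f"approx. {size / 1e9:.0f} GB of uncompressed FASTQ for the first file")
```

Under these assumptions, the estimate for the first file alone comes to a few hundred gigabytes, so the reported ~140G of output is of the same order of magnitude as the expected size, not obviously far off.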

RSherman15 commented 6 years ago

I'm running it on a machine with a large number of cores and RAM, but it's a single machine. I think that the size I had was approximately expected. I'm currently running a much smaller test to see if that completes -- I'll update this thread if/when it either completes or hangs.

RSherman15 commented 6 years ago

A smaller test completed successfully, so perhaps it was just something on my end.