jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

PreQC aborted #79

Closed habibr closed 10 years ago

habibr commented 10 years ago

I ran sga preqc on several datasets. Some runs finished OK, while others failed with output like this while generating the .preqc files:

Preprocess stats:
Reads parsed: 332727764
Reads kept: 331562480 (0.996498)
Reads failed primer screen: 45 (1.35246e-07)
Bases parsed: 32582949743
Bases kept: 32513738777 (0.997876)
Number of incorrectly paired reads that were discarded: 0
[timer - sga preprocess] wall clock: 16390.13s CPU: 3783.79s
[timer - sga index] wall clock: 12670.03s CPU: 36254.45s
Building index for flexbar_sga.fastq.gz in memory using ropebwt
done bwt construction, generating .sai file
Loading FM-index of flexbar_sga.fastq.gz
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
sga_preqc.sh: line 8: 29884 Aborted $sga preqc -t 8 flexbar_sga.fastq.gz > flexbar_sga.preqc

Could you tell me what went wrong? The last lines of the .preqc files seem to be truncated at the k-mer depth counting stats.
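One way to tell a truncated .preqc file from a finished one is to try parsing it. The sketch below assumes that `sga preqc` writes a single JSON document, so a run that aborted midway would leave a file that fails to parse. The helper name `preqc_is_complete` is hypothetical and not part of SGA.

```python
import json


def preqc_is_complete(path):
    """Return True if the file at `path` parses as one complete JSON document.

    Assumption: sga preqc output is JSON, so an aborted run leaves a file
    that cannot be parsed. This helper is illustrative, not an SGA tool.
    """
    try:
        with open(path) as handle:
            json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Covers empty files, files cut off mid-object, and non-text data.
        return False
    return True


if __name__ == "__main__":
    import sys

    # Report the status of every file named on the command line.
    for name in sys.argv[1:]:
        status = "complete" if preqc_is_complete(name) else "truncated or invalid"
        print(f"{name}: {status}")
```

Running it over all the .preqc files from a batch would separate the runs that finished from the ones that aborted, without reading each log.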

jts commented 10 years ago

Hi,

Is it possible to send me the files that are causing the problem or the subset of reads? I don't know what the problem is from the error message.

Thanks, Jared
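For a very large input, a subset of reads can be cut from the front of the file. This is a minimal sketch, assuming a gzipped FASTQ with four lines per record and interleaved pairs (eight lines per pair); the function name `write_fastq_subset` is hypothetical. The first reads of a file may not reproduce a given crash, so the subset should be re-tested before sharing.

```python
import gzip
from itertools import islice


def write_fastq_subset(src, dst, n_pairs):
    """Copy the first `n_pairs` read pairs from gzipped FASTQ `src` to `dst`.

    Assumes four lines per record and interleaved pairs, so one pair spans
    eight lines. Returns the number of lines actually written.
    """
    lines_wanted = n_pairs * 8
    written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        # islice stops after lines_wanted lines, or earlier at end of file.
        for line in islice(fin, lines_wanted):
            fout.write(line)
            written += 1
    return written
```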


habibr commented 10 years ago

Thanks for your reply. It was a rather huge fastq with 200 million 100bp paired reads. Unfortunately I am away this week.

Habib Rijzaani Laboratorium Genomika & Bioinformatika Bb Biogen - Balitbangtan Jl. Tentara Pelajar 3A Bogor 16111

habibr commented 10 years ago

Hi Jared,

You can find a subset of the reads that has been preprocessed using sga in my git repo: https://github.com/habibr/myrepo

I did the following to preprocess the original 20 Gb gzipped reads:

sga preprocess -v -p 1 --pe-orphans=flexbar_sga_singles.fastq -m 21 -s 0.001 flexbar_1.fastq.gz flexbar_2.fastq.gz |gzip -c > flexbar_sga.fastq.gz

Then I used the following commands for indexing and preqc:

sga index -a ropebwt --no-reverse -t 8 flexbar_sga.fastq.gz
sga preqc -v -t 8 flexbar_sga.fastq.gz > flexbar_sga.preqc

It aborted with the same error messages.

I tried the commands on both old RHEL5 and new Ubuntu 14.04 LTS with the same results.

The repository also includes the std_err file, which shows the error messages at the end.

Habib Rijzaani

BB-Biogen, Badan Litbang Pertanian, Kementerian Pertanian Jl. Tentara Pelajar 3A Bogor 16111 +62 251 8337975


jts commented 10 years ago

Thanks for the test case but it completes successfully on my machine. I ran these commands:

sga index -a ropebwt --no-reverse -t 8 flexbar_sga.fastq.gz
sga preqc -v -t 8 flexbar_sga.fastq.gz
[timer - sga::preqc] wall clock: 1468.66s CPU: 7762.26s

jts commented 10 years ago

Hi Habib,

Is this still an issue?

Jared

habibr commented 10 years ago

Hi Jared,

Maybe it was a memory issue, but I have not been able to confirm that yet. I repeated the analysis with the full dataset and it ran OK. On a machine with 6 GB of memory, however, the issue still persists with that particular dataset.
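One way to gather evidence for or against the memory hypothesis is to record the peak memory of the failing command. The sketch below is a generic wrapper, not part of SGA: the function name `run_and_report_peak_memory` is hypothetical, and it relies on `resource.getrusage`, which is available on Unix-like systems only.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run `cmd` (a list of arguments) and return (exit_code, peak_rss).

    peak_rss is ru_maxrss from getrusage(RUSAGE_CHILDREN): the largest
    resident set size among child processes waited for so far. Linux
    reports it in kilobytes, macOS in bytes.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # A negative exit code means the child was killed by that signal number.
    return completed.returncode, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    code, peak = run_and_report_peak_memory(sys.argv[1:])
    # Print to stderr so the child's redirected stdout stays clean.
    print(f"exit code {code}, peak RSS {peak} (kB on Linux)", file=sys.stderr)
```

Saved under a name such as peak_mem.py (a placeholder), it could wrap the preqc step as `python peak_mem.py sga preqc -v -t 8 flexbar_sga.fastq.gz > flexbar_sga.preqc`. A peak close to the machine's physical memory on the failing host would support the memory explanation; a much lower peak would point elsewhere.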

jts commented 10 years ago

OK, thanks for the update. It is probably a memory issue. I will close this for now; please re-open it if the problem comes up again.