mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License
113 stars 25 forks source link

Variant calling with paired-end reads fails #5

Closed rene-rex closed 10 years ago

rene-rex commented 10 years ago

Hello

I tried to follow the "Workflow Examples" in the wiki to do variant calling with two plant genotypes based on paired end reads. Here is the sequence of commands I used:

ctx63 build -m 150G -t 10 -k 61 --sample genotype1 --seq2 genotype1_r1.fastq:genotype1_r2.fastq genotype1.ctx

ctx63 build -m 150G -t 10 -k 61 --sample genotype2 --seq2 genotype2_r1.fastq:genotype2_r2.fastq genotype2.ctx

ctx63 build -m 150G -t 10 -k 61 --sample reference --seq reference.fa reference.ctx

ctx63 clean -m 150G -t 10 --out genotype1.clean.ctx genotype1.ctx
ctx63 clean -m 150G -t 10 --out genotype2.clean.ctx genotype2.ctx

ctx63 join -m 150G --out refAndSamples.clean.ctx reference.ctx genotype1.clean.ctx genotype2.clean.ctx

ctx63 inferedges -m 150G -t 10 refAndSamples.clean.ctx

ctx63 thread -m 150G -t 10 --seq2 genotype1_r1.fastq:genotype1_r2.fastq --out genotype1.ctp.gz refAndSamples.clean.ctx:1
ctx63 thread -m 150G -t 10 --seq2 genotype2_r1.fastq:genotype2_r2.fastq --out genotype2.ctp.gz refAndSamples.clean.ctx:2

Unfortunately, both read threading steps failed with this error:

[src/alignment/correct_aln_stats.c:137] Assert Failed correct_aln_stats_print_summary(): num_reads > 0
[15 Aug 2014 16:22:40-keZ] Assert Error

I tried to "fix" this by adding this code above line 135 in alignment/correct_aln_stats.c:

if(num_reads == 0)
{
status("[CorrectAln] Didn't see any SE reads");
}

After recompiling the threading runs smooth. However, now I am stuck at the contigs command

ctx63 contigs -m 150G -t 10 -p 1:genotype1.ctp.gz -p 2:genotype2.ctp.gz refAndSamples.clean.ctx

which yields:

[src/graph_paths/gpath_checks.c:143] Error gpath_load_sample_pop(): No point loading paths for colours other than --colour 0
[18 Aug 2014 09:42:17-gOC] Fatal Error

Any clues what went wrong?

noporpoise commented 10 years ago

I'm travelling at the moment - will look at fixing num_reads > 0 issue on Monday. Sorry about that.

For the second issue, the contig command only pulls out contigs for a single sample at a time. So you probably want to do:

ctx63 contigs -m 150G -t 10 -c 1 -p 1:genotype1.ctp.gz --out contigs1.fa refAndSamples.clean.ctx
ctx63 contigs -m 150G -t 10 -c 2 -p 2:genotype1.ctp.gz --out contigs2.fa refAndSamples.clean.ctx

Hope that helps. Thank you for trying cortex.

Isaac

rene-rex commented 10 years ago

Hi

Thanks for the answer. I tried the first command

ctx63 contigs -m 150G -t 10 -c 1 -p 1:genotype1.ctp.gz --out contigs1.fa refAndSamples.clean.ctx

but it produced another error:

[src/graph_paths/gpath_checks.c:107] Error graphs_gpaths_compatible(): Sample names don't match
refAndSamples.clean.ctx:0reference
refAndSamples.clean.ctx:2genotype2.tipclean.supclean7
[22 Aug 2014 08:19:32-JOC] Fatal Error

Is it comparing the names of different samples in the same file?

noporpoise commented 10 years ago

I'm trying to resolve this issue at the moment. For now you should be able to work around it by only loading one sample at a time. This is not ideal, but I'll put up a fix shortly.

ctx63 contigs -m 150G -t 10 -c 0 -p genotype1.ctp.gz --out contigs1.fa refAndSamples.clean.ctx:1