Counting alignment operations error

ksahlin commented 4 years ago

Hi Ivan,

I get the following error when running graphmap2 on both my ONT cDNA datasets (one dataset can be found here (from this webpage: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/nanopore-human-transcriptome/fastq_fast5_bulk.md)

.graphmap2: src/alignment/cigargen.cc:547: int CountAlignmentOperations(std::vector<unsigned char>&, const int8_t*, const int8_t*, int64_t, int64_t, SeqOrientation, int64_t, int64_t, int64_t, int64_t, bool, int64_t*, int64_t*, int64_t*, int64_t*, int64_t*, int64_t*, int64_t*): Assertion `alignment_position_start >= 0 && "Serious problem in counting alignment operations, stemmed from alignment somewhere."' failed.

Any ideas what this might be?

Best, Kristoffer

ksahlin commented 4 years ago

oh, and this is the command line:

graphmap2 align --index index.gmidx --gtf NA12878.gtf --threads 62 -x rnaseq -r Homo_sapiens.GRCh38.dna.primary_assembly.fa -d ont_human.fq -o reads.sam

some of the log:

[09:08:03 BuildIndexes] Loading GTF annotations.
[09:08:51 BuildIndexes] Loading genomic sequences.
[09:09:22 BuildIndexes] Generating the transcriptome.
[09:09:22 GenerateTranscriptomeSeqs] Constructing the transcriptome sequences.
[09:09:22 GenerateTranscriptomeSeqs] In total, there are 67 transcripts.
[09:09:22 SetupIndex_] Loading index from file: '/nfs/brubeck.bx.psu.edu/scratch4/ksahlin/ultra_eval/alignments/graphmap2/ont_human/index.gmidx'.
[09:09:22 Index] Memory consumption: [currentRSS = 166 MB, peakRSS = 3352 MB]
[09:09:22 Run] Hits will be thresholded at the percentil value (percentil: 99.000000%, frequency: 6).
[09:09:22 Run] Minimizers will be used. Minimizer window length: 5
[09:09:22 Run] Reference genome is assumed to be linear.
[09:09:22 Run] One or more similarly good alignments will be output per mapped read. Will be marked secondary.
[09:09:23 ProcessReads] All reads will be loaded in memory.
[09:09:36 ProcessReads] All reads loaded in 11.59 sec (size around 1490 MB). (722592014 bases)
[09:09:36 ProcessReads] Memory consumption: [currentRSS = 1954 MB, peakRSS = 3352 MB]
...here is the progressbar and the error happens...
Command terminated by signal 6

jmaricb commented 4 years ago

@ksahlin Hi, could you tell me where can I download the reference and gtf file you are using. I can see it is the 38 human chromosome, but I would like to use the same files to run it on my side.

ksahlin commented 4 years ago

Of course,

the genome is from ensemble release 98 from ftp://ftp.ensembl.org/pub/release-98/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

The GTF annotation is ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.chr_patch_hapl_scaff.annotation.gtf.gz

Thanks!

ksahlin commented 4 years ago

To further troubleshoot:

I get the same error when running without the GTF file as

graphmap2 align --index index.gmidx  --threads 32 -x rnaseq -r Homo_sapiens.GRCh38.dna.primary_assembly.fa -d ont_human.fq -o reads.sam

I may also add that I'm running graphmap2 through a snakemake pipeline, which executes 2 jobs of graphmap2 at the same time (on two separate nodes though). Are there any temporary files created in, e.g., the execution directory that may interfere with each other?

Best, Kristoffer

ksahlin commented 4 years ago

I have probably caused this error because the chromosome labels in my GTF didn't match the labels in the fasta (chr1 --> 1,..., MT --> chrM).

I fixed this and it seems to be working fine. Maybe a small request would be a more informative error message, but it was ultimately an error caused by the user :) I will get back to you if I run into any other problems.

jmaricb commented 4 years ago

Ok. I was able to see where the error was, but I didn't know what caused it. I will put more clear message at that place. We will release a new version with more precise alignments in a few days.

lbcb-sci / graphmap2

Counting alignment operations error #8