jonassibbesen / rpvg

Method for inferring path posterior probabilities and abundances from pangenome graph read alignments
MIT License

Inflate operation failed: invalid distance too far back terminate called after throwing an instance of 'std::runtime_error' #64

Open CarlosAmadeo7 opened 1 week ago

CarlosAmadeo7 commented 1 week ago

Hello rpvg team: I successfully generated the .gamp file for the transcriptome with vg mpmap, with no problems at all. But when I run rpvg, I get this error:

Running rpvg (commit: cd5160deb1a75d745c7ba98dea634c49ccd296b5)
Random number generator seed: 1730236892
Fragment length distribution parameters found in alignment (mean: 151.096, standard deviation: 43.1828)
Loaded graph, GBWT and r-index (6.607 seconds, 10.2174 GB)
[E::bgzf_uncompress] Inflate operation failed: invalid distance too far back
terminate called after throwing an instance of 'std::runtime_error'
what(): [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 47907863952 group 41477367607
/cm/local/apps/slurm/var/spool/job17498325/slurm_script: line 25: 39401 Aborted

What is weird is that I ran rpvg previously with two different gamp files, and they ran okay, but this one is not working properly.

The command I am using is this one:

singularity exec -B /work /work/public/singularity/rpvg_latest.sif rpvg -t 32 -g $xg_path -p $gwbt_path -f $txt_gz_path -a mpmap_03_control.gamp -o rpvg --inference-model transcripts

where xg_path, gwbt_path, and txt_gz_path point to where the files are located. I used the same command to run rpvg before with different gamp files, and it worked. I would appreciate any help. Best

jeizenga commented 6 days ago

It's possibly a truncated file. Can you share how you made the GAMP?
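One cheap way to test the truncation hypothesis is to look at the last bytes of the file. The `[E::bgzf_uncompress]` message indicates the input is read as BGZF, and a BGZF stream that was closed cleanly normally ends with a fixed 28-byte empty block defined in the SAM/BAM specification. The sketch below assumes vg writes that marker when it finishes the GAMP, which has not been verified here for every vg version; `has_bgzf_eof` is a hypothetical helper name.

```python
import os

# The 28-byte empty BGZF block that marks a cleanly closed BGZF stream
# (byte values taken from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("mpmap_03_control.gamp")` and getting `False` would be consistent with a file that was cut short or had extra bytes appended. A `True` result only shows that the tail is intact, not that every block before it is valid.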

CarlosAmadeo7 commented 6 days ago

Sure. It is the same command I used to generate my previous 2 gamp files:

singularity exec -B /work/ /work/alfaroqc/apps/vg_v1.57.0.sif vg mpmap -t 32 -x $xg_path -g $gcsa_path -d $dist_path -f $read_1_3 -f $read_2_3 > mpmap_03_control.gamp

where xg_path, gcsa_path, and dist_path are where the files are located, as well as the paired-end reads: read_1_3 and read_2_3

The output I obtained is this one:

[vg mpmap] elapsed time 0 s: Executing command: /vg/bin/vg mpmap -t 32 -x /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.xg -g /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa -d /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/graph.dist -f /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R1.fq -f /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R2.fq
[vg mpmap] elapsed time 0 s: Loading graph from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.xg
[vg mpmap] elapsed time 4 s: Completed loading graph
[vg mpmap] elapsed time 4 s: Graph is in XG format. XG is a good graph format for most mapping use cases. PackedGraph may be selected if memory usage is too high. See vg convert if you want to change graph formats.
[vg mpmap] elapsed time 4 s: Identifying reference paths
[vg mpmap] elapsed time 5 s: Loading GCSA2 from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa
[vg mpmap] elapsed time 5 s: Loading distance index from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/graph.dist (in background)
[vg mpmap] elapsed time 8 s: Completed loading distance index
[vg mpmap] elapsed time 9 s: Completed loading GCSA2
[vg mpmap] elapsed time 9 s: Loading LCP from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa.lcp
[vg mpmap] elapsed time 9 s: Memoizing GCSA2 queries (in background)
[vg mpmap] elapsed time 12 s: Completed loading LCP
[vg mpmap] elapsed time 13 s: Completed memoizing GCSA2 queries
[vg mpmap] elapsed time 13 s: Building null model to calibrate mismapping detection
[vg mpmap] elapsed time 15 s: Mapping reads from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R1.fq and /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R2.fq using 32 threads
[vg mpmap] elapsed time 50.4 m: Mapped 5000000 read pairs
[vg mpmap] elapsed time 1.7 h: Mapped 10000000 read pairs
[vg mpmap] elapsed time 2.5 h: Mapped 15000000 read pairs
[vg mpmap] elapsed time 3.3 h: Mapped 20000000 read pairs
[vg mpmap] elapsed time 4.1 h: Mapped 25000000 read pairs
[vg mpmap] elapsed time 5.0 h: Mapped 30000000 read pairs
[vg mpmap] elapsed time 5.8 h: Mapped 35000000 read pairs
[vg mpmap] elapsed time 6.6 h: Mapped 40000000 read pairs
[vg mpmap] elapsed time 7.4 h: Mapped 45000000 read pairs
[vg mpmap] elapsed time 8.3 h: Mapped 50000000 read pairs
[vg mpmap] elapsed time 9.1 h: Mapped 55000000 read pairs
[vg mpmap] elapsed time 9.9 h: Mapped 60000000 read pairs
[vg mpmap] elapsed time 10.8 h: Mapped 65000000 read pairs
[vg mpmap] elapsed time 11.6 h: Mapped 70000000 read pairs
[vg mpmap] elapsed time 12.4 h: Mapped 75000000 read pairs
[vg mpmap] elapsed time 12.9 h: Mapping finished. Mapped 77863987 read pairs.

The output looks similar to the previous gamp files generated.

jeizenga commented 5 days ago

Well, if this became truncated, it probably happened after vg mpmap, since it seems to have exited successfully. Would the handling after this have allowed truncation (e.g. downloading from a remote source)? Another possibility is that some extra output got mixed into/tacked onto the output. In any case, I suspect the error originates earlier in the pipeline than rpvg. One quick check would be to run vg filter -M -t <N_THREADS> alns.gamp > /dev/null to see if vg can read it.
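Independently of vg, the compressed container itself can be tested. BGZF is a series of concatenated gzip members, so Python's standard `gzip` module can stream through the whole file and report the first point at which decompression fails. This is only a sketch: `first_decompression_error` is a hypothetical helper, and a clean pass shows that the compression layer is intact, not that the messages inside are valid for rpvg.

```python
import gzip
import zlib


def first_decompression_error(path, chunk_size=1 << 20):
    """Decompress the whole file in chunks.

    Returns None if the stream decompresses cleanly, otherwise a tuple
    (decompressed_bytes_read_before_failure, error_description).
    """
    bytes_ok = 0
    try:
        with gzip.open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    return None  # reached the end without errors
                bytes_ok += len(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers gzip.BadGzipFile (bad header or CRC),
        # EOFError covers a stream that ends mid-member (truncation),
        # zlib.error covers invalid deflate data.
        return bytes_ok, f"{type(exc).__name__}: {exc}"
```

Running this on a GAMP that works with rpvg and on the failing one gives a direct comparison. An `EOFError` would point toward truncation, while a `zlib.error` partway through would point toward corrupted or foreign bytes in the middle of the file.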

CarlosAmadeo7 commented 3 days ago

Hello there! I tried to verify the integrity of the gamp files, and I get this error when I try to convert one into JSON, e.g.: vg view -a mpmap_05_treatment.gamp > /dev/null

The error is the following:
/cm/local/apps/slurm/var/spool/job17529652/slurm_script: line 12: cd: /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads/Misha_reads: No such file or directory
terminate called after throwing an instance of 'std::runtime_error'
what(): [io::ProtobufIterator] tag "MGAM" for Protobuf that should be "GAM"
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.57.0 "Franchini"
Stack trace (most recent call last):

14 Object "/vg/bin/vg", at 0x5f4c5d, in _start

13 Object "/vg/bin/vg", at 0x1f6ae5f, in __libc_start_main

12 Object "/vg/bin/vg", at 0x5c497e, in main

11 Object "/vg/bin/vg", at 0xd73feb, in vg::subcommand::Subcommand::operator()(int, char**) const

10 Object "/vg/bin/vg", at 0xd831bb, in main_view(int, char**)

9 Object "/vg/bin/vg", at 0xf4c040, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)

8 Object "/vg/bin/vg", at 0xd7fdd1, in std::_Function_handler<void (std::istream&), main_view(int, char**)::{lambda(std::istream&)#9}>::_M_invoke(std::_Any_data const&, std::istream&)

7 Object "/vg/bin/vg", at 0xc2e370, in void vg::io::for_each(std::istream&, std::function<void (long, vg::Alignment&)> const&)

6 Object "/vg/bin/vg", at 0x64b262, in vg::io::ProtobufIterator::fill_value()

5 Object "/vg/bin/vg", at 0x1ea6ed8, in __cxa_throw

4 Object "/vg/bin/vg", at 0x1ea6d76, in std::terminate()

3 Object "/vg/bin/vg", at 0x1ea6d0b, in __cxxabiv1::__terminate(void (*)())

2 Object "/vg/bin/vg", at 0x5c150a, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]

1 Object "/vg/bin/vg", at 0x5c3ea7, in abort

0 Object "/vg/bin/vg", at 0x14e247b, in raise

ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug. Please include this entire error log in your bug report!
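The message `tag "MGAM" for Protobuf that should be "GAM"` reads as a type mismatch: the reader was asked for GAM records but found a stream tagged MGAM, which is what a multipath alignment (.gamp) file from vg mpmap would be expected to carry. To see which tag a file has without going through vg, the first few decompressed bytes can be inspected. The sketch below is a heuristic, not a parser of the vg stream format: it assumes the tag appears as plain ASCII within the first 16 decompressed bytes, and `guess_type_tag` is a hypothetical helper.

```python
import gzip

# Checked in this order because "GAM" is a substring of "MGAM".
KNOWN_TAGS = (b"MGAM", b"GAM")


def guess_type_tag(path, window=16):
    """Heuristically report which known type tag appears at the start of the file.

    Returns "MGAM", "GAM", or None if neither is found in the first
    `window` decompressed bytes.
    """
    with gzip.open(path, "rb") as stream:
        head = stream.read(window)
    for tag in KNOWN_TAGS:
        if tag in head:
            return tag.decode("ascii")
    return None
```

If this returns `"MGAM"` for files that rpvg accepts as well as for the one it rejects, then the `vg view -a` failure reflects the input type rather than file damage, and it would not distinguish the good files from the bad one.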

What surprises me is that I get the same error for all 6 files, yet rpvg worked for the first 2 but not for the rest. I checked the quality of the reads and they look good. One thing I just realized is that these reads were quality-filtered and had their adapters removed before mapping with vg mpmap, and I know that vg mpmap has a function for read quality control too. Do you think that doing those extra steps before mapping the reads resulted in the "instance of 'std::runtime_error'" error? I would appreciate your feedback. Best