Open CarlosAmadeo7 opened 1 week ago
It's possibly a truncated file. Can you share how you made the GAMP?
Sure. It is the same command I used to generate my previous 2 GAMP files:
singularity exec -B /work/ /work/alfaroqc/apps/vg_v1.57.0.sif vg mpmap -t 32 -x $xg_path -g $gcsa_path -d $dist_path -f $read_1_3 -f $read_2_3 > mpmap_03_control.gamp
where xg_path, gcsa_path, and dist_path point to the graph and index files, and read_1_3 and read_2_3 point to the paired-end reads.
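Because the command above redirects stdout straight into the final file name, a job that is killed partway would leave a partial file that looks like a finished one. A minimal sketch of one way to rule that out, assuming a hypothetical helper named write_atomically (not part of vg) that writes to a temporary name and renames it only when the command exits with status 0:

```shell
# Hypothetical helper (not part of vg): run a command, send its stdout to
# "<output>.partial", and rename it to "<output>" only on exit status 0.
write_atomically() {
    local out="$1"
    shift
    local tmp="${out}.partial"
    if "$@" > "$tmp"; then
        mv -- "$tmp" "$out"
    else
        local rc=$?
        echo "command failed with status $rc; partial output kept at $tmp" >&2
        return "$rc"
    fi
}

# Example call, reusing the mpmap command from this thread:
# write_atomically mpmap_03_control.gamp \
#     singularity exec -B /work/ /work/alfaroqc/apps/vg_v1.57.0.sif \
#     vg mpmap -t 32 -x "$xg_path" -g "$gcsa_path" -d "$dist_path" \
#     -f "$read_1_3" -f "$read_2_3"
```

With this pattern, any file that carries the final name came from a run that exited cleanly, so later truncation would have to come from copying or storage.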
The output I obtained is this one:
[vg mpmap] elapsed time 0 s: Executing command: /vg/bin/vg mpmap -t 32 -x /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.xg -g /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa -d /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/graph.dist -f /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R1.fq -f /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R2.fq
[vg mpmap] elapsed time 0 s: Loading graph from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.xg
[vg mpmap] elapsed time 4 s: Completed loading graph
[vg mpmap] elapsed time 4 s: Graph is in XG format. XG is a good graph format for most mapping use cases. PackedGraph may be selected if memory usage is too high. See vg convert if you want to change graph formats.
[vg mpmap] elapsed time 4 s: Identifying reference paths
[vg mpmap] elapsed time 5 s: Loading GCSA2 from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa
[vg mpmap] elapsed time 5 s: Loading distance index from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/graph.dist (in background)
[vg mpmap] elapsed time 8 s: Completed loading distance index
[vg mpmap] elapsed time 9 s: Completed loading GCSA2
[vg mpmap] elapsed time 9 s: Loading LCP from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/minigraph_cactus_grch38.gcsa.lcp
[vg mpmap] elapsed time 9 s: Memoizing GCSA2 queries (in background)
[vg mpmap] elapsed time 12 s: Completed loading LCP
[vg mpmap] elapsed time 13 s: Completed memoizing GCSA2 queries
[vg mpmap] elapsed time 13 s: Building null model to calibrate mismapping detection
[vg mpmap] elapsed time 15 s: Mapping reads from /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R1.fq and /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads_RNA_seq/Misha_reads/S03_R2.fq using 32 threads
[vg mpmap] elapsed time 50.4 m: Mapped 5000000 read pairs
[vg mpmap] elapsed time 1.7 h: Mapped 10000000 read pairs
[vg mpmap] elapsed time 2.5 h: Mapped 15000000 read pairs
[vg mpmap] elapsed time 3.3 h: Mapped 20000000 read pairs
[vg mpmap] elapsed time 4.1 h: Mapped 25000000 read pairs
[vg mpmap] elapsed time 5.0 h: Mapped 30000000 read pairs
[vg mpmap] elapsed time 5.8 h: Mapped 35000000 read pairs
[vg mpmap] elapsed time 6.6 h: Mapped 40000000 read pairs
[vg mpmap] elapsed time 7.4 h: Mapped 45000000 read pairs
[vg mpmap] elapsed time 8.3 h: Mapped 50000000 read pairs
[vg mpmap] elapsed time 9.1 h: Mapped 55000000 read pairs
[vg mpmap] elapsed time 9.9 h: Mapped 60000000 read pairs
[vg mpmap] elapsed time 10.8 h: Mapped 65000000 read pairs
[vg mpmap] elapsed time 11.6 h: Mapped 70000000 read pairs
[vg mpmap] elapsed time 12.4 h: Mapped 75000000 read pairs
[vg mpmap] elapsed time 12.9 h: Mapping finished. Mapped 77863987 read pairs.
The output looks similar to that of the previously generated GAMP files.
Well, if this became truncated, it probably happened after vg mpmap, since it seems to have exited successfully. Would the handling after this have allowed truncation (e.g. downloading from a remote source)? Another possibility is that some extra output got mixed into/tacked onto the output. In any case, I suspect the error originates earlier in the pipeline than rpvg. One quick check would be to run vg filter -M -t <N_THREADS> alns.gamp > /dev/null to see if vg can read it.
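The quick check can be wrapped so that it runs over several files and reports which stage fails. The sketch below is only an illustration: check_gamp is a hypothetical helper name, the default thread count is arbitrary, and the gzip step assumes the GAMP is BGZF-compressed (gzip-compatible), which the bgzf_uncompress message from rpvg suggests but which I have not confirmed for this file.

```shell
# check_gamp FILE [THREADS]   (hypothetical helper, not part of vg)
# Returns 1 if FILE is missing or empty, 2 if the compressed stream is
# damaged, 3 if vg cannot read the records, and 0 otherwise.
check_gamp() {
    local file="$1"
    local threads="${2:-4}"
    if [ ! -s "$file" ]; then
        echo "$file: missing or empty" >&2
        return 1
    fi
    # Assumption: the file is BGZF/gzip-compatible, so gzip -t can verify it.
    if ! gzip -t < "$file" 2>/dev/null; then
        echo "$file: compressed stream is damaged or truncated" >&2
        return 2
    fi
    # The check suggested above: let vg parse every multipath alignment.
    if ! vg filter -M -t "$threads" "$file" > /dev/null; then
        echo "$file: vg could not read all records" >&2
        return 3
    fi
    echo "$file: OK"
}

# Example: for f in mpmap_*.gamp; do check_gamp "$f" 32; done
```

A failure at the gzip step would point to truncation or corruption of the file itself, while a failure only at the vg step would point to the content of the stream.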
Hello there! I tried to verify the integrity of the GAMP files, and I get this error when I try to convert them to JSON, e.g.: vg view -a mpmap_05_treatment.gamp > /dev/null
The error is the following:
/cm/local/apps/slurm/var/spool/job17529652/slurm_script: line 12: cd: /work/alfaroqc/Pangenome_project/Pantranscriptome_files/Minigraph_cactus/Testing_reads/Misha_reads: No such file or directory
terminate called after throwing an instance of 'std::runtime_error'
what(): [io::ProtobufIterator] tag "MGAM" for Protobuf that should be "GAM"
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.57.0 "Franchini"
Stack trace (most recent call last):
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug. Please include this entire error log in your bug report!
What surprises me is that I get the same error for all 6 files, yet rpvg worked for the first 2 but not for the rest. I checked the quality of the reads and they look good. One thing I just realized is that these reads were quality-filtered and adapter-trimmed before mapping with vg mpmap, and I know that vg mpmap has a function for read quality control too. Do you think doing those extra steps before mapping the reads resulted in the "instance of 'std::runtime_error'" error? I would appreciate your feedback. Best
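The message tag "MGAM" for Protobuf that should be "GAM" suggests that vg view -a was parsing the file as GAM rather than GAMP, which would explain the same error on all 6 files regardless of whether they are intact. A minimal sketch of the same conversion with the input declared as multipath alignments, assuming the -K (GAMP input) and -j (JSON output) options of vg view; view_gamp is a hypothetical wrapper name:

```shell
# view_gamp FILE   (hypothetical wrapper, not part of vg)
# Parse FILE as multipath alignments (GAMP) and discard the JSON output.
# -K declares the input as GAMP and -j requests JSON; -a would instead
# declare the input as GAM.
view_gamp() {
    vg view -K -j "$1" > /dev/null
}

# Example use inside a job script. Stopping when cd fails avoids running
# the remaining commands from an unexpected directory:
# cd /path/to/gamp/dir || exit 1
# view_gamp mpmap_05_treatment.gamp && echo "readable as GAMP"
```

The log above also shows that the cd on line 12 of the job script failed, so the script continued in whatever directory it started from.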
Hello rpvg team: I successfully generated the .gamp file for the transcriptome file with vg mpmap, without any problems. But when I run rpvg, I get this error:
Running rpvg (commit: cd5160deb1a75d745c7ba98dea634c49ccd296b5)
Random number generator seed: 1730236892
Fragment length distribution parameters found in alignment (mean: 151.096, standard deviation: 43.1828)
Loaded graph, GBWT and r-index (6.607 seconds, 10.2174 GB)
[E::bgzf_uncompress] Inflate operation failed: invalid distance too far back
terminate called after throwing an instance of 'std::runtime_error'
what(): [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 47907863952 group 41477367607
/cm/local/apps/slurm/var/spool/job17498325/slurm_script: line 25: 39401 Aborted
What is weird is that I previously ran rpvg with two different GAMP files and both ran fine, but this one fails.
The command I am using is this one:
singularity exec -B /work /work/public/singularity/rpvg_latest.sif rpvg -t 32 -g $xg_path -p $gwbt_path -f $txt_gz_path -a mpmap_03_control.gamp -o rpvg --inference-model transcripts
where xg_path, gwbt_path, and txt_gz_path point to the input files. I used the same command to run rpvg before with different GAMP files, and it worked. I would appreciate any help. Best