jonassibbesen / vgrna-project-paper

Bash scripts and data used in pantranscriptomic paper
MIT License

mpmap error when mapping HLA haplotypes of the AMR population from the 1000 Genomes Project (1kGP) #4

Open vmgiang opened 1 year ago

vmgiang commented 1 year ago

Hi Jonassibbesen!

When I ran adding_haplotype with the AMR population of the 1000 Genomes Project, I got this error:

    [vg mpmap] elapsed time 0 s: Executing command: vg mpmap -t 16 -l long -F GAM -x ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100.xg -g ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.gcsa -d ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.dist -f ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA_imgt_hla_main/1kg_AMR_af01_HLA_gencode100_imgt_hla_main_6_haps.fa
    [vg mpmap] elapsed time 0 s: Loading graph from ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100.xg
    [vg mpmap] elapsed time 0 s: Completed loading graph
    [vg mpmap] elapsed time 0 s: Graph is in XG format. XG is a good graph format for most mapping use cases. PackedGraph may be selected if memory usage is too high. See vg convert if you want to change graph formats.
    [vg mpmap] elapsed time 0 s: Identifying reference paths
    [vg mpmap] elapsed time 0 s: Loading GCSA2 from ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.gcsa
    [vg mpmap] elapsed time 0 s: Loading distance index from ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.dist (in background)
    [vg mpmap] elapsed time 0 s: Completed loading GCSA2
    [vg mpmap] elapsed time 0 s: Loading LCP from ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.gcsa.lcp
    [vg mpmap] elapsed time 0 s: Memoizing GCSA2 queries (in background)
    [vg mpmap] elapsed time 1 s: Completed loading LCP
    [vg mpmap] elapsed time 4 s: Completed loading distance index
    [vg mpmap] elapsed time 8 s: Completed memoizing GCSA2 queries
    [vg mpmap] elapsed time 8 s: Building null model to calibrate mismapping detection
    [vg mpmap] elapsed time 12 s: Mapping reads from ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA_imgt_hla_main/1kg_AMR_af01_HLA_gencode100_imgt_hla_main_6_haps.fa using 16 threads
    stack smashing detected : terminated
    ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
    Stack trace path: /tmp/vg_crash_dqVoVS/stacktrace.txt
    Please include the stack trace file in your bug report!
    Command exited with non-zero status 134
    Command being timed: "bash -c vg mpmap -t 16 -l long -F GAM -x ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100.xg -g ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.gcsa -d ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA/1kg_AMR_af01_HLA_gencode100_index.dist -f ~/rna_graph/onlyHLA_regions/HLA_region/code/hla_test/hla_test.graph2/AMR_af01_6_HLA_imgt_hla_main/1kg_AMR_af01_HLA_gencode100_imgt_hla_main_6_haps.fa > tmp_1kg_AMR_af01_HLA_gencode100_imgt_hla_main/haps.gam"

Can you help me solve this problem?
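The "Signal 6 occurred" line from vg and the "non-zero status 134" line from GNU time describe the same event: glibc's stack-protector check prints the "stack smashing detected" message and then aborts the process, which raises SIGABRT (signal 6). A minimal bash sketch of how the status decodes, assuming bash's convention that a child killed by signal N is reported as status 128 + N; decode_status is a hypothetical helper, not part of vg or this repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a shell exit status into the signal that
# terminated the process, using bash's "128 + signal number" convention.
decode_status() {
  local status=$1
  if (( status > 128 )); then
    local sig=$(( status - 128 ))
    # The bash builtin `kill -l N` prints the signal name without the SIG prefix.
    echo "signal ${sig} (SIG$(kill -l "$sig"))"
  else
    echo "exit code ${status}"
  fi
}

# The status reported by GNU time for the crashed vg mpmap run.
decode_status 134
```

Run as-is, this prints "signal 6 (SIGABRT)", consistent with the vg crash message, so the 134 is the abort itself rather than a separate error from the wrapper script.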

jonassibbesen commented 1 year ago

@jeizenga, do you know why this is happening?

jeizenga commented 1 year ago

@Mgiang1305 This is most likely a vg bug. Can you be more specific about which commands you ran and where you got this data? Ideally, you could send me the input data so I can reproduce this behavior locally.
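One way to shrink a reproduction like this is to map each sequence in the 6_haps.fa input on its own, to check whether a single haplotype triggers the crash. A minimal bash sketch, assuming each FASTA header begins with a sequence ID that is unique after sanitizing; split_fasta is a hypothetical helper, not part of vg or this repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write each record of a multi-FASTA file to its own file
# in out_dir, named after the sequence ID (first word of the header line).
split_fasta() {
  local in_fa=$1 out_dir=$2
  mkdir -p "$out_dir"
  awk -v dir="$out_dir" '
    /^>/ {
      if (out) close(out)                 # finish the previous record file
      id = substr($1, 2)                  # drop the leading ">"
      gsub(/[^A-Za-z0-9._-]/, "_", id)    # e.g. "*" and ":" in HLA names become "_"
      out = dir "/" id ".fa"
    }
    out { print > out }                   # lines before the first header are skipped
  ' "$in_fa"
}
```

For example, split_fasta 1kg_AMR_af01_HLA_gencode100_imgt_hla_main_6_haps.fa per_hap writes one file per record into per_hap/, and the vg mpmap command from the log could then be rerun with -f pointing at each file in turn. Records whose sanitized IDs collide overwrite each other, so it is worth checking that the file count matches the number of headers.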

vmgiang commented 1 year ago

Hi @jeizenga, I extracted 347 AMR samples from the 1000 Genomes Project to build a pantranscriptome. The data and scripts have been uploaded to this Google Drive folder: https://drive.google.com/drive/folders/1koNnWGCq_bmk30q-r2CFUkhx47Uxy80T?usp=sharing

jeizenga commented 1 year ago

Great, thanks. Another thing I forgot: could you copy and paste the output of vg version from your executable?

vmgiang commented 1 year ago

I built the graph with vg v1.41.0 "Salmour":

    vg version v1.41.0 "Salmour"
    Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
    Linked against libstd++ 20210601
    Built by stephen@lubuntu

@jeizenga Thank you !!!

jeizenga commented 1 year ago

Hi, sorry for the delay in getting around to this. I'm looking at it now. I think I was not specific enough when I asked for your data. What I need are the inputs to the add_haplotype.sh run on which you encountered the bug. It looks to me like you provided all the raw data instead.