Open Tonitsk8264 opened 1 year ago
I split vcf, fasta and gtf according to chr, and followed the steps of https://github.com/jonassibbesen/vgrna-project-paper/tree/main/scripts/bash/contruct_pantranscriptome https://github.com/vgteam/vg/wiki/Index-Construction#complex-graph-with-many-paths to build the index step by step, specific command line The parameters are as follows.
CPU=4 OUT_PREFIX=pantranscriptome # 1.construct_graphs vg construct -p -t ${CPU} -a -v ${CHR}/${CHR}.vcf.gz -r ${CHR}/${CHR}.fasta vg convert -p ${CHR}/${CHR}.vg > ${CHR}/${CHR}.pg vg rna -p -t ${CPU} -e -n ${CHR}/${CHR}.gtf ${CHR}/${CHR}.pg # 2.join_graphs vg ids -j $(for i in $(seq 1 9; echo UN); do echo chr$i/chr$i.spliced.pg; done) -m empty_node_mapping.map # 3.project_transcripts vg index -p -t ${CPU} -G ${CHR}/haplotypes.gbwt -v ${CHR}/${CHR}.vcf.gz ${CHR}/${CHR}.spliced.pg vg rna -p -t ${CPU} -o -r -n ${CHR}/${CHR}.gtf -l ${CHR}/haplotypes.gbwt -b ${CHR}/${OUT_PREFIX}.gbwt -f ${CHR}/${OUT_PREFIX}.fa -i ${CHR}/${OUT_PREFIX}.txt ${CHR}/${CHR}.spliced.pg > ${CHR}/${OUT_PREFIX}_tmp.pg && mv ${CHR}/${OUT_PREFIX}_tmp.pg ${CHR}/${OUT_PREFIX}.pg # 4.generate_xg vg index -p -t 4 -b ./tmp/ -x ${OUT_PREFIX}.xg $(for i in $(seq 1 9; echo UN); do echo chr${i}/"${OUT_PREFIX}".pg; done) # 5.merge vg gbwt -p -m -o ${OUT_PREFIX}.gbwt $(for i in $(seq 1 9; echo UN); do echo chr${i}/${OUT_PREFIX}.gbwt; done) cat $(for i in $(seq 1 9; echo UN); do echo chr${i}/${OUT_PREFIX}.txt; done) | grep -v ^Name > ${OUT_PREFIX}.txt cat $(for i in $(seq 1 9; echo UN); do echo chr${i}/${OUT_PREFIX}.fa; done) > ${OUT_PREFIX}.fa # 6.generate_gcsa mv empty_node_mapping.map mapping for i in $(seq 1 9; echo UN); do vg prune -p -t ${CPU} -u -a -m mapping chr${i}/${OUT_PREFIX}.pg > chr${i}/${OUT_PREFIX}_pruned.vg ; done vg index -p -t ${CPU} -b ./tmp/ -g ${OUT_PREFIX}.gcsa -f mapping $(for i in $(seq 1 9; echo UN); do echo chr${i}/${OUT_PREFIX}_pruned.vg; done) # Generate distance index vg index -p -t ${CPU} -j ${OUT_PREFIX}.dist ${OUT_PREFIX}.xg vg mpmap -t 16 -x pantranscriptome.xg -g pantranscriptome.gcsa -d pantranscriptome.dist -f 2L-1/1.R1.fastq.gz -f 2L-1/1.R2.fastq.gz > 2L-1.gamp rpvg --graph pantranscriptome.xg --paths pantranscriptome.gbwt --alignments 2L-1.gamp --output-prefix rpvg --inference-model haplotype-transcripts --path-info pantranscriptome.txt --threads 4
However, after using the vg mpmap command, an error still occurs when using rpvg
rpvg: /psd/container/micromamba-0.25.1/envs/VG_v1.49.0/share/rpvg/src/main.cpp:317: spp::sparse_hash_map<std::__cxx11::basic_string<char>, PathInfo> parseHaplotypeTranscriptInfo(const string&, bool, bool): Assertion `haplotype_transcript_info_it.first->second.source_ids.emplace(haplotype_id_index_it.first->second).second' failed.
Looking forward to your response.
Which of these commands is the one that led to the crash? The warning suggests you may have started the pipeline with one version of VG and then switched at some point. Is that the case?
I split vcf, fasta and gtf according to chr, and followed the steps of https://github.com/jonassibbesen/vgrna-project-paper/tree/main/scripts/bash/contruct_pantranscriptome https://github.com/vgteam/vg/wiki/Index-Construction#complex-graph-with-many-paths to build the index step by step, specific command line The parameters are as follows.
However, after using the vg mpmap command, an error still occurs when using rpvg
Looking forward to your response.