jonassibbesen / hprc-rnaseq-analyses-scripts

Scripts used for the RNA-seq analyses in the HPRC draft pangenome paper.
MIT License

The problem of constructing pan transcriptome using GRCh38 Alts Graph #4

Open xuxingyubio opened 1 year ago

xuxingyubio commented 1 year ago

I'm very sorry to bother you.

Recently, I chose the alt contigs on one chromosome to construct a pangenome.

When I used a GBZ file constructed by Minigraph-Cactus, gencode.v38.primary.gff, and alt.annotation.gff3 as input, the following error occurred:

[vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
ERROR: Chromomsome path "chr1" not found in graph or haplotypes index (line 8).

How can I deal with this problem?

jonassibbesen commented 1 year ago

Hi, thanks for writing. Sorry for the delayed response. Could you share the command-line you used? The error happens because the chromosome "chr1" (found in one of the annotations) is not present as a path in the graph or it has another name in the graph.
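
One way to check for this kind of mismatch is to compare the sequence names in the annotation with the path names in the graph. The sketch below is only an illustration: it assumes the path names have already been collected into a list (for example from a path listing of the graph), and the function names and example data are hypothetical.

```python
def annotation_seqids(gff_lines):
    """Collect the sequence names (column 1) used in GFF/GTF lines."""
    seqids = set()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        seqids.add(line.split("\t")[0])
    return seqids


def missing_from_graph(gff_lines, path_names):
    """Return annotation sequence names with no identically named path."""
    return sorted(annotation_seqids(gff_lines) - set(path_names))


# Hypothetical example data, not taken from the files in this thread.
gff = [
    "##gff-version 3",
    "chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tParent=tx1",
    "chr2\tHAVANA\texon\t300\t400\t.\t+\t.\tParent=tx2",
]
paths = ["chr2", "GRCh38#0#chr1"]

print(missing_from_graph(gff, paths))  # ['chr1']
```

Any name reported here appears in the annotation but has no path with exactly that name, which is the situation described above.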

xuxingyubio commented 1 year ago

I used 'vg paths -Lx ' to see all the walks in the graph. The output of vg paths is strange; the names look like this: SAMPLE#HAPLOTYPE#CONTIG#0. I constructed the graph directly with 'cactus-pangenome', following https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md#grch38-alts-graph.

Recently, I encountered a new problem: how do you handle transcripts that span multiple chromosomes when processing GFF files?
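
To see which contig each of these names refers to, the '#'-separated fields can be split apart. This is a minimal sketch that assumes the third field is the contig name, as the SAMPLE#HAPLOTYPE#CONTIG#0 pattern suggests; the helper name and the example names are hypothetical, and the meaning of the trailing field is not assumed.

```python
def contig_of(path_name):
    """Return the contig field of a SAMPLE#HAPLOTYPE#CONTIG[#...] name.

    Names with fewer than three '#'-separated fields are returned unchanged.
    """
    fields = path_name.split("#")
    return fields[2] if len(fields) >= 3 else path_name


# Hypothetical path names in the reported format.
names = ["GRCh38#0#chr1#0", "sampleA#1#contig42#0", "chr1"]
print([contig_of(n) for n in names])  # ['chr1', 'contig42', 'chr1']
```

Comparing these contig fields with the names in the annotation shows whether a mismatch is only a matter of prefixes.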

xuxingyubio commented 1 year ago

The error is like this:

[vg rna] Parsing graph file ...
[vg rna] Converting graph format ...
[vg rna] Graph and GBWT index parsed in 460.391 seconds, 8.89319 GB
[vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
vg: src/transcriptome.cpp:718: int32_t vg::Transcriptome::parse_transcripts(std::vector, uint32_t, std::istream*, const bdsg::PositionOverlay&, const gbwt::GBWT&, bool) const: Assertion `transcript->chrom == chrom' failed.
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.48.0 "Gallipoli"
Stack trace (most recent call last):

10 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x5f0e3d, in _start

9 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x1ed71af, in __libc_start_main

8 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x5c0ade, in main

7 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0xd3143b, in vg::subcommand::Subcommand::operator()(int, char**) const

6 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0xc2d82a, in main_rna(int, char**)

5 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0xe01ae3, in vg::Transcriptome::add_reference_transcripts(std::vector<std::istream, std::allocator<std::istream> >, std::unique_ptr<gbwt::GBWT, std::default_delete >&, bool, bool)

4 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0xdf5f44, in vg::Transcriptome::parse_transcripts(std::vector<vg::Transcript, std::allocator >, unsigned int, std::istream*, bdsg::PositionOverlay const&, gbwt::GBWT const&, bool) const [clone .constprop.0]

3 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x1ee7555, in __assert_fail

2 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x5bfed7, in __assert_fail_base.cold

1 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x5c0007, in abort

0 Object "/home/users/xyxu/pantools/cactus-bin-v2.5.2/bin/vg", at 0x149611b, in raise

ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug. Please include this entire error log in your bug report!

xuxingyubio commented 1 year ago

When a pangenome is used for transcriptome analysis, which graph did you choose: the full graph, the clip graph, or the filter graph?

jonassibbesen commented 1 year ago

> Recently, I encountered a new problem, how do you handle transcripts that span multiple chromosomes when processing gff files?

vg rna does not work for transcripts that span multiple chromosomes. This is also the reason why you see this error:

vg: src/transcriptome.cpp:718: int32_t vg::Transcriptome::parse_transcripts(std::vectorvg::Transcript, uint32_t, std::istream*, const bdsg::PositionOverlay&, const gbwt::GBWT&, bool) const: Assertion `transcript->chrom == chrom' failed.
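
Such transcripts can be located in the annotation before running vg rna. The sketch below groups exon records by transcript ID and reports every transcript whose exons lie on more than one sequence. It assumes the ID is stored in a transcript_id attribute (GTF or GFF3 style); files that link exons only through Parent= would need a different pattern. The function name and example records are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF style (transcript_id "X";) and GFF3 style (transcript_id=X).
TRANSCRIPT_ID = re.compile(r'transcript_id[ =]"?([^";]+)"?')


def multi_chromosome_transcripts(lines):
    """Map each transcript ID to its sequence names if it has more than one."""
    chroms = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only exon records with a full set of nine columns are considered.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            chroms[match.group(1)].add(fields[0])
    return {tx: sorted(c) for tx, c in chroms.items() if len(c) > 1}


# Hypothetical records: tx1 has exons on two different sequences.
records = [
    "chr1\tsrc\texon\t1\t10\t.\t+\t.\ttranscript_id=tx1",
    "chr1_alt_example\tsrc\texon\t5\t20\t.\t+\t.\ttranscript_id=tx1",
    'chr2\tsrc\texon\t1\t10\t.\t+\t.\ttranscript_id "tx2";',
]
print(multi_chromosome_transcripts(records))
# {'tx1': ['chr1', 'chr1_alt_example']}
```

Transcripts reported this way could then be removed from the annotation or split per sequence; which option is appropriate depends on the analysis.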

jonassibbesen commented 1 year ago

> When pangenome is used for transcriptome analysis, I want to know which of the graph you choose, full graph, clip graph, or filter graph?

For the HPRC RNA-seq analysis we used a filtered graph.