jonassibbesen / rpvg

Method for inferring path posterior probabilities and abundances from pangenome graph read alignments
MIT License

Unable to obtain GBWT #46

Closed ld9866 closed 1 year ago

ld9866 commented 1 year ago

We built a pangenome with minigraph-cactus, converted it to vg format and built the index files, and everything was fine. However, I do not have the pantranscriptome.gbwt, pantranscriptome.gbwt.ri and pantranscriptome.txt.gz files that are in your example data. I noticed that your introduction says: "The pantranscriptome paths should be compressed and indexed using the GBWT." I am a novice and not familiar with vg. I tried to do this as described, but there is clearly still a problem. I did complete "Construct and index spliced pangenome graph" and "Map simulated RNA-seq reads using vg mpmap" successfully. Best regards,

Code:
vg gbwt -x vg_rna.spliced.xg -o vg_rna.spliced.gbwt --vcf-input primates-pg.vcf.gz --num-threads 10

Result:
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 551c0d409cc2f3ef48b92e117f9f85b4a3dcbe4a at 1:475 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 11cdf7347abe3bc6f33e28762d9974805ea74508 at 1:477 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 4e18f5dfcc04b0f4cff5527391c48286331c6b03 at 1:480 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 0c9923fbf782e648cd1511bad9a8f116ee59727a at 1:483 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 6a19a010c3c8901cf9c52dfa93725cc33f736ed9 at 1:493 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 0de12e70ee148d8f93190a4f2995ca77ee8d359a at 1:509 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 851c9a881a775c4361e3a8d1d46a1702b2c86cd1 at 1:513 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5de70cd73e7add41c65c83a32b7891e46920033e at 1:515 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 890c409bd790cb66ac2f85528a87ec6ff9b1e536 at 1:516 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 47a748b295f389bfe7052a394ceab94e85d8ce5c at 1:527 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3198510/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1cc6de4c3346fd58150ca3494b083634e9a2c57a at 9:27 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 6ce79784e3c8e81cff8ecf8a4cbea36dcb094452 at 9:33 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for e731c4a2cf7266bcc0a733e15495374a45e9f9b9 at 9:35 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for d69e78e7a7d7bb4c59ad3765ed503eeb515975ae at 9:39 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ad6a0251fe8b99d93ffd593941d343bbc295e38b at 9:42 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ff6e9609242d00346f9cfdd4a74cf48050376f40 at 9:45 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for fee027d669712b9a83e2c4cb7f23e140f6c26b68 at 9:48 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 183024f4a602aac9f9b0e3cd2170fa6f8b0d1888 at 9:51 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1e7c415d80577aa2e5cf56a903af99c33079bc6c at 9:53 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for eea97e3450e3e2ace41b1835006ce0805d14fa59 at 9:56 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3736445/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 2d5ebf3c355a29bfc1396fe04b0b2f4375b2890a at 6:10 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 260a460e2e81999d99d3755c89883faa55b4e224 at 6:147 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 99b990181e5492025f5622a2d756d02202836600 at 6:151 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 8c30ffac0a85efccd9a8fd401fab64def0eb1272 at 6:154 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for aa58be24542f897886dd0563ffdf641cece3da6e at 6:172 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 03c124788cf35f2eb47c05bfb1c51e3400cb5d21 at 6:187 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for e23cccd66cd3be84e222619bc1c1d790025534ec at 6:193 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1aa92651d0fa7269556ed0b48902e7f695d7688d at 6:211 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 228c0af5369e7a1f35dc46dc93470596a1200346 at 6:250 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 51470b7acb41c982f23aab20d84f998340a4f019 at 6:277 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3768320/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5cc06dcbe07651acac02f1ffa866ae655a8c63ce at 8:124 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 26cbe1718a45847e8bafaeca91e1f08434ed5fec at 8:149 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1746804253d49900d651432044ff960b7b75d0d7 at 8:151 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for f2e84cd68b31b3c456a80115c94d26f3252cbd3a at 8:697 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ff9ba283c94b7f00d62422dd9aac691f97a44589 at 8:702 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for b75868657d54a26e4d5434cf0f4d45426cfb6515 at 8:705 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 6f62e6c6342aee0d0204637fb29c1b4e21353728 at 8:707 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bf0d5ddfa07dc44c3a32f1a34bc22da8edd84817 at 8:714 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for cb6a4b8da090be595543e7d095bb90afffc5d36b at 8:718 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for d01eca693390f367a953c7689ad0c09ffcb30be8 at 8:722 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 2290678/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 588f8dddc1ebbd1458f34fe0b606c257304f0a2c at 14:1 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 476d7cb45ec56ebca462c452ef8d1a781ae1d28c at 14:5 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bff7ac1a5d594a67d5985f1114108358b931570a at 14:19 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for e0ed32a990b64edabee53c259838cdb9343dba67 at 14:22 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ead57a19db4fed5bcd4052bdf49df9566453c0e2 at 14:23 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 9498ff051a7651cf922eb19f04d5acaf77c21d68 at 14:421 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 8c50f953a821f30f9d110e3c370f52493003d6a0 at 14:428 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for b4deea24d7ad1f096313212f5db2c62a064176ce at 14:437 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for b42013329ff7c7279f1d504aaaa0b57f83c90e25 at 14:4794 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1bb541a8f92e61280394d7a0e32efe12cc314e06 at 14:4810 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3358433/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ee3d56668c926850a7fee59ef244a7b08ae9f849 at 15:17 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for f843aaa7442a3560845c2ba2cca61e5d65518403 at 15:31 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 49c9946258d4fe49225ef6dcba7bec723d9ed10f at 15:34 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 53fe6f158161739032d924c47d7983ae7e71b3c1 at 15:38 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ef32260df1c02a7168aaa2d8eec8914f50c18852 at 15:41 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5dc8ed4644ec17321e11120048d6fc617bb42b8e at 15:43 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 808d9a7e08829019de7c96ab49eac9121b5916cc at 15:46 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5fb333f44f0034e4a88066d63f1ca8a16f7c6f26 at 15:51 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 7b3f9bfec08e14be3ef809fabe399630f209c47f at 15:53 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 7f5d0b91d40a26639c60770cc95fb1f55efa01c3 at 15:63 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3210329/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 9a79f3f9ea6e382cb4bf1a8cedfdf336f280a888 at 3:26 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5e4c3eb6104a24b6589043cb53a7a9244168922b at 3:66 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 47a0950061bf75092e46316540bc37004ff46311 at 3:68 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 4285799aa36b23f390c7398147b8ed49c0cc9bc9 at 3:70 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bf6b284589f60f653f4e5aea6542bd3b520e1623 at 3:82 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for a18eb3f563a5feca70497c02a9852d2d1e0e3fc0 at 3:117 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for a8447d3a85f1de57baf92d38d065269d0cf12544 at 3:146 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for e255d2394748ed777f85789b2a7eb53183dd0e40 at 3:171 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for dfcc5a32810104cc53e7e1cac548372bcdccaf03 at 3:184 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for dc1ca07ff857e8906f6e1a0279bd43557852699e at 3:204 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3122888/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for cc4e7aaba463d824a0a0f17a3fe27b0c052208b9 at 4:16 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 729e8d90e50c3972d26762bc06553132e42f6bc0 at 4:60 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 08be2aae62b737742045206a9fd30bf589a319f7 at 4:61 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for ee16db5d5856aca7280ec2fb6ee5c085396f09ec at 4:111 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 4c75338fc89c199359c55a1ebe0976bc9387cabb at 4:121 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for fa117e65f307325433927bba2ff4d0214970e40e at 4:123 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 51b808a1c9ff7e730c8e0a7daabb1c68092c8b65 at 4:128 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 6f33aaf378382c39236b29ed1f2aade432cb8020 at 4:133 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for b188716644365ff1eba648b760acb8a460a4fba8 at 4:138 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for acb33d75d4e4db6656310a4a8eb93cda284f0355 at 4:191 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 3015392/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 9e7d19f0b2ab6ca1da80ff0b5c2c0bfc39f2ad9b at 13:13 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 9d6a8ecd8c0408e9a7b60c1b72705d9db43ec783 at 13:46 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 94e9be8eb64025f3196cb4689a637ec772e21184 at 13:57 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for efd4c75862fdbd672be9154801787ef48ed236ae at 13:62 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 6db3dc20a94e2beb379b94132e2dfc13fe3f5b1c at 13:74 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 37866e67a16b718130730b80e32aae7a47b38219 at 13:80 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for f9ae502bbd82741003fde35cd3f1414e5bc6fa59 at 13:96 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bdbe762f47e220d4c39db48974fcb7293a3998f0 at 13:103 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for b7528c1ca42c559f6a05b1edea53597dd769096b at 13:117 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for cb2b5ca77dcdd0ec44da1c42a9660705cf58d158 at 13:133 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 2551281/0 variants in phasing VCF but not in graph! Do your graph and VCF match?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 132760e5fd75745fb988c6df867613cec821567f at 7:5 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for a0ddc661961331b24d0e5db7610e3bac68ba1606 at 7:7 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 6ae3d99e55cb1626a49e5693ac87847971ce41e3 at 7:32 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for c300ba005cb428d085ed01dc339c6ce2afbb53ec at 7:99 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for c1298bc42f7f3160412cb249229d33d5b12c9d5b at 7:725 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5933fa9657ece989b16071ac1222e425a1cfa0de at 7:761 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1c50b8e3e107ba652560f9c6b28a9a31d20661ff at 7:852 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 1e62fa762beedd86c43a323b0966df8c123d33e3 at 7:965 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5f2142f7b9aa069fb29e7e45fd99dee64e7a3120 at 7:1282 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for eb631a0aec70e056b21207806ea9b3cdd338aa3a at 7:1647 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 2824500/0 variants in phasing VCF but not in graph! Do your graph and VCF match?

ld9866 commented 1 year ago

I ran the following command, which works fine, and vg mpmap also works well with the resulting indexes:

vg autoindex --workflow mpmap -t 20 --prefix vg_rna_gfa2 --ref-fasta ref.fa --vcf primates-pg.vcf.gz --tx-gff ref.gtf

However, this does not give me the pantranscriptome.gbwt and pantranscriptome.txt.gz files that are in your example data. I also ran:

vg autoindex --workflow mpmap --workflow rpvg -t 20 --prefix vg_rna_gfa2 --ref-fasta ref.fa --vcf primates-pg.vcf.gz --tx-gff ref.gtf

but it failed with:

[IndexRegistry]: Checking for haplotype lines in GFA.
error:[vg autoindex] Input is not sufficient to create indexes
Inputs GTF/GFF Reference FASTA Reference GFA w/ Haplotypes are insufficient to create target index Haplotype-Transcript GBWT

jonassibbesen commented 1 year ago

Hi, since your graph is from the minigraph-cactus pipeline you do not need to construct the haplotypes separately using vg gbwt. This is only needed if you had constructed your graph from a VCF file. I assume you have a GFA file of the graph from the minigraph-cactus pipeline? The GFA file contains both the graph and the assembled haplotypes that were used to construct the graph. The GFA file can be converted to a GBZ file using vg gbwt. This file contains both the graph and the haplotypes as a GBWT index, and can be used as input to vg rna to construct a spliced pangenome and pantranscriptome:

vg gbwt -p --gbz-format -g graph.gbz -G graph.gfa
vg rna -p --gbz-format -n ref.gtf -b pantranscriptome.gbwt -i pantranscriptome.txt graph.gbz > spliced_graph.pg

By default vg rna assumes the contigs/chromosomes (column 1) in your transcript annotation file are reference paths in the graph. If they are instead haplotypes you should add the option -j to the vg rna command. I have recently updated the Transcriptomic analyses wiki on the vg GitHub with more information on how to use vg rna (https://github.com/vgteam/vg/wiki/Transcriptomic-analyses). The GBZ functionality in vg rna was implemented recently so you might need to pull and recompile in order to use it.
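
As a rough sketch, the two commands above could be wrapped in a small shell function that checks its inputs before starting the long-running steps. The function name `build_pantranscriptome` and the output prefix scheme are hypothetical and not part of vg or rpvg; the vg options are exactly those quoted in the comment above, and a vg build with GBZ support in vg rna is assumed.

```shell
# Sketch only: wraps the two commands quoted above.
# Usage: build_pantranscriptome GRAPH_GFA ANNOTATION_GTF OUT_PREFIX
# The function name and the OUT_PREFIX naming scheme are illustrative assumptions.
build_pantranscriptome() {
    local gfa="$1" gtf="$2" prefix="$3"
    local f

    # Fail early if an input is missing or empty.
    for f in "$gfa" "$gtf"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done

    # Step 1: convert the GFA (graph plus haplotypes) to a GBZ file.
    vg gbwt -p --gbz-format -g "${prefix}.gbz" -G "$gfa" || return 1

    # Step 2: build the spliced graph, the pantranscriptome GBWT and the
    # transcript info table from the GBZ file and the annotation.
    vg rna -p --gbz-format -n "$gtf" \
        -b "${prefix}.pantranscriptome.gbwt" \
        -i "${prefix}.pantranscriptome.txt" \
        "${prefix}.gbz" > "${prefix}.spliced.pg"
}
```

It would be called as, for example, `build_pantranscriptome graph.gfa ref.gtf out`. If the annotation contigs are haplotypes rather than reference paths, the -j option mentioned above would have to be added to the vg rna call by hand.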

ld9866 commented 1 year ago

Thank you for your reply. I tried pulling and recompiling in order to use the newest vg rna. However, after running "git clone https://github.com/vgteam/vg.git", I found that the directories under "deps" are empty, and many entries appear to be soft links. How can I solve this?

jonassibbesen commented 1 year ago

You need to clone with the --recursive option: git clone --recursive https://github.com/vgteam/vg.git

See the guide on how to build vg on Linux or Mac here: https://github.com/vgteam/vg#building-on-linux
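
On an unreliable connection, one possible workaround is to retry the git commands a few times. The `retry` helper below is a hypothetical convenience function, not something shipped with git or vg; the git commands shown in the comments are standard git.

```shell
# Sketch only: retry a command up to MAX_ATTEMPTS times.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    # Re-run the command until it succeeds or the attempt budget is used up.
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example usage (not run here):
#   retry 5 30 git clone --recursive https://github.com/vgteam/vg.git
# or, inside an existing clone whose deps/ directories are empty:
#   retry 5 30 git submodule update --init --recursive
```

The second form fetches the submodules into a clone that was made without --recursive, so the main repository does not have to be downloaded again.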

ld9866 commented 1 year ago

Thank you for your timely reply. However, because our network connection to git is unstable, it has been very difficult for us to pull all of the content to our local machine. Could you please send the source code to my email? I would be grateful. Email: tanxingdemogu9866@gmail.com Best wishes

ld9866 commented 1 year ago

Thank you for your reply. However, when we ran the command to generate the GBWT, it failed with an error. Do you know the cause of this? I have uploaded the first 10 lines of the GTF file as "example.gtf". Best regards,

Code:
vg rna -p --gbz-format -n ref.gtf -b pantranscriptome.gbwt -i pantranscriptome.txt graph.gbz > spliced_graph.pg

Result:
[vg rna] Parsing graph file ...
[vg rna] Converting graph format ...
[vg rna] Graph and GBWT index parsed in 423.976 seconds, 15.2574 GB
[vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
ERROR: Chromomsome path "1" not found in graph or haplotypes index (line 6).

example.zip

ld9866 commented 1 year ago

I also tested with the GTF file after sorting it, and got the following error:

ERROR: Chromomsome path "1" not found in graph or haplotypes index (line 4).

example.sort.zip

jonassibbesen commented 1 year ago

Hi, this error happens because the chromosome "1" is not present as a path in the graph, or it has another name in the graph. Could you try running: vg paths -L -x graph.gbz and share the output? This command will show all the haplotype paths in the GBZ file. Thanks!
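
One way to narrow this down is to compare the contig names in column 1 of the annotation with the path names listed by the command above. The following is a minimal sketch using awk. The function name `missing_contigs` is hypothetical, and it assumes the path list has been saved to a plain text file with one path name per line.

```shell
# Sketch only: list GTF contig names that have no identically named path.
# Usage: missing_contigs ANNOTATION_GTF PATH_LIST
# PATH_LIST holds one path name per line, e.g. saved with:
#   vg paths -L -x graph.gbz > gbz.list
missing_contigs() {
    local gtf="$1" paths="$2"
    awk -F '\t' '
        FILENAME == ARGV[1] { known[$0] = 1; next }      # first file: path names
        /^#/ { next }                                    # skip GTF comment lines
        !($1 in known) && !reported[$1]++ { print $1 }   # each unknown contig once
    ' "$paths" "$gtf"
}
```

Running `missing_contigs ref.gtf gbz.list` prints every annotation contig without an exact match. Any name it prints (such as "1" in the error above) would need to be renamed in the GTF, or matched to the corresponding path name in the graph, before vg rna can use it.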

ld9866 commented 1 year ago

Dear developer: Thank you for your help! I used the following command to produce the output: "vg paths -L -x graph.gbz > gbz.list". I am not very familiar with this output, and for privacy reasons I have chosen to send it to you by email instead of posting it here. Please check. My email address is: tanxingdemogu9866@gmail.com Best regards,

ld9866 commented 1 year ago

@jonassibbesen Hello developers! When would you be free to help us deal with the problem?

jonassibbesen commented 1 year ago

I got the email, thanks! I do not have time to look at it right now, but should have time later in the week.

ld9866 commented 1 year ago

OK! Looking forward to your reply! Best regards,

ld9866 commented 1 year ago

Hello developers! I recently tested it again and found that it works perfectly with the pggb results. Best regards,

[vg rna] Parsing graph file ...
[vg rna] Converting graph format ...
[vg rna] Graph and GBWT index parsed in 0.319122 seconds, 0.0339088 GB
[vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
Parsed 27 transcripts
Constructed 27 reference transcript paths
Updated graph with reference transcript paths
[vg rna] Transcripts parsed and graph updated in 0.656304 seconds, 0.525043 GB
[vg rna] Projecting transcripts to haplotypes ...
Parsed 27 transcripts
Projected 150 haplotype-specific transcript paths
[vg rna] Haplotype-specific transcripts constructed in 0.557514 seconds, 0.525043 GB
[vg rna] Topological sorting graph and compacting node ids ...
Sorted 41251 nodes
[vg rna] Graph sorted and compacted in 0.0520552 seconds, 0.525043 GB
[vg rna] Writing pantranscriptome transcripts to file(s) ...
[vg rna] Writing splicing graph to stdout ...
[vg rna] Graph and pantranscriptome written in 0.197702 seconds, 0.525043 GB