Open pclavell opened 2 months ago
attn: @jeizenga
Are you able to share the GTF that you were using? Even the first few hundred lines would probably be sufficient.
You can download it from this link (obtained from the GENCODE webpage): https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.chr_patch_hapl_scaff.annotation.gtf.gz
Hello,
I was wondering if you found a solution to this issue. I'm getting the same error, and I have tried multiple annotations: the GENCODE one mentioned here, as well as annotations from NCBI and UCSC. https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/ https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/
In all cases the code crashes with the same error mentioned above.
Hello, no, I couldn't solve it. I am waiting for the developers' answer.
Hi, apologies for the delay--my union has been on strike and I'm only just returning to work. TL;DR: you can prepend GRCh38#0# to the contig names in the GTF using sed, and it should then run through.
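For anyone landing here, the sed step described above could look roughly like the following. This is a sketch rather than a recipe tested on this dataset: the function name prefix_contigs and the file names in the usage comment are placeholders, and it assumes that GTF header/comment lines start with # and that the contig name is the first column of every other line.

```shell
# Prepend the PanSN prefix "GRCh38#0#" to the contig column of a GTF.
# prefix_contigs is a hypothetical helper name; it reads GTF on stdin
# and writes the renamed GTF to stdout.
prefix_contigs() {
  # Leave header/comment lines (starting with '#') untouched; on every
  # other line insert the prefix at the start of the line, which is
  # where the contig name sits.
  sed '/^#/!s/^/GRCh38#0#/'
}

# Example usage (file names are placeholders):
#   gzip -dc annotation.gtf.gz | prefix_contigs > annotation.pansn.gtf
```

This should be run on the original annotation. Applied to a GTF whose contigs were already renamed to GRCh38#chr1, it would produce GRCh38#0#GRCh38#chr1.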
The GFA you're pointing to stores the reference genome as a particular "sample" alongside other samples that have identifiers like HG0xxxx. The combination of sample + haplotype + contig is specified using the PanSN naming specification, and the resulting names look something like this:
GRCh38#0#chr1
The first field is the sample identifier (GRCh38), the second is the haplotype (0, which is somewhat redundant for references that don't have a diplotype), and the third is the contig (chr1).
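To make the three-field structure concrete, here is a small illustrative sketch that splits such a name on # and reports the fields. The helper name split_pansn is made up for this example and is not part of vg; it only checks for the sample#haplotype#contig form described above.

```shell
# Split a PanSN-style name (sample#haplotype#contig) into its three fields.
# split_pansn is a hypothetical helper name, not a vg command.
split_pansn() {
  printf '%s\n' "$1" | awk -F'#' '
    # Exactly three #-separated fields: report them.
    NF == 3 { printf "sample=%s haplotype=%s contig=%s\n", $1, $2, $3; next }
    # Anything else is rejected with a non-zero exit status.
    { print "not a three-field PanSN name: " $0; exit 1 }'
}

split_pansn 'GRCh38#0#chr1'
```

A two-field name such as GRCh38#chr1 is rejected by this check because it has no haplotype field.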
Hello, I am running vg autoindex to splice the minigraph-cactus full pangenome according to GENCODE v44 gene annotations in order to map RNA-seq reads. I have two questions:
1) Running the following command produces the error shown below:
vg autoindex \
  --workflow mpmap \
  --prefix data/00_autoindex/splicedpangenome \
  --gfa /gpfs/projects/bsc83/Data/assemblies/pangenome/minigraph_cactus/hprc-v1.1-mc-grch38.full.gfa \
  --tx-gff /gpfs/projects/bsc83/Data/gene_annotations/gencode/v44/modified/gencode.v44.chr_patch_hapl_scaff.annotation_chr2GRCh38#chr.gtf \
  --tmp-dir temporary \
  --threads 112 \
  --verbosity 2
Error:
Saving GBWT and GBWTGraph to temporary/vg-ikdYP8/dir-MgGI5j/d0cc1cf507d88bdebe898d1ba90127a241a83700.gbz
[IndexRegistry]: Adding splice junctions to GBZ-format graph.
ERROR: Chromosome path "chr1" not found in graph or haplotypes index (line 6).
When I first saw this I thought that it was the typical error where chromosomes are differently formatted (chr1 or 1) so I looked in the minigraph-cactus reference and found
SN:Z:GRCh38#chr1
so I changed the seqnames in the gene annotation from chr1 to GRCh38#chr1, but I still get the same error. Which seqnames does this pangenome reference use?
2) Since the GENCODE v44 annotation is built on GRCh38.p14, I am wondering whether it is compatible with the minigraph-cactus pangenome references you built.
Thanks