vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

ERROR vg autoindex: Tag "transcript_id" not found in attributes (line 145). ERROR: Tag "transcript_id" not found in attributes (line 4). ERROR: No transcripts parsed (remember to set feature type "-y" in vg rna or "-f" in vg autoindex) #4292

Open 08li20 opened 1 month ago

08li20 commented 1 month ago

vg autoindex --threads 5 --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattlecattle.gff5

ERROR: Tag "transcript_id" not found in attributes (line 145).
ERROR: Tag "transcript_id" not found in attributes (line 4).
ERROR: No transcripts parsed (remember to set feature type "-y" in vg rna or "-f" in vg autoindex)

I found that the exon on line 4, which the error message points to, is located on a pseudogene and has no transcript_id attribute. Is the error caused by this exon being on a pseudogene? Do I need to delete it? sam.gff.txt

jeizenga commented 1 month ago

I think the error message here might actually be a bit misleading. Before vg autoindex makes the spliced graph, it splits the GFF into several chunks that it processes in parallel. My suspicion is that the line numbers being reported here correspond to lines in the chunked GFF and not in your input GFF. I think the best way to find the problematic lines might be to do the first couple of steps of the manual graph creation pipeline:

bgzip --threads 5 cattle.variants.vcf
tabix -p vcf cattle.variants.vcf.gz
vg construct --threads 5 -r Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa -v cattle.variants.vcf.gz > cattle.vg
vg rna --threads 5 -n cattlecattle.gff5 cattle.vg > cattle.spliced.vg
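
As a complementary check that does not depend on vg at all, the input GFF can be scanned directly for feature lines whose attributes column lacks a transcript_id tag, reporting line numbers relative to the original file rather than to any chunk. The following is only a rough sketch: the function name find_missing_transcript_id is made up for illustration, and it assumes a tab-separated file with the feature type in column 3 and the attributes in column 9, with "exon" as the feature type of interest.

```shell
# Hypothetical helper (not part of vg): print the original line number and
# content of every feature line that has no transcript_id in its attributes.
#   $1: path to the GFF/GTF file
#   $2: feature type to check (assumed default: exon)
find_missing_transcript_id() {
    awk -F'\t' -v feature="${2:-exon}" '
        /^#/ { next }                        # skip header and comment lines
        $3 == feature && $9 !~ /transcript_id/ {
            printf "%d\t%s\n", NR, $0        # NR is the line number in the input file
        }
    ' "$1"
}

# Example usage on the file from this thread:
#   find_missing_transcript_id cattlecattle.gff5 | head
```

If this lists many lines (for example, every exon belonging to a pseudogene), that would suggest the problem is systematic in the annotation rather than limited to the two lines named in the error.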