Open 08li20 opened 1 month ago
I think the error message here might actually be a bit misleading. Before vg autoindex makes the spliced graph, it splits the GFF into several chunks that it processes in parallel. My suspicion is that the line numbers being reported here refer to lines within a chunk of the GFF and not to lines in your input GFF. I think the best way to find the problematic lines might be to run the first couple of steps of the manual graph creation pipeline:
bgzip --threads 5 cattle.variants.vcf
tabix -p vcf cattle.variants.vcf.gz
vg construct --threads 5 -r Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa -v cattle.variants.vcf.gz > cattle.vg
vg rna --threads 5 -n cattlecattle.gff5 cattle.vg > cattle.spliced.vg
vg autoindex --threads 5 --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattlecattle.gff5
ERROR: Tag "transcript_id" not found in attributes (line 145).
ERROR: Tag "transcript_id" not found in attributes (line 4).
ERROR: No transcripts parsed (remember to set feature type "-y" in vg rna or "-f" in vg autoindex)
I found that the exon on line 4 reported in the error is located in a pseudogene and has no transcript_id attribute. Is the error caused by the exon being in a pseudogene? Do I need to delete it?
sam.gff.txt
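Since the reported line numbers may not match the input file, it might help to scan the original GFF directly for records that lack the tag. Below is a minimal sketch, not something taken from vg itself. It assumes a tab-separated GFF with the feature type in column 3 and the attributes in column 9, and it assumes that "exon" is the feature type being parsed and "transcript_id" is the required tag, as the error message suggests. The function name find_missing_tag is hypothetical. The tag is matched only at the start of an attribute, so a different attribute whose name merely ends in the tag (for example orig_transcript_id) does not count as a match.

```shell
# Hypothetical helper: print "line_number<TAB>record" for every feature of the
# given type whose attribute column (column 9) does not contain the given tag.
# Usage: find_missing_tag FILE [FEATURE_TYPE] [TAG]
find_missing_tag() {
  gff="$1"
  feature="${2:-exon}"          # assumed default feature type
  tag="${3:-transcript_id}"     # assumed default tag name
  awk -F '\t' -v feature="$feature" -v tag="$tag" '
    /^#/ { next }               # skip comment and directive lines
    # The tag must start an attribute: beginning of the column or after ";",
    # followed by "=" (GFF3 style) or a space (GTF style).
    $3 == feature && $9 !~ ("(^|;) *" tag "[= ]") { print NR "\t" $0 }
  ' "$gff"
}

# Example call (file name taken from the commands above):
# find_missing_tag cattlecattle.gff5 exon transcript_id
```

The line numbers printed here are those of the input file itself, so the records can be inspected before deciding whether to remove or fix them.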