Closed 08li20 closed 2 months ago
It looks to me like the entire GFF line is being parsed as its contig name. Maybe the GFF you have is space-separated instead of tab-separated?
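A quick way to test the delimiter hypothesis is to count tab-separated fields per record. The following is a minimal sketch, not part of vg; the function names are hypothetical, and it assumes the standard nine-column GFF/GTF layout with comment lines starting with `#`.

```python
def count_gff_columns(line):
    """Return the number of tab-separated fields in one GFF record line."""
    return len(line.rstrip("\n").split("\t"))


def find_malformed_lines(lines, expected=9):
    """Yield (line_number, n_fields) for record lines that do not have
    the expected number of tab-separated fields.

    Blank lines and comment lines (starting with '#') are skipped.
    """
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        n_fields = count_gff_columns(line)
        if n_fields != expected:
            yield number, n_fields


# Usage (file name taken from the thread):
# with open("cattle.gff") as handle:
#     for number, n_fields in find_malformed_lines(handle):
#         print(f"line {number}: {n_fields} tab-separated fields")
```

A space-separated record shows up with a single field, which would be consistent with the whole line being read as the contig name.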
After converting the delimiter in the GFF file to tabs, an error is still reported saying that the contig cannot be found in the reference sequence.
vg autoindex --threads 2 --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattle.gff
error:[IndexRegistry] contig 1 from GTF/GFF cattle.gff is not found in reference
Will the lack of comment lines in the gff file affect the matching?
No, it would not. My guess is that it's most likely a "1" vs "chr1" mismatch. If not that, then some other mismatched representation. You can get a quick look at the sequence names in the FASTA with grep ">" ref.fa, which will probably make the source of the error obvious.
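The comparison can also be done programmatically. This is a rough sketch with hypothetical function names; it assumes the FASTA sequence name is the first whitespace-delimited token after `>` (a common convention, not confirmed here for vg) and that the GFF contig name is the first tab-separated column.

```python
def fasta_contigs(lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gff_contigs(lines):
    """Collect contig names (column 1) from GFF record lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def missing_from_reference(fasta_lines, gff_lines):
    """Return the sorted GFF contig names that have no match in the FASTA."""
    return sorted(gff_contigs(gff_lines) - fasta_contigs(fasta_lines))


# Usage (file names taken from the thread):
# with open("Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa") as fa, open("cattle.gff") as gff:
#     print(missing_from_reference(fa, gff))
```

Any name this returns is a contig the GFF refers to that the reference does not contain under exactly that spelling.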
I changed the chromosome names in the reference genome and in the GFF file to be the same, but the same error still occurred.
vg autoindex --threads 2 --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattle.gff2
error:[IndexRegistry] contig >1 from GTF/GFF cattle.gff2 is not found in reference
The GFF now appears to have the > from the FASTA name line inserted into the contig name, so I think this was probably a move in the wrong direction. Can you copy the output of these commands? Then I can probably be more specific.
grep ">" Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa | head
and
head cattle.gff2
After I modified the GFF file, I ran the following command but got the errors below. How specifically do I change the -f parameter in vg autoindex?
vg autoindex --threads 5 --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattlecattle.gff5
ERROR: Tag "transcript_id" not found in attributes (line 145).
ERROR: Tag "transcript_id" not found in attributes (line 4).
ERROR: No transcripts parsed (remember to set feature type "-y" in vg rna or "-f" in vg autoindex)
Typically, a GFF file will include a unique identifier for each transcript as an annotation in column 9. Often it's an accession number from a public database. Different genome annotation projects use different labels for the identifier, so you have to specify which one is the unique identifier using the --gff-tx-tag argument. The default transcript_id is what's used by GENCODE, but you'll have to figure out what the label is in your data set.
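To find out which label your data set uses, it can help to tally the attribute tags that appear in column 9 for the feature type of interest. The sketch below uses a hypothetical helper name and assumes attributes are separated by `;` and written either as `tag=value` (GFF3 style) or `tag "value"` (GTF style).

```python
import re
from collections import Counter

# A tag is the leading run of characters up to the first whitespace or '='.
TAG_PATTERN = re.compile(r"\s*([^\s=]+)")


def count_attribute_tags(lines, feature="exon"):
    """Count how often each attribute tag occurs in column 9,
    restricted to records whose column 3 equals `feature`."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature:
            continue
        for item in fields[8].split(";"):
            match = TAG_PATTERN.match(item)
            if match:
                counts[match.group(1)] += 1
    return counts


# Usage (file name taken from the thread):
# with open("cattle.gff") as handle:
#     for tag, n in count_attribute_tags(handle).most_common():
#         print(tag, n)
```

A tag that is present on every exon record and identifies the parent transcript is a plausible candidate to pass to --gff-tx-tag; if transcript_id is missing from some records, the tally makes that visible.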
The command reported an error saying that transcript_id was not found on line 145 of the GFF file, but line 145 of my file is annotated as CDS. I added the parameter --gff-feature exon so that only records with exon in the third column of the GFF file are used.
vg autoindex --threads 5 --gff-feature exon --gff-tx-tag transcript_id --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattlecattle.gff5
ERROR: Tag "transcript_id" not found in attributes (line 145).
Can you send the GFF file you're using?
sam.gff.docx This is the content of the first two hundred lines of the gff file
Closing here since you have opened the same thing as a separate issue at https://github.com/vgteam/vg/issues/4264
Run the command as follows:
vg autoindex --threads 2 --workflow mpmap --prefix cattle --ref-fasta Cattle_ARS-UCD2.0_GCF_002263795.3_rename.fa --vcf cattle.variants.vcf --tx-gff cattle.trans.gff
error:[IndexRegistry] contig 1 RefSeq region 1 158534110 . + . ID=NC_037328.1:1..158534110;Dbxref=taxon:9913;Name=1;breed=Hereford;chromosome=1;gbkey=Src;genome=chromosome;isolate=L1;Dominette;01449;registration;number;42190680;mol_type=genomic;DNA;sex=female;tissue-type=left;lung from GTF/GFF cattle.trans.gff is not found in reference