lh3 / pangene

Constructing a pangenome gene graph

WARNING: skip ##### and failing to print lines for proteins.faa output #10

Open joanocha opened 1 month ago

joanocha commented 1 month ago

Hello, I am trying to run the command that prepares the protein fasta as follows:

k8 pangene.js getaa gene-anno.gtf protein-seq.faa > proteins.faa

I am getting an empty proteins.faa and the message: WARNING: skip "XP_531603.2"

This happens to be the value of the "protein_id" field in my GTF file. I looked into pangene.js, and my file has all the fields one would expect in a GTF. The only exception is that there is no "gene_name", only "gene". Would you mind sharing the inputs gene-anno.gtf and protein-seq.faa and the output proteins.faa?

Thank you!

lh3 commented 1 month ago

The script works with GENCODE and Ensembl annotations. It may not work for others. Extracted human proteins are available from Zenodo: https://zenodo.org/records/10703141

joanocha commented 1 month ago

Thank you for the prompt reply and for this amazing tool. I ended up customizing the pg_cmd_getaa function, and now it works for my GTF as expected.
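
For anyone hitting the same warning, here is a minimal sketch of the kind of fallback described above: reading the GTF attribute column and using "gene" when "gene_name" is absent. This is not the code from pangene.js, and the actual change made to pg_cmd_getaa was not posted. The helper names parseGtfAttributes and geneLabel, the gene_id fallback, and the sample attribute line (apart from the protein_id value quoted in this thread) are assumptions made for illustration. It is written for a generic JavaScript runtime such as Node.js and may need adapting for k8.

```javascript
// Hypothetical helper (not from pangene.js): parse the 9th GTF column,
// e.g. 'gene_id "X"; gene "Y";', into a plain key/value object.
function parseGtfAttributes(attrField) {
  const attrs = {};
  const re = /(\S+)\s+"([^"]*)"\s*;?/g; // matches: key "value";
  let m;
  while ((m = re.exec(attrField)) !== null) {
    // Keep only the first occurrence of each key.
    if (!(m[1] in attrs)) attrs[m[1]] = m[2];
  }
  return attrs;
}

// Hypothetical helper: prefer gene_name (GENCODE/Ensembl style),
// then fall back to gene, then gene_id; null if none is present.
function geneLabel(attrs) {
  return attrs.gene_name || attrs.gene || attrs.gene_id || null;
}

// Illustrative attribute line; only the protein_id value comes from the thread.
const line = 'gene_id "LOC1"; transcript_id "XM_000001.1"; gene "ABC1"; protein_id "XP_531603.2";';
const attrs = parseGtfAttributes(line);
console.log(geneLabel(attrs), attrs.protein_id);
```

Whether a missing "gene_name" is the only cause of the skip warning for a given annotation is not confirmed here; the sketch only shows how such a fallback could be expressed.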