chenyangkang closed this issue 1 year ago
Hi Yangkang,
Correct, we merged the Ensembl and RefSeq annotations, as their union had higher completeness than either alone. The input annotation for chicken is already available: https://github.com/hillerlab/TOGA/tree/master/TOGAInput/chicken_galGal6
@MichaelHiller Thanks, Dr. Hiller! That was helpful. What are these genes whose names start with "reg_"? Did you name them, or can I find this information anywhere?
Thanks!
Yangkang
Looks like we didn't have a gene symbol for those. The question is why. I'll ask.
I got an answer from Ekaterina: "These 9589 transcripts didn’t get a gene name because they were not annotated in the previous NCBI chicken annotation." Our input annotation was produced from that annotation back in 2020/21, so these transcripts just got an ID.
Looks like NCBI has now named many transcripts, so it could be worth updating the chicken annotation (compared to human / mouse, chicken has more room for improvement).
Ekaterina provided an updated file that uses the current gene symbols to assign a proper gene symbol to more transcripts. Please note that this is not the filtered transcript set (it still contains transcripts with short introns, NMD transcripts, and other improper transcripts), but it may still be helpful for you since the transcript IDs are the same. Please see https://github.com/hillerlab/TOGA/tree/master/TOGAInput/chicken_galGal6/UnfilteredTranscriptsWithUpdatedGeneSymbols
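Because the transcript IDs are unchanged, the updated gene symbols can be carried over with a simple join on transcript ID. The following is a minimal sketch, assuming both the old and the updated mapping are tab-separated files with the gene symbol in the first column and the transcript ID in the second; that layout, the function names, and the toy IDs are hypothetical, so check the actual columns of the files in the repository first.

```python
import csv
import io
from typing import Dict, TextIO


def load_transcript_to_gene(handle: TextIO) -> Dict[str, str]:
    """Read a two-column TSV (gene<TAB>transcript) into a transcript -> gene dict.

    Assumed layout (check the real files): column 1 = gene symbol or ID,
    column 2 = transcript ID. A header line starting with "gene" is skipped.
    """
    mapping: Dict[str, str] = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or row[0].strip().lower() == "gene":
            continue  # skip blank/short lines and an optional header
        gene, transcript = row[0].strip(), row[1].strip()
        mapping[transcript] = gene
    return mapping


def update_symbols(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Swap placeholder "reg_" names for updated symbols, keyed by transcript ID.

    Transcripts absent from `new`, or that already had a real symbol,
    keep their old name.
    """
    return {
        transcript: new[transcript]
        if gene.startswith("reg_") and transcript in new
        else gene
        for transcript, gene in old.items()
    }


# Toy data with hypothetical IDs, standing in for the old and updated files.
old = load_transcript_to_gene(io.StringIO("gene\ttranscript\nreg_1\tTX_A\nGENE1\tTX_B\n"))
new = load_transcript_to_gene(io.StringIO("SYMBOL_A\tTX_A\nGENE1\tTX_B\n"))
print(update_symbols(old, new))  # {'TX_A': 'SYMBOL_A', 'TX_B': 'GENE1'}
```

For real files, pass `open(path, newline="")` instead of `io.StringIO`. Only the "reg_" placeholders are replaced here, so existing symbols are never silently overwritten; dropping the `startswith("reg_")` check would instead adopt every updated symbol.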
As an alternative, you could use the latest NCBI RefSeq annotation (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/315/GCF_000002315.6_GRCg6a/) to produce a new input annotation.
@MichaelHiller Thanks! This is greatly helpful! Appreciated!
Hi! It's me again :)
I'm combining more species based on the 501-bird codon alignment, but I had difficulty using the codon alignments you shared in the Science paper, because the transcript/gene naming seems to follow NCBI accessions, e.g. 'rna-XM_025141352.1.fa'. Some file names have a gene symbol prefix, some do not, and some are in Ensembl format. It would be very helpful if you could share the chicken bed12 file you used for the annotation so that we can recover the symbols of all genes.
Thanks in advance!
Yangkang
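One way to cope with mixed file names like these (an "rna-" prefix, gene-symbol prefixes, Ensembl IDs) is to extract the transcript accession from each name with a regular expression and then look it up in a transcript-to-gene table. This is a minimal sketch, assuming every file name contains either a RefSeq transcript accession (NM_, NR_, XM_ or XR_ followed by digits) or an Ensembl transcript ID (ENS, an optional species code, T, then digits); the function name and patterns are hypothetical and not part of TOGA.

```python
import re
from typing import Optional

# Hypothetical patterns, each with an optional version suffix such as ".1":
# RefSeq transcript accessions and Ensembl transcript IDs (e.g. ENSGALT...).
REFSEQ_TX = re.compile(r"(?:NM|NR|XM|XR)_\d+(?:\.\d+)?")
ENSEMBL_TX = re.compile(r"ENS[A-Z]*T\d+(?:\.\d+)?")


def transcript_id_from_filename(filename: str) -> Optional[str]:
    """Extract a RefSeq or Ensembl transcript ID from an alignment file name.

    Prefixes such as "rna-" or a gene symbol are ignored as long as the
    accession itself appears in the name. RefSeq is tried first; returns
    None if neither pattern matches.
    """
    for pattern in (REFSEQ_TX, ENSEMBL_TX):
        match = pattern.search(filename)
        if match:
            return match.group(0)
    return None


for name in ["rna-XM_025141352.1.fa", "ENSGALT00000000001.fa", "notes.txt"]:
    print(name, "->", transcript_id_from_filename(name))
# rna-XM_025141352.1.fa -> XM_025141352.1
# ENSGALT00000000001.fa -> ENSGALT00000000001
# notes.txt -> None
```

Searching for the accession rather than stripping known prefixes keeps the sketch robust to naming variants that have not been seen yet; names that return None are worth inspecting by hand.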