Closed sbwiecko closed 1 year ago
gget ref -w dna,gtf homo_sapiens
to get the links to these 2 files:
Thanks! This is because the gene_name field is empty for those transcripts in the latest ENSEMBL GTF annotation (though I've found the gene name AL513477.1 in some other GTFs). You could either simply use the ENSEMBL gene IDs (since none of those will be empty) in lieu of gene names or you can move the ENSEMBL gene IDs into the third column (the gene names column) of the t2g.txt file for any records where there is a missing gene name.
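For anyone who wants to script the second option, here is a minimal Python sketch of it. It assumes t2g.txt is tab-separated with the transcript ID in column 1, the gene ID in column 2, and the gene name in column 3; only the third-column position is stated above, so please check the layout of your own file first. The function names and the output filename are made up for illustration and are not part of kb-python.

```python
def fill_missing_gene_names(lines, id_col=1, name_col=2):
    """Copy the gene ID into the gene-name column wherever the name is empty.

    Column positions are 0-based and are assumptions about the t2g layout:
    id_col=1 is the gene ID, name_col=2 is the gene name.
    """
    fixed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Leave blank or too-short lines alone rather than guess their content.
        if len(fields) <= id_col:
            fixed.append("\t".join(fields))
            continue
        # Pad records whose trailing empty columns were dropped.
        while len(fields) <= name_col:
            fields.append("")
        if not fields[name_col].strip():
            fields[name_col] = fields[id_col]
        fixed.append("\t".join(fields))
    return fixed


def fix_t2g_file(src_path, dst_path):
    """Read a t2g file, fill empty gene names, and write the result to a new file."""
    with open(src_path) as src:
        fixed = fill_missing_gene_names(src)
    with open(dst_path, "w") as dst:
        for row in fixed:
            dst.write(row + "\n")
```

Calling `fix_t2g_file("t2g.txt", "t2g.fixed.txt")` writes a patched copy and leaves the original untouched, so the two files can be compared before the new one is used downstream.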
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
After downloading a prebuilt index as follows:
t2g.txt is 9.6 MB and the index file is 3.0 GB, while building it de novo:
gives a 19 MB t2g.txt and a 3.2 GB index file.
It looks like the de novo method generates a t2g.txt file with empty rows or transcripts that do not correspond to any gene:
This causes a problem in my downstream analysis when creating a Seurat object from a Read10X expression matrix, because the matrix ends up with empty row names.
Did I do anything wrong? How can I get rid of the transcripts with no corresponding gene?