haibol2016 / ArchR_utilities


Regarding get_geneID_symbol() #2

Closed: hukai916 closed this issue 2 years ago

hukai916 commented 2 years ago
  1. Is the following too strict? gtf <- gtf[gtf[, 3] == "gene", 9]

For example, my GTF looks like the one below. Column 3 has no "gene" entries, but column 9 still contains "gene_name".

  V1   V2                    V3          V4    V5    V6 V7 V8
1 chrM ncbiRefSeq.2020-03-20 transcript  14168 15308 .  +  .
2 chrM ncbiRefSeq.2020-03-20 exon        14168 15308 .  +  .
3 chrM ncbiRefSeq.2020-03-20 CDS         14168 15308 .  +  0
4 chrM ncbiRefSeq.2020-03-20 start_codon 14168 14170 .  +  0
5 chrM ncbiRefSeq.2020-03-20 transcript  13569 14093 .  -  .
6 chrM ncbiRefSeq.2020-03-20 exon        13569 14093 .  -  .

  V9
1 gene_id "CYTB"; transcript_id "YP_007316895.1"; gene_name "CYTB";
2 gene_id "CYTB"; transcript_id "YP_007316895.1"; exon_number "1"; exon_id "YP_007316895.1.1"; gene_name "CYTB";
3 gene_id "CYTB"; transcript_id "YP_007316895.1"; exon_number "1"; exon_id "YP_007316895.1.1"; gene_name "CYTB";
4 gene_id "CYTB"; transcript_id "YP_007316895.1"; exon_number "1"; exon_id "YP_007316895.1.1"; gene_name "CYTB";
5 gene_id "ND6"; transcript_id "YP_007316894.1"; gene_name "ND6";
6 gene_id "ND6"; transcript_id "YP_007316894.1"; exon_number "1"; exon_id "YP_007316894.1.1"; gene_name "ND6";

haibol2016 commented 2 years ago

This is just to be consistent with forging the TxDb: if there are no "gene" entries in the GTF file, makeTxDbFromGFF will complain, even though you can still get the gene name here.
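For readers who only need the ID-to-symbol mapping, the trade-off discussed above could be sketched roughly as follows. This is a minimal illustration, not the package's actual get_geneID_symbol(): the helper name extract_gene_id_symbol and its fallback behavior are assumptions made for this example. It uses "gene" rows when they exist and otherwise falls back to parsing column 9 of every feature. It does not address the TxDb consistency concern.

```r
# Hypothetical helper (not part of ArchR_utilities): build a gene_id/symbol
# table from a GTF that has been read into a 9-column data.frame.
extract_gene_id_symbol <- function(gtf) {
  is_gene <- gtf[, 3] == "gene"
  # Prefer "gene" rows; fall back to all features when none exist.
  attrs <- if (any(is_gene)) gtf[is_gene, 9] else gtf[, 9]

  # Pull the quoted value that follows `key` in the attribute string.
  get_attr <- function(x, key) {
    out <- sub(paste0('.*', key, ' "([^"]+)".*'), "\\1", x)
    # Rows lacking the key would otherwise keep the full string; mark them NA.
    out[!grepl(paste0(key, ' "'), x, fixed = TRUE)] <- NA_character_
    out
  }

  res <- unique(data.frame(
    gene_id = get_attr(attrs, "gene_id"),
    symbol  = get_attr(attrs, "gene_name"),
    stringsAsFactors = FALSE
  ))
  rownames(res) <- NULL
  res
}

# Toy input modeled on the rows quoted in this issue (no "gene" features).
gtf <- data.frame(
  V1 = "chrM", V2 = "ncbiRefSeq.2020-03-20",
  V3 = c("transcript", "exon", "transcript", "exon"),
  V4 = c(14168, 14168, 13569, 13569),
  V5 = c(15308, 15308, 14093, 14093),
  V6 = ".", V7 = c("+", "+", "-", "-"), V8 = ".",
  V9 = c(
    'gene_id "CYTB"; transcript_id "YP_007316895.1"; gene_name "CYTB";',
    'gene_id "CYTB"; transcript_id "YP_007316895.1"; exon_number "1"; exon_id "YP_007316895.1.1"; gene_name "CYTB";',
    'gene_id "ND6"; transcript_id "YP_007316894.1"; gene_name "ND6";',
    'gene_id "ND6"; transcript_id "YP_007316894.1"; exon_number "1"; exon_id "YP_007316894.1.1"; gene_name "ND6";'
  ),
  stringsAsFactors = FALSE
)

strict_rows <- gtf[gtf[, 3] == "gene", 9]   # the strict filter from the question
id_symbol   <- extract_gene_id_symbol(gtf)  # the fallback sketch
```

On this toy input the strict filter selects no rows, while the fallback yields one row per distinct gene. Whether the fallback is appropriate depends on whether the same GTF also has to pass through makeTxDbFromGFF.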