marbl / CHM13

The complete sequence of a human genome

Issue with preparing TxDb and GRanges objects from GFF file #85

Open nicodemus88 opened 11 months ago

nicodemus88 commented 11 months ago

Hi, I am having some trouble making a gene annotation object (TxDb or GRanges) for CHM13 from the GFF file. When making the TxDb object, I got the following warning messages:

> txdb <- getTxDb(organism = "Homo sapiens", file = "./ref_genome/chm13v2.0_RefSeq_Liftoff_v5.1.gff3")
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  some transcripts have no "transcript_id" attribute ==> their name ("tx_name" column
  in the TxDb object) was set to NA
2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID,  :
  the transcript names ("tx_name" column in the TxDb object) imported from the
  "transcript_id" attribute are not unique
3: In .find_exon_cds(exons, cds) :
  The following transcripts have exons that contain more than one CDS (only the first
  CDS was kept for each exon): NM_001134939.1, NM_001172437.2, NM_001184961.1,
  NM_001301020.1, NM_001301302.1, NM_001301371.1, NM_002537.3, NM_004152.3,
  NM_015068.3, NM_016178.2

Is this normal? I also noticed quite a lot of NAs in the table and had difficulty converting it to a GRanges object.
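
The first two warnings concern the "transcript_id" attribute, so one way to see how widespread they are is to scan the GFF3 directly. Below is a minimal Python sketch (standard library only, independent of the R toolchain above) that lists transcript features lacking a "transcript_id" and any "transcript_id" values used more than once. The set of feature types treated as transcripts and the example records are assumptions for illustration; they are not taken from the CHM13 annotation and may not match what the R importer considers a transcript.

```python
from collections import Counter

# Feature types treated as transcripts here; this set is an assumption.
TRANSCRIPT_TYPES = {"mRNA", "transcript", "lnc_RNA"}


def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def transcript_id_report(lines):
    """Return (IDs of transcripts lacking transcript_id, duplicated transcript_ids)."""
    missing = []
    seen = Counter()
    for line in lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in TRANSCRIPT_TYPES:
            continue
        attrs = parse_attributes(cols[8])
        tx_id = attrs.get("transcript_id")
        if tx_id is None:
            missing.append(attrs.get("ID", "<no ID>"))
        else:
            seen[tx_id] += 1
    duplicated = sorted(t for t, n in seen.items() if n > 1)
    return missing, duplicated


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=tx1;Parent=gene1;transcript_id=T1",
    "chr1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=tx2;Parent=gene1;transcript_id=T1",
    "chr1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=tx3;Parent=gene1",
]
missing, duplicated = transcript_id_report(example)
print(missing, duplicated)
```

To run this on a real file, pass an open file handle instead of the example list, since the function only iterates over lines.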

I think the annotation would be very useful to everyone else, so would it be possible for your team to host a TxDb / GRanges object of the annotation here?

Thank you very much.

njaupan commented 8 months ago

Hi, any update? I am looking for the same TxDb and GRanges files for the T2T genome.

chrarnold commented 5 months ago

I also ran into this. Can someone please comment? I wonder whether it is related to the known issue of the GFF3 file sometimes missing a "parent" entry.
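
One way to test the missing-parent hypothesis is to check the "Parent" attributes in the GFF3 itself. The following Python sketch (standard library only) reports features of selected child types that have no "Parent" attribute, and "Parent" values that do not match any "ID" in the file. The list of child types and the example records are assumptions made for illustration, not content from the CHM13 annotation.

```python
# Feature types expected to carry a Parent attribute; this set is an assumption.
CHILD_TYPES = {"mRNA", "transcript", "exon", "CDS"}


def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parent_report(lines):
    """Return (child features without Parent, (child, parent) pairs with unknown parent)."""
    ids = set()
    links = []
    no_parent = []
    for line in lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        label = attrs.get("ID", cols[2])  # fall back to the feature type
        if "ID" in attrs:
            ids.add(attrs["ID"])
        if "Parent" in attrs:
            # A feature may list several parents separated by commas.
            for parent in attrs["Parent"].split(","):
                if parent:
                    links.append((label, parent))
        elif cols[2] in CHILD_TYPES:
            no_parent.append(label)
    # Resolve links only after all IDs are known, so record order does not matter.
    dangling = [(child, parent) for child, parent in links if parent not in ids]
    return no_parent, dangling


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tdemo\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t1\t100\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tdemo\texon\t1\t50\t.\t+\t.\tID=ex1;Parent=tx1",
    "chr1\tdemo\texon\t60\t100\t.\t+\t.\tID=ex2",
    "chr1\tdemo\tCDS\t1\t50\t.\t+\t.\tID=cds1;Parent=tx9",
]
no_parent, dangling = parent_report(example)
print(no_parent, dangling)
```

If both lists come back empty for the real file, missing parents are probably not the cause of the warnings.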

arangrhie commented 5 months ago

Hello, could you provide a few examples of how the TxDb and GRanges files are used? I'd like to compare them with those from hg38 annotations to see how they differ.

axxxxx08 commented 4 months ago

Hi, I encountered the same warning messages when creating the TxDb and GRanges objects, and I was following the example provided in this link: https://github.com/Bioconductor/GenomicFeatures/issues/65