mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

Issue with "Chr1" instead of "1" in the index file #22

Closed: qzhuang8 closed this issue 1 year ago

qzhuang8 commented 1 year ago

Dear Tam,

Thanks for the wonderful tool. I am currently debugging a TElocal run on aligned RNA-seq data. The output .cntTable files showed the expected counts for genes but 0 for all TEs. I am wondering whether this is due to a mismatch in chromosome names: the gene .gtf file uses "1" for coordinates, whereas the prebuilt TE .gtf files use "Chr1".

I tried to build the index myself, but I could not find the TEindex file after installing TElocal.

Do you think the index format ("Chr1" vs "1" for example) matters in this case? Many thanks in advance~

Best regards, Qinwei

olivertam commented 1 year ago

Hi Qinwei,

Yes, you have identified the issue with your run. Unfortunately, the index builder is still in beta, and is quite slow, and thus has not been released. What genome build are you using?

Thanks.

qzhuang8 commented 1 year ago

Hi Tam,

Thanks for the prompt response. I am currently using the mm10 downloaded from the links you have provided in the manual. Thanks again!

Best, Qinwei

olivertam commented 1 year ago

Hi Qinwei,

Might I recommend using GRCm38 with the Ensembl nomenclature? This is the same genome release as mm10, except that it uses the Ensembl chromosome nomenclature ("1") instead of the UCSC chromosome nomenclature ("chr1").

Thanks.
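
To make the difference between the two conventions concrete, here is a minimal Python sketch (not part of TElocal) that maps UCSC-style names of the primary mouse chromosomes to Ensembl style. The names `UCSC_PRIMARY` and `ucsc_to_ensembl` are made up for illustration. Unplaced scaffolds are named differently in the two conventions, so the sketch returns `None` for them rather than guessing. Using prebuilt Ensembl-style files, as recommended above, avoids renaming altogether.

```python
from typing import Optional

# Primary mouse chromosomes in UCSC style (19 autosomes, X, Y, mitochondrial).
# Unplaced scaffolds are deliberately left out: their names do not convert
# by simply stripping a prefix.
UCSC_PRIMARY = {f"chr{i}" for i in range(1, 20)} | {"chrX", "chrY", "chrM"}


def ucsc_to_ensembl(name: str) -> Optional[str]:
    """Map a UCSC-style primary chromosome name to Ensembl style.

    Returns None for anything that is not a known primary chromosome.
    """
    if name not in UCSC_PRIMARY:
        return None
    if name == "chrM":
        # The mitochondrial genome is "chrM" in UCSC style and "MT" in Ensembl style.
        return "MT"
    return name[len("chr"):]


for example in ("chr1", "chrX", "chrM", "chrUn_example"):
    print(example, "->", ucsc_to_ensembl(example))
```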

qzhuang8 commented 1 year ago

Hi Tam,

I went back and checked the alignment .bam files with samtools, and also checked the gene annotation .gtf file. Both actually use the Ensembl chromosome nomenclature: the output shows chromosomes as "1" rather than "chr1".

However, the "mm10_rmsk_TE.gtf.locInd.locations" file that I downloaded shows "chr1" instead. I could not view the "mm10_rmsk_TE.gtf.locInd" file since it is in a binary format. Is it possible that the mm10 index file "mm10_rmsk_TE.gtf.locInd" was actually generated with the UCSC chromosome nomenclature ("chr1")?

Best, Qinwei
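
The manual check described in this comment can be sketched in Python. This is only an illustration, not something TElocal provides: it assumes the BAM header has been dumped to text with `samtools view -H`, and the helper names (`sam_header_names`, `gtf_names`, `shared_names`) are hypothetical. The header and the GTF record below are made-up sample data. An empty intersection means no annotated feature can ever overlap an aligned read, which would explain TE counts of 0.

```python
def sam_header_names(header_text):
    """Collect reference sequence names from the @SQ lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_names(gtf_lines):
    """Collect the first column (seqname) of every GTF record."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def shared_names(bam_names, annotation_names):
    """Names present in both the alignment and the annotation."""
    return bam_names & annotation_names


# Made-up sample data: an Ensembl-style header and a UCSC-style TE record.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:2\tLN:2000\n"
te_gtf = ['chr1\texample\texon\t100\t200\t.\t-\t.\tgene_id "example_TE";']

common = shared_names(sam_header_names(header), gtf_names(te_gtf))
if not common:
    print("No shared chromosome names: the TE annotation cannot match these alignments.")
```

With real files, the header text would come from `samtools view -H` on the BAM file and the GTF lines from opening the annotation file.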

olivertam commented 1 year ago

Hi Qinwei,

Yes, you are correct. Might I recommend using the TE GTF for GRCm38 with the Ensembl nomenclature?

Thanks

qzhuang8 commented 1 year ago

Hi Tam,

Thank you so much. I have now realized that pre-generated TE GTF index files for GRCm38 are also available on your website. Sorry for my ignorance. I will try those. Many thanks!

Best, Qinwei

PinpinSui commented 1 year ago

Hi Tam,

Thank you very much for this tool. I am running TEcount on my human RNA-seq data. My genome reference and gene annotation GTF use the chromosome nomenclature "1", while my TE annotation GTF uses "chr1". The run completes successfully, but the TE counts are low. Could you help check whether this combination of chromosome nomenclature, .fa ("1"), gene.gtf ("1") and TE.gtf ("chr1"), is correct?

Thank you, Pinpin

olivertam commented 1 year ago

Hi Pinpin,

You would need a TE GTF that uses the chromosome nomenclature that matches your genome reference (it sounds like it's Ensembl style). Please take a look here to see if there is a version for your genome build (it should have Ensembl in the name). If not, please let me know the organism and genome build.

Thanks.
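
A quick way to spot this kind of mismatch across all three inputs is to label each file by whether its sequence names carry a "chr" prefix. The sketch below is a hedged illustration rather than a TElocal or TEcount feature; `fasta_names` and `naming_style` are hypothetical helpers, and the sample names are made up. It reports only the presence of the prefix, since a missing prefix does not by itself prove that a file follows the Ensembl convention.

```python
def fasta_names(fasta_lines):
    """Return the sequence name (first word after '>') of each FASTA header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.append(line[1:].split()[0])
    return names


def naming_style(names):
    """Label a collection of sequence names by their use of the 'chr' prefix."""
    names = list(names)
    if not names:
        return "empty"
    with_prefix = sum(1 for name in names if name.startswith("chr"))
    if with_prefix == len(names):
        return "chr_prefix"
    if with_prefix == 0:
        return "no_prefix"
    return "mixed"


# Made-up sample names standing in for the three inputs in the question above.
inputs = {
    "genome.fa": fasta_names([">1 example description", "ACGT", ">MT"]),
    "gene.gtf": ["1", "2", "MT"],
    "TE.gtf": ["chr1", "chr2"],
}
styles = {label: naming_style(names) for label, names in inputs.items()}
for label, style in styles.items():
    print(label, style)
if len(set(styles.values())) > 1:
    print("Inputs disagree on chromosome naming.")
```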

PinpinSui commented 1 year ago

Thank you very much!