mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

TElocal_indexer #47

Open zhangqc723 opened 3 weeks ago

zhangqc723 commented 3 weeks ago

Hello, I want to use TElocal, but I cannot find the locInd file for mm9. I downloaded a file named mm9_rmsk_TElocus.ind.gz from https://www.dropbox.com/scl/fo/o0my0l1c7s40un9qv6yvf/AC8F4xbB-V3SGcwzMeJ8vG0?rlkey=sbsb00bbcrq4ofmq1oviy7ws1&e=1&dl=0. Is this file suitable for me? Also, I ran ./TElocal_indexer on my TE GTF file. Is it normal for the program to run for more than a day without finishing?

olivertam commented 3 weeks ago

Hi,

Thanks for your interest in the software. Since mm9 is two releases behind the latest mouse genome (mm39), we have archived the TElocal prebuilt index. It is now available here.

Unfortunately, TElocal_indexer takes a very long time to build those indices (>3 days for mouse). We hope to remove that requirement soon.

Thanks.

zhangqc723 commented 3 weeks ago

Thank you for your patient reply and for developing such a useful tool. I will run TElocal with the prebuilt index you provided for the GTF. Thanks again.

zhangqc723 commented 2 weeks ago

Thanks for your reply. I have now downloaded the TElocal prebuilt index from the link. However, I found that TE names are mixed in the TElocal results. For example, MamRep1161 is assigned to both the TcMar and Tigger families, but in my TE GTF file it belongs only to the TcMar family. Could you explain this? Thanks!

olivertam commented 2 weeks ago

Hi,

This is due to the annotation in UCSC RepeatMasker being inconsistent at times. This was raised with the UCSC crew, and it appears to affect the older genome builds more. It might have been resolved in a more recent GTF update, but since it's an older genome build, we have not rebuilt the index.
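
For anyone who wants to check how many TE names are affected in their own annotation, here is a minimal Python sketch that lists names carrying more than one family. It assumes the TE GTF stores `gene_id` and `family_id` in the ninth column (the TEtranscripts-style layout). The function names and the example records are made up for illustration and are not part of TElocal.

```python
import re
from collections import defaultdict

# Assumption: attributes look like: gene_id "X"; family_id "Y"; class_id "Z";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def families_by_te_name(gtf_lines):
    """Map each TE name (gene_id) to the set of family_id values seen for it."""
    families = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        name = attrs.get("gene_id")
        family = attrs.get("family_id")
        if name and family:
            families[name].add(family)
    return families


def inconsistent_names(gtf_lines):
    """Return only the TE names annotated with more than one family."""
    return {
        name: sorted(fams)
        for name, fams in families_by_te_name(gtf_lines).items()
        if len(fams) > 1
    }


# Made-up records for illustration only (not taken from a real annotation).
example = [
    'chr1\tmm9_rmsk\texon\t100\t200\t.\t+\t.\t'
    'gene_id "MamRep1161"; transcript_id "MamRep1161_dup1"; family_id "TcMar"; class_id "DNA";',
    'chr2\tmm9_rmsk\texon\t300\t400\t.\t-\t.\t'
    'gene_id "MamRep1161"; transcript_id "MamRep1161_dup2"; family_id "Tigger"; class_id "DNA";',
    'chr3\tmm9_rmsk\texon\t500\t900\t.\t+\t.\t'
    'gene_id "L1Md_A"; transcript_id "L1Md_A_dup1"; family_id "L1"; class_id "LINE";',
]

result = inconsistent_names(example)
print(result)
```

To run it on a real file, pass an open file handle instead of the list, for example `inconsistent_names(open("your_TE.gtf"))`, or `gzip.open(path, "rt")` for a compressed GTF. The path here is a placeholder.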

Thanks.

zhangqc723 commented 2 weeks ago

Thanks for your reply.

kamari-weaver commented 1 week ago

Any idea where to find the locInd for mm39?

olivertam commented 1 week ago

Hi,

Feel free to browse here.

Thanks