zhangrengang / TEsorter

TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes
https://doi.org/10.1093/hr/uhac017
GNU General Public License v3.0

Allocation into lineages for metazoan LTR-RTs #51

Open alexandrosbousios opened 1 year ago

alexandrosbousios commented 1 year ago

Hi Ren-Gang,

This issue may partially overlap with previous questions, but I think it will help if it shows up separately here.

Is there any progress on allocating animal LTR-RTs into lineages (SIRE, Ale, Tekay etc.), as you successfully do in plants, or is this not yet possible?

Related to this, what is the purpose of selecting -db rexdb-metazoa instead of rexdb-plant? I suppose that it helps towards a better allocation into Copia, Ty3, or unknown LTR-RTs, correct?

Could you also please clarify (and maybe add a note on the main page) what rexdb-tir and rexdb-pnas are? Apologies if this information is somewhere and I've missed it.

Also a request: could you add an output file to TEsorter from which the user can easily select the FASTA sequences of the full-length elements (i.e. from the original input file) that are SIRE, ATHILA etc.? That would be very handy for anyone interested in further analyzing a specific lineage.

Thanks, Alex

zhangrengang commented 1 year ago

Hi Alex, The lineage-level classification relies on the database. There is no update of the databases at present, but GyDB may provide some details for animals. It is possible to create such a database for animals, but I am not familiar with this.

-db rexdb-metazoa provides a metazoan subset of REXdb (similarly, rexdb-plant is a plant subset of REXdb). It may be more specific for animals.

For rexdb-pnas, see https://github.com/zhangrengang/TEsorter#citations (the rexdb prefix may be confusing, and I will revise the name in the future). rexdb-tir is a DNA/TIR-element subset of REXdb. It was for testing purposes and is no longer available in the latest version.

The request may be implemented with get_record.py in the package. For example:

cat rice6.9.5.liban.rexdb.dom.tsv | grep -P "\-RT\t" | grep SIRE | get_record.py -i rice6.9.5.liban.rexdb.dom.faa -o rice6.9.5.liban.rexdb.dom.SIRE-RT.faa -t fasta

alexandrosbousios commented 1 year ago

Hi Ren-Gang,

Thanks for the clarifications; the take-home message is that it is not yet possible to allocate animal LTR-RTs into lineages. Hopefully, some animal TE labs will take the plunge in the way that Neumann et al. (2019) and others before them did for plants.

Your script is very helpful, and I had missed it, but my request was about retrieving the sequences of the full-length elements, not their genes!

Best, Alex

zhangrengang commented 1 year ago

Hi Alex, You may just extract the sequence IDs of the full-length elements (e.g. from rice6.9.5.liban.rexdb.cls.tsv), and then use get_record.py to extract the sequences from the input file. It should be a similar process.
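
A minimal sketch of that two-step process in plain Python, for readers who prefer not to depend on get_record.py. It is not part of TEsorter, and the function names are hypothetical. It assumes that the *.cls.tsv table is tab-separated with a header line starting with "#", the element ID in column 1, the clade in column 4 and the "Complete" flag in column 5, and that each element ID matches the first whitespace-delimited word of a FASTA header in the input file. Check these assumptions against your own output before relying on it.

```python
def ids_for_clade(cls_lines, clade, complete_only=False):
    """Collect element IDs assigned to `clade` from the lines of a *.cls.tsv table.

    Assumed column layout (not confirmed in this thread):
    ID, Order, Superfamily, Clade, Complete, Strand, Domains.
    """
    ids = set()
    for line in cls_lines:
        line = line.rstrip("\n")
        # Skip blank lines and the header line (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # malformed row: ignore rather than guess
        te_id, row_clade, complete = fields[0], fields[3], fields[4]
        if row_clade != clade:
            continue
        if complete_only and complete != "yes":
            continue
        ids.add(te_id)
    return ids


def filter_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID is in `wanted_ids`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Record ID = header text up to the first whitespace.
            words = line[1:].split()
            keep = bool(words) and words[0] in wanted_ids
        if keep:
            yield line


def extract_clade(cls_path, fasta_path, out_path, clade, complete_only=False):
    """Write full-length elements of one clade to `out_path`; return the ID count."""
    with open(cls_path) as cls_handle:
        wanted = ids_for_clade(cls_handle, clade, complete_only)
    with open(fasta_path) as fasta_in, open(out_path, "w") as fasta_out:
        fasta_out.writelines(filter_fasta(fasta_in, wanted))
    return len(wanted)
```

For the rice example, a call might look like `extract_clade("rice6.9.5.liban.rexdb.cls.tsv", "rice6.9.5.liban", "rice6.9.5.liban.SIRE.fa", "SIRE")`, where the input file name is inferred from the output prefix and the output file name is made up for illustration. Matching on the exact clade column, rather than grepping the whole line, avoids picking up elements whose ID merely contains the lineage name.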

alexandrosbousios commented 1 year ago

Thanks Ren-Gang!