Getting location of subgenome specific TEs

zhangrengang / SubPhaser

Phase, partition and visualize subgenomes of a neoallopolyploid or hybrid based on the subgenome-specific repetitive kmers.

https://doi.org/10.1111/nph.18173

GNU General Public License v3.0


Getting location of subgenome specific TEs #31

Open EmilianoMora opened 1 month ago

EmilianoMora commented 1 month ago

Hi! Thanks for the great tool! I was wondering if one could get the genomic locations of subgenome-specific TEs or TE k-mers. My idea is to look at the coding regions upstream and downstream of subgenome-specific TEs.

My current approach is to look in the 'k15_q200_f2.ltr.enrich' file in the phase-results folder for k-mers that are found in one subgenome (column 2) and show no potential exchange among subgenomes (column 5). Once I have identified k-mers that meet those requirements, I was going to look up their genomic positions in the 'LTR.inner.fa.dom.gff3' file in the tmp directory. Is that approach correct, or should I be looking at other output files?

Thank you in advance!! Bests, Emiliano

zhangrengang commented 1 month ago

Hi Emiliano, the genomic location is encoded in the LTR id (column 1) of the 'k15_q200_f2.ltr.enrich' file. The id has the format chromosome:LTR-RT start-LTR-RT end:inner start-inner end (e.g., 1:466998-472099:467154-471938). It should be easy to convert to BED format and do some interval operations using bedtools. The positions in 'LTR.inner.fa.dom.gff3' are those of the protein-coding domains of the LTR-RTs, which differ somewhat from the locations of the LTR-RTs themselves.
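
For illustration, the id-to-BED conversion could be sketched in Python roughly as follows. This is only a minimal sketch, not part of SubPhaser: the function names `ltr_id_to_bed` and `enrich_to_bed` are hypothetical, and it assumes that the coordinates in the id are 1-based and inclusive (so the BED start is the id start minus one), that the id is the first whitespace-delimited field on each line, and that any line whose first field does not match the id format (such as a header) can be skipped.

```python
import re

# chromosome:LTR-RT start-LTR-RT end:inner start-inner end
# The chromosome part is matched greedily so names containing ':' still parse.
ID_PATTERN = re.compile(
    r"^(?P<chrom>.+):(?P<start>\d+)-(?P<end>\d+):(?P<istart>\d+)-(?P<iend>\d+)$"
)


def ltr_id_to_bed(ltr_id):
    """Convert one LTR id into a (chrom, start, end, name) BED tuple.

    Assumption: id coordinates are 1-based inclusive, so only the start
    is shifted to get 0-based, half-open BED coordinates.
    """
    match = ID_PATTERN.match(ltr_id)
    if match is None:
        raise ValueError(f"unexpected LTR id format: {ltr_id!r}")
    start = int(match["start"])
    end = int(match["end"])
    return (match["chrom"], start - 1, end, ltr_id)


def enrich_to_bed(lines):
    """Yield tab-separated BED lines for the LTR ids in column 1.

    Assumption: blank lines, '#' comments and lines whose first field is
    not an LTR id (e.g. a header row) are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ltr_id = line.split(None, 1)[0]
        try:
            chrom, start, end, name = ltr_id_to_bed(ltr_id)
        except ValueError:
            continue
        yield f"{chrom}\t{start}\t{end}\t{name}"


if __name__ == "__main__":
    import sys

    # Usage: python ltr_to_bed.py k15_q200_f2.ltr.enrich > ltr.bed
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for bed_line in enrich_to_bed(handle):
                print(bed_line)
```

Any filtering on the other columns (for example, keeping only LTR-RTs assigned to one subgenome) would need to be added before writing the BED lines. The resulting file, once sorted, can then be compared with a gene annotation using bedtools, for example with `bedtools closest` or `bedtools window`, to find neighbouring coding regions.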

EmilianoMora commented 1 month ago

Thank you!