sean-bam / Tnacity

Finding the boundaries of Tn7-like transposons

Clade 4 in the phylogenetic tree #2

Open Jigyasa3 opened 5 months ago

Jigyasa3 commented 5 months ago

Dear @sean-bam ,

Thanks again for a great analysis and resource! I am interested in examining the sequences and data associated with clade 4 of the phylogenetic tree. In the manuscript, clade 4 includes subtype I-F, but it appears that there are other Tn7-like transposons in this clade too (Table S1C). Would it be possible to extract only the subtype I-F members from this clade? For example, the manuscript specifically examines the "known" CASTs (523 of them). Are all the "known" CASTs from subtype I-F, or are some from other systems? Is it possible to extract all "known" subtype I-F CASTs from the dataset?

Regards, Jigyasa

sean-bam commented 5 months ago

Hi @Jigyasa3, thanks for your interest in this work! The 523 "known" CASTs come from the supplementary materials of two references, which are cited in the paper: https://www.biorxiv.org/content/10.1101/2021.02.06.429022v1.supplementary-material and https://doi.org/10.1016/j.cell.2020.11.005

To get all the subtype I-F CASTs from the dataset, check the file Tn7_annotations.csv. The subtype I-F transposons all carry Cas6_I-F, Csy3_I-F and Csy2_I-F proteins (see the final column). The transposon ID is in the first column, and you can use it to pull the DNA and protein sequences from the files Tns.fna and Tns.faa.
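For anyone who wants to script the steps above, here is a minimal Python sketch. It is not part of Tnacity, and the function names `subtype_if_ids` and `filter_fasta` are made up for illustration. It assumes that Tn7_annotations.csv is comma-separated with a header row, that a transposon counts as subtype I-F once all three protein names have appeared in the final column of its row(s), and that each FASTA header in Tns.fna / Tns.faa contains the transposon ID as plain text. Please check those assumptions against the actual files before relying on the output.

```python
import csv

# Protein names that mark a subtype I-F CAST, as listed in the reply above.
I_F_MARKERS = ("Cas6_I-F", "Csy3_I-F", "Csy2_I-F")


def subtype_if_ids(csv_path, markers=I_F_MARKERS, has_header=True):
    """Return the IDs (first column) whose final-column annotations
    mention every marker. Works whether a transposon occupies one row
    or several rows (assumed layout, not confirmed by the repository)."""
    found_by_id = {}
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue
            tn_id, annotation = row[0], row[-1]
            found = found_by_id.setdefault(tn_id, set())
            found.update(m for m in markers if m in annotation)
    wanted = set(markers)
    return {tn_id for tn_id, found in found_by_id.items() if found == wanted}


def filter_fasta(in_path, out_path, ids):
    """Copy FASTA records whose header line contains one of the IDs.
    Plain substring matching is an assumption and can over-match when
    one ID is a prefix of another; tighten it to fit the real headers.
    Returns the number of records written."""
    keep = False
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                keep = any(tn_id in line for tn_id in ids)
                written += keep
            if keep:
                dst.write(line)
    return written
```

A typical call would be `ids = subtype_if_ids("Tn7_annotations.csv")` followed by `filter_fasta("Tns.fna", "IF_CASTs.fna", ids)`, where `IF_CASTs.fna` is just a placeholder output name. The same call with `Tns.faa` would give the protein sequences.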

Jigyasa3 commented 5 months ago

Hi @sean-bam , Thank you so much for a quick and detailed reply! Really appreciate it!