sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

F/rice synteny #90

Closed sjteresi closed 2 years ago

sjteresi commented 2 years ago

The commit messages are a little unclear, so here is a summary:

This PR contains all of the code for the rice syntelog TE density comparisons (syntelogs are positionally conserved gene pairs between two or more genomes of interest). All code relevant to the analysis is located in examples/Rice_Synteny/src/ (though you don't need to review those scripts, Michael, as they are all data analysis or dataset generation).

In creating the code for this example I also decoupled the filtration of the gene and TE annotations from the regular pipeline. Previously, the regular pipeline took a raw annotation file and modified it for use in the main TE Density pipeline. Because raw annotation files are formatted inconsistently across genomes, I decided it would be best to move the cleaning of those files out of the main pipeline. Now the user passes the two cleaned files, which they have to generate on their own, as arguments to process_genome.py instead of the raw annotation files.

For this rice dataset I did that cleaning in examples/Rice_Synteny/src/import_rice_gene_anno.py and examples/Rice_Synteny/src/import_EDTA.py. I also had to modify transposon/verify_cache.py to accept the new inputs, and I replaced the two import scripts with versions that use an extremely minimal pandas read_csv command.
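
To illustrate the split between user-side cleaning and the minimal import, here is a rough sketch. It is not the code in this PR: the column names, the GFF3-style input, and the helper names (`clean_gene_annotation`, `read_cleaned`) are all assumptions made up for the example, and the real cleaned-file format may differ.

```python
import io

import pandas as pd

# Hypothetical raw GFF3-style gene annotation (tab-separated, 9 columns).
RAW_GFF = (
    "##gff-version 3\n"
    "Chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene_A;Name=a\n"
    "Chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=mrna_A;Parent=gene_A\n"
    "Chr2\tsrc\tgene\t900\t1500\t.\t-\t.\tID=gene_B\n"
)

# Assumed column names for the raw file; not the project's actual schema.
GFF_COLUMNS = [
    "Chromosome", "Source", "Feature", "Start", "Stop",
    "Score", "Strand", "Frame", "Attributes",
]


def clean_gene_annotation(raw_handle):
    """User-side cleaning step: all genome-specific quirks are handled here."""
    raw = pd.read_csv(
        raw_handle, sep="\t", comment="#", header=None, names=GFF_COLUMNS
    )
    # Keep only gene features and pull the gene ID out of the attribute field.
    genes = raw.loc[raw["Feature"] == "gene"].copy()
    genes["Gene_Name"] = genes["Attributes"].str.extract(
        r"ID=([^;]+)", expand=False
    )
    genes["Length"] = genes["Stop"] - genes["Start"] + 1
    return genes[
        ["Gene_Name", "Chromosome", "Feature", "Start", "Stop", "Strand", "Length"]
    ]


def read_cleaned(handle):
    """Pipeline-side import: a minimal read_csv with no file-specific logic."""
    return pd.read_csv(handle, sep="\t", header="infer")


# Clean once, write the cleaned table, then read it back the way a pipeline might.
cleaned = clean_gene_annotation(io.StringIO(RAW_GFF))
buffer = io.StringIO()
cleaned.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
reloaded = read_cleaned(buffer)
print(reloaded)
```

The point of the design is that every format-specific decision (which features to keep, how to parse the attribute column) lives in the user's cleaning script, so the import step inside the pipeline can stay a plain read_csv call.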

Finally, the Makefile in examples/Rice_Synteny contains all of the steps required to generate all of the analyses for this section of the paper.