Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
Contains all of the code for the rice syntelog TE density comparisons (syntelogs are positionally conserved gene pairs between two or more genomes of interest). All code relevant to the analysis is located in examples/Rice_Synteny/src/ (though you don't need to review those scripts, Michael, as they are all data analysis or dataset generation).
In creating the code for this example I also decoupled the filtration of the gene and TE annotations from the regular pipeline. Previously, the regular pipeline took a raw annotation file and modified it for use in the main TE Density pipeline. Because raw annotation files are formatted inconsistently across genomes, I decided it would be best to move the cleaning of those files out of the main pipeline. Now the user passes the two cleaned files, which they have to generate on their own, as arguments to process_genome.py instead of the raw annotation files.
For this rice set I did the cleaning in examples/Rice_Synteny/src/import_rice_gene_anno.py and examples/Rice_Synteny/src/import_EDTA.py. I also had to modify transposon/verify_cache.py to accept the new inputs, and I reworked the two import scripts to use an extremely minimal pandas read_csv call.
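To illustrate what a user-side cleaning step could look like now that it lives outside the main pipeline, here is a minimal sketch. It is not the code in import_rice_gene_anno.py: the function names, the column names in CLEAN_COLUMNS, and the assumption that the raw file is GFF3-style with nine tab-separated fields are all hypothetical, and the format process_genome.py actually expects should be checked against the repository. The repository's scripts use pandas; this sketch uses only the standard library csv module so it runs without dependencies.

```python
import csv
import io

# Hypothetical column layout for a cleaned gene file; the real pipeline
# may expect different names or ordering.
CLEAN_COLUMNS = ["Gene_Name", "Chromosome", "Feature", "Start", "Stop", "Strand", "Length"]


def clean_gene_annotation(raw_lines):
    """Keep only 'gene' rows from GFF3-style lines (assumed 9 tab-separated fields)."""
    rows = []
    reader = csv.reader(raw_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        # Skip blank lines, comment/header lines, and malformed records.
        if not record or record[0].startswith("#") or len(record) != 9:
            continue
        chrom, _source, feature, start, stop, _score, strand, _phase, attrs = record
        if feature != "gene":
            continue
        # Parse 'key=value;key=value' attributes and use ID as the gene name.
        fields = dict(item.split("=", 1) for item in attrs.split(";") if "=" in item)
        start, stop = int(start), int(stop)
        rows.append(
            {
                "Gene_Name": fields.get("ID", ""),
                "Chromosome": chrom,
                "Feature": feature,
                "Start": start,
                "Stop": stop,
                "Strand": strand,
                "Length": stop - start + 1,  # inclusive coordinates
            }
        )
    return rows


def write_clean_table(rows, handle):
    """Write the cleaned rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=CLEAN_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up input lines, for illustration only.
    raw = [
        "##gff-version 3",
        "Chr1\texample\tgene\t100\t500\t.\t+\t.\tID=gene_A;Name=demo",
        "Chr1\texample\tmRNA\t100\t500\t.\t+\t.\tID=gene_A.1;Parent=gene_A",
    ]
    buffer = io.StringIO()
    write_clean_table(clean_gene_annotation(raw), buffer)
    print(buffer.getvalue())
```

Keeping the genome-specific parsing in a small script like this means each new genome only needs its own cleaner, while the main pipeline reads one fixed tabular layout.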
Finally, the Makefile in examples/Rice_Synteny contains all of the steps required to generate all of the analyses for this section of the paper.
Note: the commit messages are a little unclear, so the description above summarizes the changes.