Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
Run on the f/py310 branch with the TAIR10 annotations:
(genes_310) [21:41 GOKU][f/py310]:/data/genes/TE_Density
[ikkkkkkns]▸$ ./process_genome.py ..//TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_GFF3_genes_main_chromosomes.tsv ../TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_chr_main_chromosomes.fas.mod.EDTA.TEanno.tsv adiposetoperus -n 4 --output_dir ../TE_Density_Filtered_Gene_and_TE_Annotations/results
2022-12-01 21:41:43 GOKU __main__[1173006] INFO preprocessing...
2022-12-01 21:41:43 GOKU PreProcessor[1173006] INFO Reading pre-filtered gene annotation file: /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_GFF3_genes_main_chromosomes.tsv
2022-12-01 21:41:43 GOKU PreProcessor[1173006] INFO import of preprocessed gene annotation... success!
2022-12-01 21:41:43 GOKU PreProcessor[1173006] INFO Reading pre-filtered TE annotation file: /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_chr_main_chromosomes.fas.mod.EDTA.TEanno.tsv
2022-12-01 21:41:44 GOKU PreProcessor[1173006] INFO import of pre-filtered transposon annotation... success!
2022-12-01 21:41:44 GOKU PreProcessor[1173006] INFO load revised TE: /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/results/filtered_input_data/revised_input_data/Revised_Cleaned_TAIR10_chr_main_chromosomes.fas.mod.EDTA.TEanno.tsv
2022-12-01 21:41:44 GOKU PreProcessor[1173006] INFO import of pre-filtered transposon annotation... success!
2022-12-01 21:41:44 GOKU __main__[1173006] INFO preprocessed 5 files to ../TE_Density_Filtered_Gene_and_TE_Annotations/results/filtered_input_data/input_h5_cache
2022-12-01 21:41:44 GOKU __main__[1173006] INFO preprocessing... complete
2022-12-01 21:41:44 GOKU __main__[1173006] INFO process overlap...
2022-12-01 21:41:44 GOKU OverlapManager[1173006] INFO output overlap data to /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/results/tmp/overlap
process : 0it [00:00, ?it/s]
genes : 0it [00:00, ?it/s]
2022-12-01 21:41:44 GOKU __main__[1173006] INFO processed 5 overlap jobs
2022-12-01 21:41:44 GOKU __main__[1173006] INFO process overlap... complete
2022-12-01 21:41:44 GOKU __main__[1173006] INFO process density
subsets: 100%|██████████████████████████████████| 30/30 [05:32<00:00, 11.08s/it]
2022-12-01 21:47:17 GOKU __main__[1173006] INFO process density... complete
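To compare runtimes across Python versions, the stage timings can be pulled straight out of a log like the one above. The sketch below assumes every log line follows the format shown (`date time host logger[pid] LEVEL message`) and skips anything else, such as the tqdm progress bars. `parse_log` and `stage_seconds` are hypothetical helpers, not part of TE_Density.

```python
import re
from datetime import datetime

# Matches lines like:
# "2022-12-01 21:41:44 GOKU __main__[1173006] INFO process density"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<logger>\S+)\[(?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) (?P<msg>.*)$"
)


def parse_log(lines):
    """Return (timestamp, logger, level, message) tuples; skip non-matching lines."""
    records = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # e.g. tqdm progress bars such as "subsets: 100%|..."
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        records.append((ts, m["logger"], m["level"], m["msg"]))
    return records


def stage_seconds(records, start_msg, end_msg):
    """Seconds between the first record with start_msg and the first with end_msg."""
    start = next(ts for ts, _, _, msg in records if msg == start_msg)
    end = next(ts for ts, _, _, msg in records if msg == end_msg)
    return (end - start).total_seconds()


if __name__ == "__main__":
    sample = [
        "2022-12-01 21:41:44 GOKU __main__[1173006] INFO process density",
        "subsets: 100%|██████████| 30/30 [05:32<00:00, 11.08s/it]",
        "2022-12-01 21:47:17 GOKU __main__[1173006] INFO process density... complete",
    ]
    recs = parse_log(sample)
    print(stage_seconds(recs, "process density", "process density... complete"))  # 333.0
```

For this run the density stage took 333 s (21:41:44 to 21:47:17), consistent with the 05:32 reported by the progress bar.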
DISCUSSION
I upgraded the project to Python 3.10 and upgraded our dependencies.
I wanted to use Python 3.11, but I couldn't get PyTables to install because it can't find the HDF5 runtime, and I haven't tracked down why. We could try compiling PyTables from source (or drop our DataFrame.to_hdf calls), but Python 3.10 with the HDF5 debs installed cleanly without either workaround.
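When the PyTables install fails like this, a quick preflight check can show whether the problem is a missing package or a missing HDF5 library. This is a minimal sketch, assuming pandas' `DataFrame.to_hdf` is the only consumer of PyTables (the `tables` module) in our code; `hdf5_support_report` is a hypothetical helper, not something TE_Density ships.

```python
import importlib.util
import sys


def hdf5_support_report() -> dict:
    """Summarize whether PyTables, which DataFrame.to_hdf relies on, is usable here."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "tables_installed": importlib.util.find_spec("tables") is not None,
        "hdf5_version": None,
        "error": None,
    }
    if report["tables_installed"]:
        try:
            # Importing PyTables fails if the HDF5 shared library can't be loaded
            import tables
            report["hdf5_version"] = getattr(tables, "hdf5_version", None)
        except (ImportError, OSError) as exc:
            report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in hdf5_support_report().items():
        print(f"{key}: {value}")
```

Running this under both 3.10 and 3.11 environments separates "PyTables not installed" from "PyTables installed but HDF5 not found".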
Since most of our packages got upgraded, I figured this was enough for a PR.
In the future, it would be nice to move matplotlib (and other packages the analysis doesn't use) out of the main project requirements: it isn't needed to run the analysis, and our tests don't require it. We could keep a separate requirements file for post-processing and/or development.
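Splitting the requirements only works if the analysis code never imports matplotlib at module level. A minimal sketch of one way to do that, assuming plotting lives in post-processing functions; `optional_import` and `save_density_plot` are hypothetical names, and the idea of a post-processing requirements file is the proposal above, not an existing file.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def save_density_plot(values, out_path: str) -> None:
    """Hypothetical post-processing step: plot a sequence of density values."""
    plt = optional_import("matplotlib.pyplot")
    if plt is None:
        raise RuntimeError(
            "plotting needs matplotlib; install the post-processing requirements"
        )
    fig, ax = plt.subplots()
    ax.plot(values)
    ax.set_ylabel("TE density")
    fig.savefig(out_path)
    plt.close(fig)
```

With this pattern, importing the analysis modules succeeds on a machine with only the core requirements; matplotlib is needed only when a plotting function is actually called.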
TESTING
Do the tests pass?
Does our test set work?