sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

upgrade dependencies #125

Closed: teresi closed this pull request 1 year ago

teresi commented 1 year ago

DISCUSSION

I upgraded to Python 3.10 and updated our dependencies.

I wanted to use Python 3.11, but I couldn't get PyTables to install because it couldn't find the HDF5 runtime for some reason. We could try compiling PyTables from source (or removing our DataFrame.to_hdf calls), but Python 3.10 with the HDF5 Debian packages worked without either workaround.
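To check which interpreter and HDF5 setup a given environment ends up with, one option is to import PyTables directly, since pandas' DataFrame.to_hdf depends on it. The sketch below uses a hypothetical helper (`pytables_status` is not part of TE_Density). It assumes only that PyTables is imported as `tables` and exposes `__version__` and `hdf5_version`.

```python
import importlib
import sys


def pytables_status() -> dict:
    """Report the interpreter version and whether PyTables imports cleanly.

    Hypothetical diagnostic helper, not part of TE_Density. pandas'
    DataFrame.to_hdf relies on PyTables (import name ``tables``), and
    importing it also loads the HDF5 shared library, so a missing HDF5
    runtime shows up here as an import error.
    """
    status = {
        "python": "%d.%d" % sys.version_info[:2],
        "tables": None,
        "hdf5": None,
        "error": None,
    }
    try:
        tables = importlib.import_module("tables")
    except (ImportError, OSError) as err:
        # Either PyTables is not installed or its HDF5 library failed to load.
        status["error"] = str(err)
        return status
    status["tables"] = getattr(tables, "__version__", None)
    status["hdf5"] = getattr(tables, "hdf5_version", None)
    return status


if __name__ == "__main__":
    print(pytables_status())
```

Running this in a 3.10 environment and a 3.11 environment side by side makes it clear whether the problem is a missing package or a missing HDF5 library.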

Since most of our packages were upgraded, I figured this was enough for one PR.

In the future, it would be nice to remove matplotlib (and other unnecessary packages) from the main project's requirements, since it isn't needed to run the analysis or the tests. We could keep a separate requirements file for post-processing and/or development.
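One way to keep matplotlib out of the core install is to import it lazily, only inside the post-processing code that plots. The helper below is a hypothetical sketch. The names `require_optional` and `plot_density` and the file name `requirements-postprocess.txt` are assumptions, not existing project code. It imports a module on demand and, if the module is missing, raises an error that points at the extra requirements file.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, purpose: str, requirements_file: str) -> ModuleType:
    """Import an optional dependency on demand.

    Hypothetical helper: core analysis code never calls this, so packages
    such as matplotlib only need to be installed for post-processing.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is only needed for {purpose}; "
            f"install it with: pip install -r {requirements_file}"
        ) from err


def plot_density(values):
    """Hypothetical post-processing step that needs matplotlib."""
    plt = require_optional(
        "matplotlib.pyplot", "post-processing", "requirements-postprocess.txt"
    )
    fig, ax = plt.subplots()
    ax.plot(values)
    return fig
```

With this pattern, the core requirements file would be enough to run the analysis and the tests. A post-processing or development file can pull in the core one with pip's `-r requirements.txt` include line and add matplotlib on top.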

TESTING

Do the tests pass?

(genes_310) [21:51 GOKU][f/py310]:/data/genes/TE_Density
[ins]▸$ make test
mkdir -p /data/genes/TE_Density/tests/test_h5_cache_loc
mkdir -p /data/genes/TE_Density/tests/output_data
pytest /data/genes/TE_Density
============================================================== test session starts ==============================================================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /data/genes/TE_Density
collected 197 items

tests/unit/test_DensityData.py .                                                                                                          [  0%]
tests/unit/test_GeneDatum.py ....                                                                                                         [  2%]
tests/unit/test_MergeData.py ..................ss                                                                                         [ 12%]
tests/unit/test_Overlap.py ........................................................................                                       [ 49%]
tests/unit/test_OverlapData.py ..............                                                                                             [ 56%]
tests/unit/test_ReviseAnno.py .........                                                                                                   [ 60%]
tests/unit/test_WorkerProcess.py ...                                                                                                      [ 62%]
tests/unit/test_data.py ....                                                                                                              [ 64%]
tests/unit/test_density.py .........................................................                                                      [ 93%]
tests/unit/test_gene_data.py ....                                                                                                         [ 95%]
tests/unit/test_import_genes.py ...                                                                                                       [ 96%]
tests/unit/test_preprocess.py ..                                                                                                          [ 97%]
tests/unit/test_transposon_data.py ....                                                                                                   [100%]

======================================================== 195 passed, 2 skipped in 2.88s =========================================================
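Judging from the commands `make test` echoes above, the target creates two scratch directories under `tests/` and then runs pytest on the repository root. The following is a rough Python equivalent. It assumes those are the only steps, although the actual Makefile may do more. `prepare_test_dirs` and `run_tests` are hypothetical names.

```python
import sys
from pathlib import Path

# Scratch directories that `make test` creates before invoking pytest,
# taken from the echoed mkdir commands.
TEST_DIRS = ("tests/test_h5_cache_loc", "tests/output_data")


def prepare_test_dirs(root: Path) -> list:
    """Create the scratch directories (like `mkdir -p`) and return their paths."""
    created = []
    for rel in TEST_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def run_tests(root: Path) -> int:
    """Prepare the directories, then run pytest on the repo root."""
    import pytest  # imported lazily so prepare_test_dirs works without pytest

    prepare_test_dirs(root)
    return int(pytest.main([str(root)]))


if __name__ == "__main__":
    repo = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    sys.exit(run_tests(repo))
```

The counts in the summary line are consistent: 195 passed plus 2 skipped accounts for all 197 collected items.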

Does our test set work?

(genes_310) [21:41 GOKU][f/py310]:/data/genes/TE_Density
[ins]▸$ ./process_genome.py ..//TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_GFF3_genes_main_chromosomes.tsv ../TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_chr_main_chromosomes.fas.mod.EDTA.TEanno.tsv adiposetoperus -n 4 --output_dir ../TE_Density_Filtered_Gene_and_TE_Annotations/results
2022-12-01 21:41:43 GOKU __main__[1173006] INFO preprocessing...
2022-12-01 21:41:43 GOKU PreProcessor[1173006] INFO Reading pre-filtered gene annotation file: /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_GFF3_genes_main_chromosomes.tsv
2022-12-01 21:41:43 GOKU PreProcessor[1173006] INFO import of preprocessed gene annotation... success!
2022-12-01 21:41:43 GOKU PreProcessor[1173006] INFO Reading pre-filtered TE annotation file: /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/Cleaned_TAIR10_chr_main_chromosomes.fas.mod.EDTA.TEanno.tsv
2022-12-01 21:41:44 GOKU PreProcessor[1173006] INFO import of pre-filtered transposon annotation... success!
2022-12-01 21:41:44 GOKU PreProcessor[1173006] INFO load revised TE: /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/results/filtered_input_data/revised_input_data/Revised_Cleaned_TAIR10_chr_main_chromosomes.fas.mod.EDTA.TEanno.tsv
2022-12-01 21:41:44 GOKU PreProcessor[1173006] INFO import of pre-filtered transposon annotation... success!
2022-12-01 21:41:44 GOKU __main__[1173006] INFO preprocessed 5 files to ../TE_Density_Filtered_Gene_and_TE_Annotations/results/filtered_input_data/input_h5_cache
2022-12-01 21:41:44 GOKU __main__[1173006] INFO preprocessing... complete
2022-12-01 21:41:44 GOKU __main__[1173006] INFO process overlap...
2022-12-01 21:41:44 GOKU OverlapManager[1173006] INFO output overlap data to /data/genes/TE_Density_Filtered_Gene_and_TE_Annotations/results/tmp/overlap
process     : 0it [00:00, ?it/s]
genes       : 0it [00:00, ?it/s]
2022-12-01 21:41:44 GOKU __main__[1173006] INFO processed 5 overlap jobs
2022-12-01 21:41:44 GOKU __main__[1173006] INFO process overlap... complete
2022-12-01 21:41:44 GOKU __main__[1173006] INFO process density
subsets: 100%|██████████████████████████████████| 30/30 [05:32<00:00, 11.08s/it]
2022-12-01 21:47:17 GOKU __main__[1173006] INFO process density... complete
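The stage timings can be read straight off the log. Each record starts with a `YYYY-MM-DD HH:MM:SS` timestamp, so the density stage ran from 21:41:44 to 21:47:17, or 333 s. That matches the progress bar's 30 subsets at about 11.08 s each (30 × 11.08 s ≈ 332 s). The sketch below is a hypothetical helper that computes the elapsed time between two log messages. It assumes only the timestamp prefix visible in this log.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_line(line: str):
    """Split a log line into (timestamp, rest), or return None for non-log lines.

    Assumes each log record starts with 'YYYY-MM-DD HH:MM:SS ', as in the
    output above; progress-bar lines (tqdm) do not and are skipped.
    """
    try:
        when = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
    except ValueError:
        return None
    return when, line[20:]


def stage_seconds(lines, start_msg: str, end_msg: str) -> float:
    """Seconds from the first record ending with start_msg to the next ending with end_msg."""
    start = None
    for line in lines:
        parsed = parse_line(line.rstrip())
        if parsed is None:
            continue
        when, rest = parsed
        if start is None and rest.endswith(start_msg):
            start = when
        elif start is not None and rest.endswith(end_msg):
            return (when - start).total_seconds()
    raise ValueError(f"could not find both {start_msg!r} and {end_msg!r}")
```

For example, passing an open file handle for a saved copy of this output as `lines` (the file itself is hypothetical) with `"process density"` and `"process density... complete"` would report the density stage's duration.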