zavolanlab / MIRZAG

MIRZA-G - Pipeline and model for miRNA target prediction

Rule calculate_mirza takes too long in comparison to the other rules #10

Open fgypas opened 6 years ago

fgypas commented 6 years ago

In the Snakemake version of the workflow, the rule calculate_mirza takes much longer to run than any of the other rules. The config I use is the following:

input_mirna: ../mirnas.fa
input_mrna: ../02_alignment_extraction_prepare_mirzag/results/targets.fa
input_tree: ../01_alignment_extraction_prepare_annotation/results/tree.prunned.nh
input_multiple_alignments: ../02_alignment_extraction_prepare_mirzag/results/mirzag.tar.gz
input_model_with_bls: ../../../MIRZAG/data/glm-with-bls.bin
input_model_without_bls: ../../../MIRZAG/data/glm-without-bls.bin

scripts: ../../../MIRZAG/docker/scripts

# Output
output_file_name: mirza_g_results.tsv.gz

# Settings
settings_split_by: "__"
settings_index_after_split: 1
settings_mirza_threshold: 50
settings_contextLen_L: 14 # downstream up to the end of the miRNA (this is
# from the miRNA 5' end; in the mRNA this will be the upstream region)
settings_contextLen_U: 0 # stay with the seed
organism: hg38

The problem is that, for each miRNA, MIRZA runs against all target sequences (there is no dynamic option). @jsurkont, do you think there is an easy way to improve this?
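
One possible direction, offered only as a sketch: if MIRZA scores each miRNA-target pair independently (an assumption, not confirmed here), the file given as input_mrna could be split into several smaller FASTA chunks so that Snakemake can schedule one calculate_mirza job per chunk in parallel. The helper names read_fasta and split_round_robin below are hypothetical and not part of MIRZAG.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def split_round_robin(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute records over n_chunks lists, one record at a time."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Toy input standing in for targets.fa (made-up sequences).
example = [">t1", "ACGU", "ACGU", ">t2", "GGCC", ">t3", "UUAA"]
chunks = split_round_robin(read_fasta(example), 2)
print([len(chunk) for chunk in chunks])  # prints [2, 1]
```

Each chunk could then be written to its own file and passed to a separate calculate_mirza job, with the per-chunk results concatenated afterwards. Whether this actually helps depends on how MIRZA scales with the number of targets, which would need to be measured.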