In the Snakemake version of the workflow, the rule calculate_mirza takes much longer than the other rules. The config I use looks like this:
input_mirna: ../mirnas.fa
input_mrna: ../02_alignment_extraction_prepare_mirzag/results/targets.fa
input_tree: ../01_alignment_extraction_prepare_annotation/results/tree.prunned.nh
input_multiple_alignments: ../02_alignment_extraction_prepare_mirzag/results/mirzag.tar.gz
input_model_with_bls: ../../../MIRZAG/data/glm-with-bls.bin
input_model_without_bls: ../../../MIRZAG/data/glm-without-bls.bin
scripts: ../../../MIRZAG/docker/scripts
# Output
output_file_name: mirza_g_results.tsv.gz
# Settings
settings_split_by: "__"
settings_index_after_split: 1
settings_mirza_threshold: 50
settings_contextLen_L: 14 # downstream up to the end of the miRNA (relative to
# the miRNA 5' end; in the mRNA this is the upstream region)
settings_contextLen_U: 0 # stay with the seed
organism: hg38
The problem is that for each miRNA, MIRZA runs against all target sequences (no dynamic option). @jsurkont Do you think there is an easy way to improve this?
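One direction I could imagine, as a rough sketch rather than a tested fix: split targets.fa (or mirnas.fa) into chunks so that each chunk can be scored as a separate job. This assumes MIRZA scores each miRNA-target pair independently of the other sequences in the file, which I have not confirmed. The helper names `read_fasta` and `split_fasta` and the `chunk_<i>.fa` file naming are hypothetical and not part of the existing scripts.

```python
from itertools import cycle
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path: str, out_dir: str, n_chunks: int) -> List[str]:
    """Distribute records round-robin into n_chunks FASTA files.

    Hypothetical helper: file names follow chunk_<i>.fa, an assumption
    made for this sketch only.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = [out / f"chunk_{i}.fa" for i in range(n_chunks)]
    handles = [p.open("w") for p in paths]
    try:
        # Round-robin keeps chunk sizes within one record of each other.
        for (header, seq), handle in zip(read_fasta(path), cycle(handles)):
            handle.write(f">{header}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return [str(p) for p in paths]
```

Each chunk could then be handed to its own calculate_mirza job and the per-chunk results concatenated afterwards. Whether this helps depends on how many cores are available, since the total amount of work stays the same.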