tacitvenom / genomics_algo

MIT License

Optimise finding clumps algorithm (Fixes #15) #18

Closed: tacitvenom closed this 3 years ago

sourcery-ai[bot] commented 3 years ago

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 5.02%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 2.75 ⭐ | 4.71 ⭐ | 1.96 👎 |
| Method Length | 54.48 ⭐ | 59.33 ⭐ | 4.85 👎 |
| Working memory | 8.49 🙂 | 9.23 🙂 | 0.74 👎 |
| Quality | 75.41% | 70.39% 🙂 | -5.02% 👎 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 380 | 400 | 20 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| genomics_algo/algorithms.py | 72.82% 🙂 | 64.30% 🙂 | -8.52% 👎 |
| genomics_algo/tests/test_algorithms.py | 78.25% ⭐ | 78.37% ⭐ | 0.12% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| genomics_algo/algorithms.py | find_pattern_clumps | 18 🙂 | 173 😞 | 14 😞 | 37.61% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| genomics_algo/algorithms.py | get_occurences_with_boyer_moore_exact_matching | 13 🙂 | 112 🙂 | 13 😞 | 49.82% 😞 | Extract out complex expressions |
| genomics_algo/tests/test_algorithms.py | test__get_alignments_skipped_bc_lookup | 0 ⭐ | 106 🙂 | 21 ⛔ | 55.34% 🙂 | Extract out complex expressions |
| genomics_algo/tests/test_algorithms.py | test_get_occurences_in_entire_genome_with_boyer_moores_exact_match | 0 ⭐ | 55 ⭐ | 18 ⛔ | 64.84% 🙂 | Extract out complex expressions |
| genomics_algo/algorithms.py | get_occurences_with_naive_match | 6 ⭐ | 62 🙂 | 10 😞 | 68.65% 🙂 | Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

Please see the Sourcery documentation for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Let us know what you think of it by mentioning @sourcery-ai in a comment.

tacitvenom commented 3 years ago

1) Found the reason for the CI build failure (#24): the genome file was updated after the test was added. This went unnoticed because the test had been skipped all along. The expected value has been corrected in the main branch with commit https://github.com/tacitvenom/genomics_algo/commit/e1fcb5b059bdfe97c8878d37dd33c415cd2bfee1

2) Added another test, test_find_pattern_clumps_long, which runs on a long artificial text for quick performance measurement and prototyping. It turns out this implementation takes ~35 seconds, compared with ~25 seconds for the naive implementation in the main branch.

Closing this PR since the changes don't actually improve the runtime.
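For anyone revisiting the timing comparison in point 2, here is a minimal sketch of one way to prototype it. It is not the code from this PR or from the main branch, and the function names, the parameters (k, window_len, min_count) and the test sizes are all assumptions made for illustration. It compares a naive clump finder that recounts every window against a sliding-window variant that updates the counts incrementally, and times both on a random artificial text.

```python
import random
import time
from collections import Counter
from typing import Set


def find_clumps_naive(genome: str, k: int, window_len: int, min_count: int) -> Set[str]:
    """Hypothetical baseline: recount all k-mers in every window from scratch."""
    clumps: Set[str] = set()
    for start in range(len(genome) - window_len + 1):
        window = genome[start:start + window_len]
        counts = Counter(window[i:i + k] for i in range(window_len - k + 1))
        clumps.update(kmer for kmer, c in counts.items() if c >= min_count)
    return clumps


def find_clumps_sliding(genome: str, k: int, window_len: int, min_count: int) -> Set[str]:
    """Hypothetical variant: keep one Counter and update it as the window slides."""
    # Assumption: a genome shorter than one window yields no clumps.
    if k <= 0 or window_len < k or len(genome) < window_len:
        return set()
    # Count the k-mers of the first window once.
    counts = Counter(genome[i:i + k] for i in range(window_len - k + 1))
    clumps = {kmer for kmer, c in counts.items() if c >= min_count}
    for start in range(1, len(genome) - window_len + 1):
        # The k-mer that began at the previous window start drops out.
        counts[genome[start - 1:start - 1 + k]] -= 1
        # The k-mer that ends at the new window end comes in.
        entering = genome[start + window_len - k:start + window_len]
        counts[entering] += 1
        # Only the entering k-mer can newly reach the threshold.
        if counts[entering] >= min_count:
            clumps.add(entering)
    return clumps


if __name__ == "__main__":
    # Artificial text; the sizes are arbitrary and chosen to run quickly.
    rng = random.Random(0)
    text = "".join(rng.choice("ACGT") for _ in range(5000))
    for fn in (find_clumps_naive, find_clumps_sliding):
        t0 = time.perf_counter()
        result = fn(text, 4, 200, 3)
        elapsed = time.perf_counter() - t0
        print(f"{fn.__name__}: {len(result)} clumps in {elapsed:.3f}s")
```

Checking the two functions against each other on the same input, before timing them, helps confirm that any speed difference is not caused by a difference in results.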