Closed gessulat closed 3 weeks ago
Just noticed that a lot of the docstrings in your functions are numpy style, whilst the project mostly uses sphinx style ... @wfondrie should we migrate to numpy style? (and probably pick what rules are enforced in ruff -> https://docs.astral.sh/ruff/rules/#pydocstyle-d )
I had a question on commit 466838c ... in my system
tests/system_tests/test_cli.py::test_cli_aggregate
(https://github.com/msaid-de/mokapot/blob/c021f0906121c11b457f42e35d9979a7aff53f5c/tests/system_tests/test_cli.py#L136) fails withassert count_lines(Path(tmp_path, "blah.targets.psms")) == 10256
where the diff is10298 == 10256
. could we specify this as a range, rather than a hard-coded number (since even if the seed is the same, changes in the system/compiler can make numeric outputs unstable)
How much would you think those numbers can vary? I think some quite broad range (e.g. like 9000 <= line_count <= 11000
) would probably suffice here, since we're only testing for plausibility anyway. Maybe even just some lower limit would do, don't you think?
Just noticed that a lot of the docstrings in your functions are numpy style, whilst the project mostly uses sphinx style ... @wfondrie should we migrate to numpy style? (and probably pick what rules are enforced in ruff -> https://docs.astral.sh/ruff/rules/#pydocstyle-d )
I think, the original mokapot code had already numpy style docstrings instead of sphynx style. I didn't notice that when writing new functions and just used sphynx style out of habit, but I think too, that numpy is much better readable when you look directly into the code instead of generated documentation.
Hi @ezander !
I submitted a PR to the branch of this PR over here: https://github.com/msaid-de/mokapot/pull/42 ... its mostly chores but it handles lil things like the hard-coded number.
LMK what you think!
Just noticed that a lot of the docstrings in your functions are numpy style, whilst the project mostly uses sphinx style ... @wfondrie should we migrate to numpy style? (and probably pick what rules are enforced in ruff -> https://docs.astral.sh/ruff/rules/#pydocstyle-d )
I think, the original mokapot code had already numpy style docstrings instead of sphynx style. I didn't notice that when writing new functions and just used sphynx style out of habit, but I think too, that numpy is much better readable when you look directly into the code instead of generated documentation.
Perfect - I was going to request that all the docstrings be converted to Numpy-style 😄.
It looks like we're adding a lot of files to the test directory. At the very least, I think we need to document what they are, say by adding a README within the data directory.
The better alternative would be to move toward all of these being programmatically generated (ie simulated results), when possible. Alas, adding a bunch of real files was one of the mistakes I made early on.
Post Megamerge wishlist (capture as checklists the future issues detected while discussing this PR):
Post Megamerge wishlist (capture as checklists the future issues detected while discussing this PR):
- [x] Brew refactor https://github.com/wfondrie/mokapot/pull/119/files#r1646576640 Issue: https://github.com/wfondrie/mokapot/issues/110
- [x] Re-parametrize chunk size https://github.com/wfondrie/mokapot/pull/119/files#r1646580990 Issue https://github.com/wfondrie/mokapot/issues/123
- [x] Strict zip pre-3.10 https://github.com/wfondrie/mokapot/pull/119/files#r1646583534 - will be solved through upgrade to Python 3.10 as discussed with @wfondrie
- [x] Scaler weight loading https://github.com/wfondrie/mokapot/pull/119/files#r1646591660 Issue: https://github.com/wfondrie/mokapot/issues/124
LinearPsmDataset
and OnDiskPsmDataset
classes. Issue: https://github.com/wfondrie/mokapot/issues/122Attention: Patch coverage is 86.35671%
with 179 lines
in your changes missing coverage. Please review.
Please upload report for BASE (
main@af44229
). Learn more about missing BASE report.
Files with missing lines | Patch % | Lines |
---|---|---|
mokapot/tabular_data.py | 84.52% | 54 Missing :warning: |
mokapot/parsers/pin_to_tsv.py | 39.34% | 37 Missing :warning: |
mokapot/streaming.py | 87.50% | 20 Missing :warning: |
mokapot/mokapot.py | 65.85% | 14 Missing :warning: |
mokapot/brew_rollup.py | 92.69% | 13 Missing :warning: |
mokapot/peps.py | 71.11% | 13 Missing :warning: |
mokapot/dataset.py | 85.96% | 8 Missing :warning: |
mokapot/confidence.py | 95.85% | 7 Missing :warning: |
mokapot/confidence_writer.py | 88.67% | 6 Missing :warning: |
mokapot/brew.py | 84.61% | 2 Missing :warning: |
... and 5 more |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I had a question on commit 466838cc07095522cc7047a571ea71a82f597477 ... in my system
tests/system_tests/test_cli.py::test_cli_aggregate
(https://github.com/msaid-de/mokapot/blob/c021f0906121c11b457f42e35d9979a7aff53f5c/tests/system_tests/test_cli.py#L136) fails withassert count_lines(Path(tmp_path, "blah.targets.psms")) == 10256
where the diff is10298 == 10256
. could we specify this as a range, rather than a hard-coded number (since even if the seed is the same, changes in the system/compiler can make numeric outputs unstable)