pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
85 stars 30 forks source link

`sciform` review #121

Closed jagerber48 closed 2 weeks ago

jagerber48 commented 10 months ago

Submitting Author: Justin Gerber(@jagerber48) All current maintainers: (@jagerber48) Package Name: Sciform One-Line Description of Package: A package for converting python numbers (floats, Decimals) into scientific-formatted strings more suitable for reading and presentation. Repository Link: https://github.com/jagerber48/sciform Version submitted: 0.24.0 Editor: @Batalex
Reviewer 1: @isabelizimm
Reviewer 2: @machow
Archive: DOI Version accepted: 0.34.1 JOSS DOI: N/A Date accepted (month/day/year): 02/07/2024


Code of Conduct & Commitment to Maintain Package

Description

Scope

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication Options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

NickleDave commented 9 months ago

Linking back to discussion on the presubmission issue: https://github.com/pyOpenSci/software-submission/issues/114#issuecomment-1648683592

We will hold off on starting this review for just a bit (two weeks maximum) while we help @jagerber48 with some questions about development and scope.

jagerber48 commented 9 months ago

I have come to decisions on most of the questions I had that @NickleDave referred to in the previous comment. For one question I opened a topic in the pyopensci discourse: https://pyopensci.discourse.group/t/how-to-avoid-repeating-a-long-list-of-keyword-options-throughout-a-package/331/5. The suggestions there are very good and I plan to explore them in upcoming tests and versions for sciform. The suggestion I will likely take will result in changes to the user-facing API and hopefully a large cleanup of the codebase.

@NickleDave how do you suggest we proceed with the review? I'm in an active phase of development so I'm not sure what makes sense in terms of hitting a moving target. I just made an important update in version 0.22.0 involving changing from float-based formatting on the backend to Decimal-based formatting. For now I will set 0.22.2 as the version submitted. I anticipate version 0.23.0 will incorporate changes to avoid the code duplication suggested in the discourse topic, and I said above, this will be a pretty major modification to the codebase and a change to the user API. I imagine I'll be able to implement these changes in the next month, probably the next two weeks. Given that timeline, what makes sense for the review timeline? Also, in this post you suggested I could release these changes as a release candidate version. Were you imagining that release candidate might appear before the review, during the review, or after? Would the release candidate be the main target of the review or, e.g. 0.22.2 without the new changes?

I don't have any rush to get this review going if we want to wait for more recent code on the one hand. But on the other hand, if I keep changing the code and not starting the review we'll never hit the moving target..

NickleDave commented 9 months ago

Hi @jagerber48 thank you for updating here.

You are right that we should not let this get to a place where you keep changing the code. But I think the best thing to do is wait until you incorporate changes suggested in the Discourse topic, then start right away.

Please release that version, which you expect to be 0.23.0, and then set that as the version submitted above.

jagerber48 commented 9 months ago

Thanks. That makes the most sense to me also. That's what I'll do.

NickleDave commented 9 months ago

Great, thank you @jagerber48 -- once you reply back that you've released that version, we'll go ahead with the review

jagerber48 commented 9 months ago

Ok. I've implemented a version of the changes discussed in the discourse topic. I'll post details in the topic since progress has been made but there may still be room for improvement. sciform is on version 0.24.0 for now and I think review can move forward. I have a few small-scale (I think) questions that I'm curious to have addressed/considered during the review. I'll compile and post them here.

I may continue to make small code/docs changes and releases during the review but I'm happy to pin the review to version 0.24.0. Alternatively, if it would be better for me to use some other git workflow during the review I'm happy to do that. One of my questions is generally what my git workflow should be for this package. Right now I just have a main branch and I make feature branches that have one main feature but sometimes a small feature or two sneaks in. I then just PR merge that feature branch into the main branch. I don't really have much experience releasing versions of code so I'm just going off some stuff I've learned. Not experience with release branches, pre-releases, etc. (Happy to discuss in more detail later, just want to answer now if I should do anything differently immediately during the review time).

jagerber48 commented 9 months ago

I pushed version 0.25.0 which follows a more updated version of what was discussed in the thread for reducing code repetition. If the review has already started or anything happy to keep the review looking at 0.24.0.

Some general questions I have that may be in-scope for the review:

Then I'll also mention that I've found making the documentation good and consistent has been my biggest challenge/time spent on this project. I continue to have ideas for documentation improvements but documentation has been a moving target as the code has continued to change. Just to say: I expect errors in the documentation and places I can make improvements.

jagerber48 commented 9 months ago

I don't have any non-trivial breaking changes in mind for the package at this time (though stuff always seems to come up). The only API changes I have in mind are possible name changes to options or classes. There are a few I'm not 100% happy with so I may poll pyopensci or e.g. code review stack exchange to try to get some broader set of opinions on some naming choices.

Just two I've written down:

Those are just two I had written down, but pretty much any name in the API is fair game for naming improvement suggestions.

NickleDave commented 9 months ago

Thank you for letting us know @jagerber48.
We are wrapping up a couple of other reviews but we expect to have an editor free to start this one shortly.
They will make sure reviewers take your requests for feedback into consideration.

jagerber48 commented 9 months ago

sciform is now on version 0.26.2. Version 0.26.0 introduces a number of trivial name changes to various options. 0.26.1 introduced more unit tests for better coverage (I learned how to measure unit test coverage) and helped me discover one bug and another bug/unexpected behavior. The first bug is fixed in 0.26.2. The second bug requires some convention decisions to be made about the format specification mini language. See https://github.com/jagerber48/sciform/issues/29.

Batalex commented 8 months ago

Hi @jagerber48,

As we discussed, I will lead the review for sciform. I am looking for reviewers and will let you know when I am done.

jagerber48 commented 7 months ago

Here's an update on the current status of the package and some questions. Since making this submission I modified the code quite a bit moving up to version 0.29.0. Since then I've taken a break from updating sciform because I think all of the core functionality I envisioned is in place. The tasks I see (I think it's likely other will arise during the review) in front of me are

For adding the non-core features I plan to wait until after the pyopensci review and possibly until after I release version 1.0.0 with a more stable API (which will also happen after the pyopensci review). For stabilizing the API since I feel a lot of the question I have are opinion-based I'm excited to get opinions from other people than me! I'd hope that some of these questions can be debated or discussed during the review if they're in scope. Here are the current questions I have listed:

These are some of the main existing api features that I have questions about. I.e. making decisions on the above will constitute breaking changes, so I want to address these before 1.0.0. I am ALSO open to feature requests but will be implementing those with lower priority than stabilizing the API.

Thank you so much for taking the time to read this and also thank you very much for any feedback you are able to provide! I'm also curious to hear suggestions for other communities where I could seek feedback.

Batalex commented 7 months ago

:wave: Hi @isabelizimm and @machow! Thank you for volunteering to review for pyOpenSci. I am truly excited by the team we have on this review.

@jagerber48, Isabel and Michael are involved with quarto, and I believe they have some pretty good opinions on how to format numbers for publishing. I am looking forward to seeing their review!

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: 27 October 2023.

Reviewers: @isabelizimm, @machow Due date: 2023/10/27

jagerber48 commented 6 months ago

Started running the code on real applications myself some more and quickly found some important bugs. Just made a release of version 0.29.1 that fixes these bugs. https://github.com/jagerber48/sciform/releases/tag/0.29.1

isabelizimm commented 6 months ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Has examples page

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 4-5


Review Comments

Overall, this package is super nifty and clearly offers a huge quality of life increase in scientific publishing. The documentation, as a whole, is pretty comprehensive with lots of examples and a mix of API and narrative style docs. Thank you for a great addition to pyOpenSci!

Response to a few maintainer questions

I've answered a handful of your questions that haven't had a response yet + I have a response to, but happy to have discussion on any other points you're interested in chatting about! 😄

By default value/uncertainty pairs are formatted as 123 +/- 42. But using the unicode_pm=True option it is possible to format this as 123 ± 42. Should the unicode_pm option be True by default? Or even further, should unicode_pm=True just be the only behavior and the option is dispensed with entirely??

I think setting unicode_pm=True as a default is the right move! I think it is a delight sciform offers to users to not have to type ± :smile: But, there are a lot of configuration parameters and it could be missed, so this makes the out-of-the-box experience feel a little extra polished.

Is top_dig_place an ok name?

I wonder if you could somehow work padding into the name? My thought process behind that: top_dig_place looks to be used in the general purpose Formatter where arguments have a variety of use cases (padding, decimal places, separators), so it's use for padding might not be apparent. (These are not very good suggestions, but something like pad_to or pad_to_exp?)

Is superscript_exp an ok name? Maybe just superscript?

I have a slight preference to superscript.

Is bracket_unc an ok name?

For bracket_unc and SciNumUnc alike, my brain did not go to uncertainty for unc. I do realize that a other arguments build off this name (bracket_unc_remove_seps), so brevity is important. (bracket_uncertain + bracket_uncertain_seps maybe? you would have to flip the boolean for the separators then since you drop the negative)

Is SciNum an ok name?

I like SciNum.

Is SciNumUnc an ok name?

It was not immediately apparent to me that SciNumUnc was for uncertainty, but once I made that connection, it made a lot of sense. SciNumUncertain is a bit longer, but more clear. I do think Uncertainty alone isn't verbose enough since you still need your scientific number. FWIW I ended up typing SciNumU and then auto-completing the rest, so I'm not too afraid of longer names.

I wonder if the two binary modes should instead be accessed via a separate option indicating the exponent base (10 or 2 at the moment). I'm also curious to query interest levels in the binary formatting. I personally have no interest in the binary formatting since my work is mostly in physical sciences and I use decimals and SI.

For binary formatting, maybe open an issue/discussion about it and ask people to :thumbsup: if they would like to see it implemented? That way you're not building things you don't have interest in. However, if you already have the plumbing in place, and it's just a matter of exposing an argument, and you feel compelled to do so, then that seems like a good interrim move!

Should I include changelog even for minor version bumps?

I think yes! For every release, it is super useful to see the changes. If it feels like too much effort to go back and write everything manually, even something small like using the Generate Release Notes button when creating a new release is sufficient. :smile:

Right now on my github PRs I have github actions to run tests and linting on different python versions. Is there a way for me to run these tests all on my local system with github? Or is this type of automation typically done with some third party build/automation tool?

Something you can do locally is install and use pre-commit. You can have pre-commit hooks do a variety of things, but some common use cases are running black and flake. You can also configure testing to run on pre-commit, but this is less common to do since sometimes tests can take a while to run.

Optional nits:
Batalex commented 6 months ago

Thank you kindly for this thorough review!

machow commented 6 months ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

📝 NOTE that the package uses a README.rst, but maybe this meets the requirement?

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2.5


Review Comments

Suggested git branching model, or modifications to my existing model. Right now I branch off of main for a feature branch, I work on that, then I PR it into main using github. However, I often sneak 1 or 2 additional features in addition to the main feature into these feature branches. Is this a bad practice? What should I do instead?

I think that--as the package maintainer--if bundling multiple features into a PR is working for you, then it seems okay. It seems more important to limit the scope of PRs to other people's repositories, so that 1 feature doesn't get held up waiting for feedback / discussion on others.

Right now when I merge a PR I have to manually tag the branch on my local repo, push up the tag, then build the code into a sdist/whl and then upload to pypi. I guess I should automate these steps? Are there example for good ways to do this?

You can use the github action https://github.com/pypa/gh-action-pypi-publish to publish to PyPI whenever you make a github release. Here's an example in a tool I maintain called quartodoc, in case it's useful!

Right now on my github PRs I have github actions to run tests and linting on different python versions. Is there a way for me to run these tests all on my local system with github? Or is this type of automation typically done with some third party build/automation tool?

You can use tox, though I just push to github and let the github actions run the tests. There's also act, for running github actions locally (though I've never used it).

SciNum and SciNumUnc used to be sfloat and vufloat before the code was refactored to use Decimal over float. I'm not really in deep love with any of these names.

I have no good opinions on the exact names, but like that they use camel case, so feel more class-y.

Adding non-core features. Importantly there is one important feature I want to add which is pre-defined FormatOptions and corresponding Formatter classes.

This seems super handy!

Enums. Right now when configuring a FormatOptions object the user has to import various enum objects like the ExpMode or ExpFormat enum to configure certain settings. Is it too much of a pain to have to import all of these objects to choose settings? What would be a better alternative?

Tools like pydantic can take Literals or Enums, and give nice validation / feedback if an invalid value is passed. If pydantic feels too heavy duty, I like the suggestion of FormatOptions using Literal, and doing some validation. At a certain point, many options mapping to their own Enums starts to feel a bit cumbersome to type.

Formatter and FormatOptions. Right now creating a Formatter object is a two step process which first involves creating FormatOptions object then passing it to the Formatter constructor.

If Formatter can only be initialized using a single FormatOptions instance--and FormatOptions is all data no methods--then, it seems like they can be a single class. (It seems like the combined class of these two things could still be used to configure the global options).

By default value/uncertainty pairs are formatted as 123 +/- 42 ... Should the unicode_pm option be True by default?

I'm not in love with some of the names chosen for various objects and options.

Originally I wasn't planning support for binary formatting modes but

Not helpful opinions here :/. I think that the package is so useful for this kind of problem that I'd be motivated as a user to figure out what all the names meant, though!

Right now there's an issue with the sciform FSML where the separator options... I'm leaning towards just dropping the ability to configure separators from the FSML. Objections to this approach?

The FSML seems really neat, but reading through the documentation / trying the package, it feels easier to specify FormatOptions (and with literals or some approach that makes seeing options in autocomplete / code completion in eg VS Code would be very fast to initialize!). I always have to look up the options in the built-in string formatting mini-language, but initializing a class gives a lot of nice hinting / auto complete options for helping folks.

(I could see the FSML being nice for quick, common formatting though!)

Overall, it seems like a really helpful library!

Batalex commented 6 months ago

Thanks a bunch, @isabelizimm and @machow, for the reviews!

@jagerber48 I also took a few notes (without the suitable format, editor's privilege 🐈‍⬛ )

Project documentation

README

Instructions are clear, and the statement of need as well. Lacking a few elements from our guide, nothing too serious. Python version compatibility needs to be clarified.

Most of those issues have been solved in the meantime.

Contributing

Missing as of now.

License

Ok.

Changelog

Ok.

Installation

Installation resolves correctly.

pyproject.toml

The classifiers are lacking (python version). Nice setup for the dynamic version & readme.

More classifiers in recent releases.

General remarks and QoC

modes.py

I am on the fence concerning using Literal type hints with the values.

format_utils

API design

I want to propose something regarding the use of the sciform. Currently, the return types for the format functions are strings, which are acceptable for the project's current goal.
I would like to propose making a FormattedValue class inheriting from str. The reason is that we could then include methods such as _repr_html_ to use html syntax in notebooks, or other mime types (LaTeX export to use with quarto?).

jagerber48 commented 6 months ago

@isabelizimm @machow @Batalex thank you so much for the time you've taken to review this package and all the great feedback you've given me! I'll spend some time looking at this feedback and implementing the changes. Some of the changes seem easy, but some of them may require structural considerations. My first step is going to be to try to list which changes I plan to implement and how. I'll likely post that here before I get to work on them.

@Batalex how does the review work from this point forward? Are there any timelines or anything I should be aware of?

Batalex commented 6 months ago

We are now entering a phase of "back and forth", where you would make changes according to the feedback the reviewers offered, and then they would check the items on their list. Just like your typical PR, only a bit longer :D

Considering the scope of the changes proposed, I think we can work on a similar timeline as for the review: a few weeks to implement changes and a few weeks to validate them, and so on.

If you disagree with something in the reviews, that is okay. We just need to get the reviewer's approval at the end, so communication is key to resolving those issues.

machow commented 6 months ago

I've been chewing on the question of SciformOptions and Enums more. In case it's helpful, here's a 1-minute screencast showing some of the nice autocomplete Literals can provide!

https://www.loom.com/share/3ba1eadec7e84689b20d39406c27fa1e

jagerber48 commented 6 months ago

@isabelizimm Thanks so much for your thorough feedback! Responding here and using this as a todo list.

jagerber48 commented 6 months ago

@machow here's my response + todo list for your review, thanks again for taking the time to do this review!

Thanks again for the great feedback!

jagerber48 commented 6 months ago

@Batalex ok, and my response to your comments:

jagerber48 commented 6 months ago

I've been chewing on the question of SciformOptions and Enums more. In case it's helpful, here's a 1-minute screencast showing some of the nice autocomplete Literals can provide!

https://www.loom.com/share/3ba1eadec7e84689b20d39406c27fa1e

@machow thank you so much for putting together this screencast. Ok, that was a pretty compelling comparison between Enum and Literal. I've been searching online for comparisons just like this between Literal and Enum in python but haven't found much yet. I guess this is because Literal is somewhat new. But it seems like, at least for this use case, Literal might just be better than Enum. The Enum feel like some sort of "proper" coding, but I can just imagine myself and users being like, why can't I just pass in a string? The Literal input fixes all of the shortcomings of using "magic" strings as inputs while still being able to accept string inputs.

I will definitely at least mock up a branch of sciform that uses literals instead of enums and see how it goes.

Batalex commented 6 months ago

Thank you for your very neat checklists, which will significantly reduce the mental charge of everyone here. I see that you use a PR-based workflow even though you are the only contributor to the project. That's great! Feel free to request reviews directly on the PRs to avoid cluttering this issue.

In addition to Michael's loom on the subjet of Enum / Literal, since you are targeting Python 3.8+, it might be worth using Literal instead of Enum, because you have access to the get_args function in the typing module. Here is an example of how you could use it.

from typing import get_args, Literal, TypeAlias

DecimalSep: TypeAlias = Literal[".", ","]

def some_function(sep: DecimalSep):
    if sep not in get_args(DecimalSep):
        raise ValueError()

some_function(sep=";")  # raise
some_function(sep=",")  # OK

I think this approach is interesting, because we both have the ease of use of the string argument instead of a verbose Enum, as well as a runtime check.

jagerber48 commented 6 months ago

@machow, @Batalex to be clear, from a user API perspective, the shift to Literal from Enum will involve ONLY accepting Literal inputs to configure options and no longer accepting Enums, correct? It would be a bad choice to accept either a Literal or the corresponding Enum from the user?

Batalex commented 6 months ago

Per PEP 506, there is no way to build a Literal from an expression. Accepting both a Literal and a Enum would require that you duplicate the values throughout the code base.

from enum import Enum
from typing import get_args, Literal, TypeAlias

class DecimalEnum(str, Enum):
    period = "."
    comma = ","

DecimalLit = Literal[".", ","]
DecimalSep: TypeAlias = DecimalLit | DecimalEnum

def some_function(sep: DecimalSep):
    if sep not in get_args(DecimalLit):
        raise ValueError()

some_function(sep=",")  # OK
some_function(sep=DecimalEnum.comma)  # OK

That is a valid possibility, I am just concerned by the confusion of the three symbols we need

jagerber48 commented 6 months ago

Working on code of conduct and contributing guidelines: https://github.com/jagerber48/sciform/pull/74. I was able to easily copy & paste a code of conduct that seems good. I don't see an obvious contributing guidelines for a simple small open source project. Looks like I may need to put more thought in and craft something together from other examples that are around.

jagerber48 commented 6 months ago

@Batalex re: your last comment about an Enum + Literal approach. Thank you for putting together that example. So yes, the string representations would need to be duplicated in two spots. I also have a feeling the context type hint provided by IDE code completion would be long and less helpful in this case. It feels like Enum was a nice controlled strategy for this use case prior to Literal, but I think it's just going to easier for users if they can pass in strings. I's not like its unusual at all in scientific programming to use strings to select function options. scipy and numpy do that sort of thing all over the place. I think it's very rare that I've used a package that actually uses Enum for options. Just some people on stack exchange say it's a best practice. Now with Literal, it seems a lot of the downsides of using strings as inputs have gone away. Specifically, while a format string like engineering (for engineering exponent mode) would have been a "magic" string in the past, using Literal['engineering', ...] makes it no longer as much of a magic string. I will hash out an example that only uses Literal. I'll make a decision while I'm coding whether I want to convert the Literal string user input into an Enum for use by the backend code.

jagerber48 commented 6 months ago

I'm working on contributing guidelines for sciform. I haven't made many if any PRs on other people's repos on github. Trying to figure out how PRs from other people would work on sciform. My basic understanding is that someone would make a fork of sciform that lives in their user space, they would branch off main and make their changes. Then I think they PR their forked version into the main version in my user space. Questions:

NickleDave commented 6 months ago

I can't tell if this would work for sciform for any user or if I would have to grant specific users the ability to make a PR from a fork.

No, you don't need to grant anyone permissions for them to be able to fork and make a PR. If only you could 😇

How would this process look in the git history for sciform? If the PR is approved/merged in would it just appear as a new commit on the main branch? Since the contributors feature branch lives in their space instead of in the main sciform repo in my user space?

Basically: yes, a new commit on main if you "squash and merge", or all the commits if you merge the entire branch More detail: it depends on how you merge https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/about-merge-methods-on-github I recommend to let contributors do whatever in a branch and then just squash and merge as one commit: https://jacobtomlinson.dev/posts/2022/dont-prematurely-squash/rebase-and-force-push-your-prs/

Do I need to describe this process in the contributing guidelines if I want to welcome PRs from other devs?

Ideally although you can probably point to one of many existing resources and say "we do this", e.g. "we use a standard PR workflow similar to NumPy and scikit-learn"

You might find this example from the PyGMT docs helpful: https://github.com/GenericMappingTools/pygmt/blob/main/CONTRIBUTING.md Note how they split out a separate guide that lives in the rendered docs: https://github.com/GenericMappingTools/pygmt/blob/main/doc/contributing.md

jagerber48 commented 6 months ago

I want to give an update. I've checked off a couple of the minor changes, but I'm now delving into an experiment on a major refactor. This refactor involves two thing that I think are major improvements to the UI, but, for now, seem to make developing on the back end a little more of a headache. That said, my goal is to release a version 1.0 of this project and my main criteria for doing that is I want to have a nice user interface. "easiness of maintainability" is not a must-have for releasing 1.0, so I've made the decision, at the moment, to explore what things would look like if I make the UI as nice as possible and then just work around that on the back end. The changes are:

So basically, I had made a few decisions for better static typing and maintainability on the back end, but they were at the cost to the user interface. I've now decided the user interface is more important and I'm refactoring the code to reflect this change. Once these changes are more finalized I'll link to a PR in this thread. I would appreciate getting feedbacks from folks here on (1) the changes in user-friendliness and (2) the changes to the source code.

jagerber48 commented 6 months ago

https://github.com/jagerber48/sciform/pull/82 here's the PR. As of right now the source code has been re-worked and the tests in the tests directory have been re-written and are passing on my system. The main task ahead is rewriting a bit of documentation and the doctests.

jagerber48 commented 5 months ago

Ok I would say the PR is mostly complete now. All tests are passing and documentation and changelog have been updated to reflect (1) the removal of FormatOptions from the public interface along with (2) the replacement of Enum options by string Literal options. This is a big enough change that I would appreciate another set of eyes on it before merging into main. Based on the comments so far I think this is the most significant change I'll be making as part of this review. I would appreciate if anyone could let me know if they could or wouldn't be able to have a look at this in, say, the next 2-3 weeks?

https://github.com/jagerber48/sciform/pull/82

jagerber48 commented 5 months ago

Ok, I released version 0.30.0 which fixes implements the major interface changes discussed above (then had to immediately release 0.30.1 because I messed up the changelog).

I'm now continuing to work through the other topics on the checklists above.

@isabelizimm @Batalex, you both suggested I use black for automated formatting. I'm curious to learn more about this. I currently use Pycharm as my IDE. Pycharm has pretty powerful code inspections that I work to abide by. Also, for this project, I started using flake8 so that I can run some linting/format checking in my CI. I'm curious why you suggest I would want to use black in addition to these two linters.

Actually just played with some sciform code in the black playground. It looks like black is just even more strict about formatting (things like skipping lines, how long chunks of code get broken). These are a few of the things I do still find myself making conscious decisions about, so maybe it could be useful. Still curious for your opinions.

edit: Ok, I've done more research, yes ruff seems really nice, but I'm particular enough about my linting and formatting that I'm going to need to spend some time finding settings I like. I'll probably move sciform to use ruff once I get it figured out.

Batalex commented 5 months ago

I apologize for sounding somewhat pedantic; I just love explaining things in detail.

What is a code formatter, and why do we need it?

Ultimately, the goals of a code formatter fall into two categories: (1) reducing the cognitive charge of reading code and (2) asserting that code diffs add value to the code base.

As you might know, the PEP 8 style guide is prevalent throughout the Python ecosystem. Its specifications are not precise enough, so there are some ambiguities, or some rules are not as relevant as others. Hence, multiple implementations.
Nonetheless, using a code format based on PEP8 lowers the entry barrier for contributors since the code is easier to read.

We suggested black because we can address some of the caveats of PyCharm's integrated formatter.

First, make it easier for potential contributors to work on your project. Though PyCharm is quite popular, that means that VSCode, (neo)vim, helix, etc., users cannot easily format their code contributions with the format you use.
The good news is since PyCharm 2023.2, you can use black as the default formatter. Using a CLI-based tool means you can format the code (or check if the code is properly formatted) outside the IDE, i.e., in a GH action workflow. flake8 checks are pretty limited regarding the format. Finally, the black code format aims to produce smaller diffs.

black is pretty opinionated. One of the things that people do not like about black is that it uses double quotes for strings. You probably have to add a few exceptions to your linter configuration so that it plays nicely with black.

jagerber48 commented 5 months ago

@Batalex thanks for the explanation!

As you might know, the PEP 8 style guide is prevalent throughout the Python ecosystem. Its specifications are not precise enough, so there are no ambiguities, or some rules are not as relevant as others.

This seems like the key. It seems like most people agree that we should follow PEP 8, but the point is PEP 8 still leaves a LOT of freedom for how things should be formatted. I guess what you're saying is that linters and formatters impose tighter constraints on what is considered appropriately formatted. I can understand this. I'm in favor of more tightly restricted formatting (In my work, unrelated to sciform, I use Pycharm AND try to follow it inspections which is a way more opiniated strategy then either not following the inspections or however many of my colleagues have vscode setup).

And then yes, third party linter/formatters should be preferred over Pycharm's linter/formatting because (1) anyone has access to them regardless of their development environment and (2) they seem to be better documented than Pycharm's "inspection" properties.

I saw your suggestion to use ruff and I've been looking into it the past day. It seems really nice but since I'm particular about linting/formatting it's going to take me a while to figure out how I want to get it configured. I'll probably do a few passes on the sciform source code trying different settings. For some reason I decided I like single quotes more than double quotes, but given how widespread black is (and ruff follows black formatting), it seems like I should switch over to double quotes. Also, I was able to get a ruff plugin for pycharm so that its inspection correspond to ruff output and so that you can apply ruff fixes and formatting!

Anyways, I could have a really long discussion about opinions about linting and formatting! I think I'll save that away for a discussion on slack or discord and try to post my end-results here.

Batalex commented 5 months ago

Hey @jagerber48, how are we moving forward with this review? Feel free to give us a shout if you need some help / additional information

jagerber48 commented 5 months ago

@Batalex thanks for the ping. So third party linter/formatting is something I've been needing in my life. I'm currently working on a PR getting sciform in line with the ruff formatter: https://github.com/jagerber48/sciform/pull/88. Hopefully I'll have it merged in a few days.

I think with the completion of this PR I will have checked off almost everything on the list except for opinions on naming convention. I do think I still may be seeking stronger opinions on naming conventions so I'm open to stronger opinions there and also suggestions for how to get finalized on the names (one suggestion can be: "don't worry about it so much, just go with the names here"). I'm worried because I'm a bit of a perfectionist and I often do things one way thinking it's good, but then a few months later realize there was a better way. If I'm in a situation with sciform where I feel like I can't change the names because users are relying on the stable api, then I'll be frustrated.

jagerber48 commented 5 months ago

Ok, merged a big commit into main branch (not released yet though) where I lint and reformat using ruff: https://github.com/jagerber48/sciform/pull/88. Most of it I'm happy with. I was a little annoyed to move to double quotes from single quotes since it takes an extra keystroke to type double quotes. Guess I can type single quotes and let the tool format to double later. Ruff supports single quotes for some functionality but not all, and Black has made the opiniated choice (that I disagree with) to use double quotes, so there is just more tooling support for double quotes. Once Ruff gets support for single quotes on more features I might move back to them.

Trying to wrap up some more checkboxes. @Batalex you suggest removing exception raising in private function. Most of the exceptions in sciform arise as else clauses in if... elif... else... exhaustive checks. See this discussion. See this block of code

from enum import Enum

class Options(Enum):
    OPTION_1 = 1
    OPTION_2 = 2

def foo(option: Options):
    if option is Options.OPTION_1:
        res = 1
    elif option is Options.OPTION_2:
        res = 2
    return res

Pycharm gives me a warning on the last line saying Local variable 'res' might be referenced before assignment.. This motivated a lot of the else clauses with exceptions. Do you think this is a bad practice? I should check Ruff's behavior on this. Ruff might be more intelligent about static type checking exhaustiveness checks.

Batalex commented 5 months ago

I am pretty sure that black's double quotes will grow on you! That was the case for pretty much everyone I know who used it. No pressure, though, as you said

Guess I can type single quotes and let the tool format to double later.

I hope that with both black and ruff, you will soon be more productive and confident than ever in your code base.

Unfortunately, in your case, there might not be a perfect answer to fix all the issues with exhaustiveness checks. We are better off forgetting about this comment of mine altogether.

My recommendation would be to use match statements, as they communicate your intent the most clearly.

def foo(option: Options) -> int:
    match option:
        case Options.OPTION_1:
            res = 1
        case Options.OPTION_2:
            res = 2

    return res

However, (1) sciform targets python 3.9+, and (2) type checkers do not enforce match exhaustiveness against match statements as they should. That means that Pycharm's internal linter would still give a warning "referenced before assignment."

Using python 3.11's typing.assert_never would be the second choice, but it is again incompatible with python 3.9.

If I were to nitpick, we could drop those else clauses with an exception in all nonpublic facing functions, as we would already have validated at runtime that option is indeed an Options. But this adds little value to the code base, so I think we should keep things as they are.

jagerber48 commented 5 months ago

Ok, thanks for the response. In that case, the list is indeed getting short. I'm going to collect the remaining items I have

isabelizimm commented 5 months ago

These changes and updates look great! If you would like second eyes on any changes going in, feel free to tag me in PRs. I tend to be best summoned by an @ 😄

Re-development instructions:

Minimum: add a line in installation to show people how to get latest version, eg

pip install git+https://github.com/jagerber48/sciform.git

This will show people how to get latest development changes, even if they're not released yet. (example)

Slightly more effort: some documentation, either in a README or CONTRIBUTING file to show mainly 1) how to get to a local install, 2) how to build docs and 3) how to run tests. One example is the CONTRIBUTING.md guide in the package pins..

How exactly you want to configure those is up to you. My experience is that requirements.txt files tend to get stale, so I generally lean towards sciform[dev] via pyproject.toml route. However, the sciform package itself doesn't have any dependencies, so I think that using requirements files is a perfectly valid move in this scenario if it works better for you! If you want to make it different requirements files, splitting it out to requirements-test/requirements-docs makes sense, but probably no more modular than that.

jagerber48 commented 5 months ago

@isabelizimm Thanks for all that info. Ok, it sounds like your main suggestion is to include instructions for how people can install the latest development/pre-release version of the code to e.g. test it out. That is easy to do. The other part that we've been discussing is to include instructions for people who want to develop sciform. These two things have different requirements.

In any case, I have a PR with both of them I would appreciate you having a look at if you're able to: https://github.com/jagerber48/sciform/pull/90. I think it's relatively straight forward.

I decided to go with the pip install -e .[dev] approach, and also to not break things down any further than this. The numpy/scipy/matplotlib dev requirements really aren't that burdensome, and so few people are going to be developing sciform I think it's just not worth any added overhead.

jagerber48 commented 5 months ago

Unless there are strong opinions, I don't want to setup pre-commit hooks or tox/act to do any sort of CI locally. I'm fine just letting it run in the PR and doing the local steps manually. In the future I may set up pre-commit hooks to run tests and do formatting as I become more comfortable with the tools. The main downside I see to setting up the pre-commit hooks is that it seems like it would be challenging to submit a commit that doesn't perfectly conform to everything. I could see this being annoying if I'm trying to set up some quick and dirty code that might fail some tests, or that I don't want to bother to format perfectly and I want to commit it to keep it around. Maybe there's an easy way to bypass pre-commit hooks for these sorts of situations?

jagerber48 commented 5 months ago

I'm going to rename top_dig_place to left_pad_dec_place. This is consistent with a renaming of the rounding mode that rounds by "decimal place" (similar to "precision" for the python built-in FSML) to dec_place. "Digits place" is not a standard terminology for this concept, "decimal place" is the standard terminology.

Here's an idea on SciNum and SciNumUnc. I think SciNum is an ok name for a number that is going to be converted to a formatted string. I don't like SciNumUnc though. I think I could remove the SciNumUnc object and just let the SciNum __init__ method take an optional uncertainty input parameter and then perform the formatting accordingly.

isabelizimm commented 5 months ago

re: development docs: I reviewed that PR, a few small comments but looks good to me! It does really help users to get started contributing when they don't have to guess about some of the going-though-the-motions pieces of development, so I think this is a great addition. Plus, everything ran on my laptop right out of the box, which is always exciting➕ 😄

re: CI/pre-commit: I personally don't use act/tox/etc. I've tried a few, but no project of mine has been heavyweight enough that I found it a net-win to my workflow. The one thing I do use, though, is pre-commit hooks. It might be nice to start using this tool locally to make sure all new code is compliant, and integrate it into CI as you see fit.

A few pre-commit moves that might make it more useful to you:

re: naming: These both seem like good moves! The digits place -> decimal place move immediately clicked in my brain, and having uncertainty as a parameter seems reasonable enough to me.

jagerber48 commented 4 months ago

@isabelizimm ok yes, pre-commit --no-verify was the trick to make me comfortable setting up pre-commit and adding pre-commit instructions. Fortunately git kraken (the tool I use for git) has an easy button to do the --no-verify option, so this is all very convenient for me.

Ok I'm working on another release (0.31.0) that addresses all of the remaining concerns in the review except the lazy formatting FormattedValue suggestion. Here's the PR for the release https://github.com/jagerber48/sciform/pull/96. There are a number of notable changes you can find in the changelog, but the highlights are:

Anyways, if any of you are inclined, @Batalex @machow @isabelizimm I would appreciate if any of you have comments on this jump from 0.30.1 to 0.31.0 before I release it. Otherwise, I'm curious about what next steps on the review look like.

There are a couple improvements I see that can be made, but I consider them to be typical package improvements that should happen over time and I don't know that they have bearing on the review.

Batalex commented 4 months ago

IMO we are at the end of the review process. Thank you kindly for bearing with our demands!

Before I ask @machow and @isabelizimm for their final approval, I would like to ask you if you feel that the API is stable enough so that you can release 1.0 in the same time frame as the pyopensci "seal of quality"?

This is not an obligation by any mean, of course.