pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/


`sciform` review #121

Closed jagerber48 closed 6 months ago

jagerber48 commented 1 year ago

Submitting Author: Justin Gerber (@jagerber48)
All current maintainers: (@jagerber48)
Package Name: Sciform
One-Line Description of Package: A package for converting python numbers (floats, Decimals) into scientific-formatted strings more suitable for reading and presentation.
Repository Link: https://github.com/jagerber48/sciform
Version submitted: 0.24.0
Editor: @Batalex
Reviewer 1: @isabelizimm
Reviewer 2: @machow
Archive:
Version accepted: 0.34.1
JOSS DOI: N/A
Date accepted (month/day/year): 02/07/2024

Code of Conduct & Commitment to Maintain Package

[x] I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package after should it be accepted.
[x] I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.

Description

Include a brief paragraph describing what your package does: sciform is used to convert python float objects into strings according to a variety of user-selected scientific formatting options including fixed-point and decimal and binary scientific and engineering notations. Where possible, formatting follows documented standards such as those published by BIPM or IEC. sciform provides certain options, such as engineering notation, well-controlled significant figure rounding, and separator customization which are not provided by the python built-in format specification mini-language (FSML) for formatting floats into strings. In addition, sciform provides functionality for formatting pairs of floats as value +/- uncertainty pairs according to a variety of scientific standards.
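As a rough illustration of the value +/- uncertainty formatting described above, here is a minimal standard-library sketch. It is not sciform's implementation or API: the function name `format_val_unc` and its parameters are hypothetical, and the rule used (round the uncertainty to a fixed number of significant figures, then round the value to the same decimal place) is one common convention assumed for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_val_unc(value, uncertainty, unc_sig_figs=2, pm="+/-"):
    """Hypothetical helper: round the uncertainty to `unc_sig_figs`
    significant figures and the value to the same decimal place."""
    val = Decimal(str(value))
    unc = Decimal(str(uncertainty))
    if unc <= 0:
        raise ValueError("uncertainty must be positive")
    # adjusted() is the exponent of the leading digit: 1 for 42, -2 for 0.042.
    least_sig_exp = unc.adjusted() - (unc_sig_figs - 1)
    quantum = Decimal(1).scaleb(least_sig_exp)
    val_rounded = val.quantize(quantum, rounding=ROUND_HALF_EVEN)
    unc_rounded = unc.quantize(quantum, rounding=ROUND_HALF_EVEN)
    # The "f" presentation type avoids exponent notation in the output.
    return f"{val_rounded:f} {pm} {unc_rounded:f}"


print(format_val_unc(123.456, 0.0123))          # 123.456 +/- 0.012
print(format_val_unc(123.456, 0.0123, pm="±"))  # 123.456 ± 0.012
```

The sketch ignores the edge case where rounding pushes the uncertainty up to the next power of ten (for example 0.0996), which a real formatter would need to handle.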

Scope

Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
- [ ] Data retrieval
- [ ] Data extraction
- [ ] Data processing/munging
- [ ] Data deposition
- [ ] Data validation and testing
- [x] Data visualization[^1]
- [ ] Workflow automation
- [ ] Citation management and bibliometrics
- [ ] Scientific software wrappers
- [ ] Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an existing community please check below:

[ ] Pangeo
- [ ] My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?
  The target audience for this package includes any scientist trying to print or otherwise display numerical data in a way that is easily readable and conforms to various scientific standards. Numerical data that users may be interested in may be numbers (possibly with uncertainties) coming from literature (such as scientific constants), numbers that tabulate raw measurement results or numbers resulting from calculation analyses (such as best-fit algorithms applied to data). Even numbers appearing in plot tick labels can be formatted using sciform. See https://sciform.readthedocs.io/en/stable/examples.html for some example use cases.
- Are there other Python packages that accomplish the same thing? If so, how does yours differ?
  1. Python built-in string formatting mini language (https://docs.python.org/3/library/string.html#format-specification-mini-language). sciform includes its own string formatting mini language closely based on the built in one, but with some differences. Notably sciform includes well-controlled significant figure formatting, engineering notation, binary formatting, SI/IEC prefix substitution, digit grouping and decimal symbol options (helpful for a diversity of locales), exponent value coercion, as well as value +/- uncertainty formatting functionality.
  2. The uncertainties package (https://pythonhosted.org/uncertainties/). sciform was heavily motivated by this package. This package has sophisticated statistical handling of value +/- uncertainty pairs, handling error propagation and simulation under-the-hood. In addition, it has its own extension of the mini language for formatting value +/- uncertainty pairs. sciform has more formatting functionality than the uncertainties package including, especially, engineering notation, grouping separator controls, and prefix substitution. sciform is also a much lighter weight requirement than the uncertainties package. This may be desirable when a user wants to format strings, but they don't need the rest of the full statistical machinery of the uncertainties package.
  3. The prefixed package (https://github.com/Rockhopper-Technologies/prefixed). sciform was also motivated by the prefixed package. This package provides a sort of engineering notation where exponents are rounded to multiples of 3, and then exponents are always replaced with their corresponding SI prefix. The prefixed package is a more conservative extension of the built-in formatting language. sciform includes more functionality including engineering notation without prefix substitution and more grouping/decimal symbol control. sciform also includes global configuration options for handling optional SI prefixes such as c, d, da, and h.
  4. The sigfig package (https://sigfig.readthedocs.io/en/latest/). The sigfig package has similar functionality to sciform including sig fig rounding, separator control, and value +/- uncertainty formatting, including some features that are only forthcoming in sciform. sigfig does not currently support binary formatting. sigfig also does not provide a format specification mini language for formatting floats. Rather, floats are formatted using an overload of the built-in round function, which I find to be slightly awkward compared to a Formatter object or function.
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted: https://github.com/pyOpenSci/software-submission/issues/114 @NickleDave
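To make the comparison with the built-in format specification mini-language concrete, the sketch below contrasts the built-in `e` presentation type with a hand-rolled engineering notation, where the exponent is restricted to multiples of 3. The helper `to_engineering` is hypothetical and written only with the standard library; it is not how sciform or any of the packages listed above implement this.

```python
from decimal import Decimal

# The built-in mini-language offers scientific notation, but no engineering mode.
print(f"{123456:.3e}")  # 1.235e+05


def to_engineering(value, sig_figs=4):
    """Hypothetical helper: format `value` with an exponent that is a multiple of 3."""
    num = Decimal(str(value))
    if num == 0:
        return "0e+00"
    # Round to the requested number of significant figures first.
    num = num.quantize(Decimal(1).scaleb(num.adjusted() - sig_figs + 1))
    sci_exp = num.adjusted()      # re-read: rounding can bump the exponent
    eng_exp = 3 * (sci_exp // 3)  # floor to a multiple of 3, also for negatives
    mantissa = num.scaleb(-eng_exp)
    return f"{mantissa:f}e{eng_exp:+03d}"


print(to_engineering(123456))                  # 123.5e+03
print(to_engineering(0.00012345, sig_figs=3))  # 123e-06
```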

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] uses an OSI approved license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions.
[x] contains a tutorial with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

[ ] Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

[x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

[x] I have read the author guide.
[x] I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

[x] Last but not least please fill out our pre-review survey. This helps us track submission and improve our peer review process. We will also ask our reviewers and editors to fill this out.

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

NickleDave commented 1 year ago

Linking back to discussion on the presubmission issue: https://github.com/pyOpenSci/software-submission/issues/114#issuecomment-1648683592

We will hold off on starting this review for just a bit (two weeks maximum) while we help @jagerber48 with some questions about development and scope.

jagerber48 commented 1 year ago

I have come to decisions on most of the questions I had that @NickleDave referred to in the previous comment. For one question I opened a topic in the pyopensci discourse: https://pyopensci.discourse.group/t/how-to-avoid-repeating-a-long-list-of-keyword-options-throughout-a-package/331/5. The suggestions there are very good and I plan to explore them in upcoming tests and versions for sciform. The suggestion I will likely take will result in changes to the user-facing API and hopefully a large cleanup of the codebase.

@NickleDave how do you suggest we proceed with the review? I'm in an active phase of development so I'm not sure what makes sense in terms of hitting a moving target. I just made an important update in version 0.22.0 involving changing from float-based formatting on the backend to Decimal-based formatting. For now I will set 0.22.2 as the version submitted. I anticipate version 0.23.0 will incorporate changes to avoid the code duplication, as suggested in the discourse topic, and, as I said above, this will be a pretty major modification to the codebase and a change to the user API. I imagine I'll be able to implement these changes in the next month, probably the next two weeks. Given that timeline, what makes sense for the review timeline? Also, in this post you suggested I could release these changes as a release candidate version. Were you imagining that release candidate might appear before the review, during the review, or after? Would the release candidate be the main target of the review or, e.g. 0.22.2 without the new changes?

I don't have any rush to get this review going if we want to wait for more recent code on the one hand. But on the other hand, if I keep changing the code and not starting the review we'll never hit the moving target..
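For readers wondering why a formatter might move from float-based to Decimal-based rounding, the sketch below shows one commonly cited difference. This is general Python behavior and not a statement of the author's reasons; the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_via_float(text, places):
    """Round using binary floating point."""
    return round(float(text), places)


def round_via_decimal(text, places):
    """Round the decimal digits exactly as written."""
    quantum = Decimal(1).scaleb(-places)
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_EVEN)


# The float nearest to 2.675 is slightly below 2.675, so it rounds down.
print(round_via_float("2.675", 2))    # 2.67
# Decimal sees an exact tie and applies round-half-even.
print(round_via_decimal("2.675", 2))  # 2.68
```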

NickleDave commented 1 year ago

Hi @jagerber48 thank you for updating here.

You are right that we should not let this get to a place where you keep changing the code. But I think the best thing to do is wait until you incorporate changes suggested in the Discourse topic, then start right away.

Please release that version, which you expect to be 0.23.0, and then set that as the version submitted above.

jagerber48 commented 1 year ago

Thanks. That makes the most sense to me also. That's what I'll do.

NickleDave commented 1 year ago

Great, thank you @jagerber48 -- once you reply back that you've released that version, we'll go ahead with the review

jagerber48 commented 1 year ago

Ok. I've implemented a version of the changes discussed in the discourse topic. I'll post details in the topic since progress has been made but there may still be room for improvement. sciform is on version 0.24.0 for now and I think review can move forward. I have a few small-scale (I think) questions that I'm curious to have addressed/considered during the review. I'll compile and post them here.

I may continue to make small code/docs changes and releases during the review but I'm happy to pin the review to version 0.24.0. Alternatively, if it would be better for me to use some other git workflow during the review I'm happy to do that. One of my questions is generally what my git workflow should be for this package. Right now I just have a main branch and I make feature branches that have one main feature but sometimes a small feature or two sneaks in. I then just PR merge that feature branch into the main branch. I don't really have much experience releasing versions of code so I'm just going off some stuff I've learned. No experience with release branches, pre-releases, etc. (Happy to discuss in more detail later, just want to know now if I should do anything differently immediately during the review time).

jagerber48 commented 1 year ago

I pushed version 0.25.0, which follows a more recent version of what was discussed in the thread for reducing code repetition. If the review has already started, I'm happy to keep the review looking at 0.24.0.

Some general questions I have that may be in-scope for the review:

Suggested git branching model, or modifications to my existing model. Right now I branch off of main for a feature branch, I work on that, then I PR it into main using github. However, I often sneak 1 or 2 additional features in addition to the main feature into these feature branches. Is this a bad practice? What should I do instead?
Should I include changelog even for minor version bumps?
Right now when I merge a PR I have to manually tag the branch on my local repo, push up the tag, then build the code into a sdist/whl and then upload to pypi. I guess I should automate these steps? Are there examples of good ways to do this?
Right now on my github PRs I have github actions to run tests and linting on different python versions. Is there a way for me to run these tests all on my local system with github? Or is this type of automation typically done with some third party build/automation tool?

Then I'll also mention that I've found making the documentation good and consistent has been my biggest challenge/time spent on this project. I continue to have ideas for documentation improvements but documentation has been a moving target as the code has continued to change. Just to say: I expect errors in the documentation and places I can make improvements.

jagerber48 commented 1 year ago

I don't have any non-trivial breaking changes in mind for the package at this time (though stuff always seems to come up). The only API changes I have in mind are possible name changes to options or classes. There are a few I'm not 100% happy with so I may poll pyopensci or e.g. code review stack exchange to try to get some broader set of opinions on some naming choices.

Just two I've written down:

precision is overloaded to mean (1) the number of digits after the decimal point to round to in RoundMode.PREC mode and (2) the number of sig figs to round to in RoundMode.SIG_FIG. It isn't really the best name for either of these. Maybe something like num_round_digits?
SciNum and SciNumUnc used to be sfloat and vufloat before the code was refactored to use Decimal over float. I'm not really in deep love with any of these names.

Those are just two I had written down, but pretty much any name in the API is fair game for naming improvement suggestions.
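The two meanings of precision mentioned above can be shown side by side. This is a standard-library sketch with hypothetical function names, not sciform's RoundMode implementation: the same integer argument produces different results depending on whether it counts digits after the decimal point or significant figures.

```python
from decimal import Decimal


def round_decimal_places(value, ndigits):
    """Keep `ndigits` digits after the decimal point."""
    return Decimal(str(value)).quantize(Decimal(1).scaleb(-ndigits))


def round_sig_figs(value, ndigits):
    """Keep `ndigits` significant figures."""
    num = Decimal(str(value))
    if num == 0:
        return num
    return num.quantize(Decimal(1).scaleb(num.adjusted() - ndigits + 1))


print(f"{round_decimal_places(123.456, 2):f}")  # 123.46
print(f"{round_sig_figs(123.456, 2):f}")        # 120
```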

NickleDave commented 1 year ago

Thank you for letting us know @jagerber48.
We are wrapping up a couple of other reviews but we expect to have an editor free to start this one shortly.
They will make sure reviewers take your requests for feedback into consideration.

jagerber48 commented 1 year ago

sciform is now on version 0.26.2. Version 0.26.0 introduces a number of trivial name changes to various options. 0.26.1 introduced more unit tests for better coverage (I learned how to measure unit test coverage) and helped me discover one bug and another bug/unexpected behavior. The first bug is fixed in 0.26.2. The second bug requires some convention decisions to be made about the format specification mini language. See https://github.com/jagerber48/sciform/issues/29.

Batalex commented 1 year ago

Hi @jagerber48,

As we discussed, I will lead the review for sciform. I am looking for reviewers and will let you know when I am done.

jagerber48 commented 1 year ago

Here's an update on the current status of the package and some questions. Since making this submission I modified the code quite a bit, moving up to version 0.29.0. Since then I've taken a break from updating sciform because I think all of the core functionality I envisioned is in place. The tasks I see in front of me (I think it's likely others will arise during the review) are

Trying to fully stabilize the API. I think it's close but there are a few "opinion-based" questions I have about e.g. the naming of some options and similar scope. See below.
Adding non-core features. In particular, there is one important feature I want to add, which is pre-defined FormatOptions and corresponding Formatter classes. Hopefully these can exist and make it so that users rarely if ever have to actually interface with configuring their own options and can just select a pre-configured formatter and use that. Or maybe set pre-configured global default options and use a generic formatter.

For adding the non-core features I plan to wait until after the pyopensci review and possibly until after I release version 1.0.0 with a more stable API (which will also happen after the pyopensci review). For stabilizing the API, since I feel a lot of the questions I have are opinion-based, I'm excited to get opinions from people other than me! I'd hope that some of these questions can be debated or discussed during the review if they're in scope. Here are the current questions I have listed:

Enums. Right now when configuring a FormatOptions object the user has to import various enum objects like the ExpMode or ExpFormat enum to configure certain settings. Is it too much of a pain to have to import all of these objects to choose settings? What would be a better alternative? (I know bare strings could be used but these are error-prone...). Also in some cases a sentinel like AutoDigits or AutoExpVal needs to be imported. (@Batalex already suggested using Literal. There's also the possibility of a StrEnum which might support passing an enum instance or the string representation..)
Formatter and FormatOptions. Right now creating a Formatter object is a two step process which first involves creating a FormatOptions object then passing it to the Formatter constructor. This can be done in one line but takes up a lot of characters. The FormatOptions can't be dispensed with because it is also used to conveniently configure global options and new sets of format options. I considered allowing passing the kwargs to form FormatOptions directly to the Formatter constructor, but there were various downsides to that approach because of just how many options there are. See this discussion for further details. Is this approach too burdensome? How could it be improved (taking into account all of the points made in the linked discussion)?
By default value/uncertainty pairs are formatted as 123 +/- 42. But using the unicode_pm=True option it is possible to format this as 123 ± 42. Should the unicode_pm option be True by default? Or even further, should unicode_pm=True just be the only behavior and the option is dispensed with entirely??
I'm not in love with some of the names chosen for various objects and options. Looking for suggestions for improvements on naming ANYWHERE throughout the package (even possibly the package name). But a few that I've singled out are:
- Is top_dig_place an ok name?
- Is superscript_exp an ok name? Maybe just superscript?
- Is bracket_unc an ok name?
- Is SciNum an ok name?
- Is SciNumUnc an ok name?
Originally I wasn't planning support for binary formatting modes but I was inspired to do so by the prefixed package. Right now binary formatting modes are accessed via dedicated exponent modes. I wonder if the two binary modes should instead be accessed via a separate option indicating the exponent base (10 or 2 at the moment). I'm also curious to query interest levels in the binary formatting. I personally have no interest in the binary formatting since my work is mostly in physical sciences and I use decimals and SI. But I imagine people more in computer sciences may have more need for binary formatting.
Right now there's an issue with the sciform FSML where the separator options and rounding mode option collide. See the github issue. This issue came about from when I was trying to include all options in the FSML. But since then I've built out FormatOptions much more, set up global configuration options, and abandoned the idea that all options should be configurable via the mini language. I'm leaning towards just dropping the ability to configure separators from the FSML. Objections to this approach?
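On the Enums question above, one pattern worth sketching is an enum with a `str` mixin, which lets a function accept either an enum member or its string value while still rejecting typos. The class `ExpModeSketch`, its members, and `resolve_exp_mode` are hypothetical names for illustration and are not sciform's actual enums. `enum.StrEnum` (Python 3.11+) behaves similarly; the mixin form also works on older versions.

```python
from enum import Enum


class ExpModeSketch(str, Enum):
    """Hypothetical stand-in for an exponent-mode enum."""
    FIXED_POINT = "fixed_point"
    SCIENTIFIC = "scientific"
    ENGINEERING = "engineering"


def resolve_exp_mode(mode):
    """Accept a member or its string value; unknown strings raise ValueError."""
    return ExpModeSketch(mode)


# Both spellings resolve to the same member, so no import is needed at call sites.
print(resolve_exp_mode("engineering") is ExpModeSketch.ENGINEERING)             # True
print(resolve_exp_mode(ExpModeSketch.ENGINEERING) is ExpModeSketch.ENGINEERING)  # True

try:
    resolve_exp_mode("enginering")  # typo
except ValueError as err:
    print(f"rejected: {err}")
```

Pairing this with a `typing.Literal` annotation of the allowed strings would additionally let static type checkers catch typos before runtime.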

These are some of the main existing api features that I have questions about. I.e. making decisions on the above will constitute breaking changes, so I want to address these before 1.0.0. I am ALSO open to feature requests but will be implementing those with lower priority than stabilizing the API.
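The Formatter and FormatOptions question above turns on how partially specified options combine with global defaults. A minimal sketch of that idea follows, assuming unset fields are represented by `None`. `OptionsSketch` and its `merge` method are hypothetical and are not sciform's FormatOptions class; only the field names echo options mentioned in this thread.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class OptionsSketch:
    """Hypothetical options container; None means "not set by the user"."""
    exp_mode: Optional[str] = None
    precision: Optional[int] = None
    unicode_pm: Optional[bool] = None

    def merge(self, other):
        """Return new options in which fields set on `other` override `self`."""
        overrides = {
            f.name: getattr(other, f.name)
            for f in fields(other)
            if getattr(other, f.name) is not None
        }
        return replace(self, **overrides)


GLOBAL_DEFAULTS = OptionsSketch(exp_mode="fixed_point", precision=2, unicode_pm=False)

# The user only states what differs from the defaults.
user_options = OptionsSketch(exp_mode="engineering")
print(GLOBAL_DEFAULTS.merge(user_options))
# OptionsSketch(exp_mode='engineering', precision=2, unicode_pm=False)
```

Because the dataclass is frozen, merging never mutates the defaults, which keeps global configuration and per-formatter options independent.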

Thank you so much for taking the time to read this and also thank you very much for any feedback you are able to provide! I'm also curious to hear suggestions for other communities where I could seek feedback.

Batalex commented 1 year ago

:wave: Hi @isabelizimm and @machow! Thank you for volunteering to review for pyOpenSci. I am truly excited by the team we have on this review.

@jagerber48, Isabel and Michael are involved with quarto, and I believe they have some pretty good opinions on how to format numbers for publishing. I am looking forward to seeing their review!

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

[x] reviewer 1 survey completed.
[ ] reviewer 2 survey completed.

The following resources will help you complete your review:

Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: 27 October 2023.

Reviewers: @isabelizimm, @machow Due date: 2023/10/27

jagerber48 commented 1 year ago

Started running the code on real applications myself some more and quickly found some important bugs. Just made a release of version 0.29.1 that fixes these bugs. https://github.com/jagerber48/sciform/releases/tag/0.29.1

isabelizimm commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[X] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[X] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
[X] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[X] Vignette(s) demonstrating major functionality that runs successfully locally.

Has examples page

[X] Function Documentation: for all user-facing functions.
[ ] Examples for all user-facing functions.
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[X] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[X] Package has a README.md file in the root directory.

The README should include, from top to bottom:

[X] The package name
[X] Badges for:
- [X] Continuous integration and test coverage,
- [X] Docs building (if you have a documentation website),
- [X] A repostatus.org badge,
- [X] Python versions supported,
- [X] Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

[X] Short description of package goals.
[X] Package installation instructions
[X] Any additional setup required to use the package (authentication tokens, etc.) (N/A for sciform)
[X] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [X] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
[X] Link to your documentation website.
[X] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

[X] Package documentation is clear and easy to find and use
[X] The need for the package is clear
[ ] All functions have documentation and associated examples for use
[X] The package is easy to install

Functionality

[X] Installation: Installation succeeds as documented.
[X] Functionality: Any functional claims of the software have been confirmed.
[X] Performance: Any performance claims of the software have been confirmed. No specific performance claims have been made.
[X] Automated tests: on GitHub Actions
- [X] All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
- [X] Tests cover essential functions of the package and a reasonable range of inputs and conditions. Has 100% test coverage! :tada:
[X] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[X] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [X] Package supports modern versions of Python and not End of life versions.
- [ ] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4-5

Review Comments

Overall, this package is super nifty and clearly offers a huge quality of life increase in scientific publishing. The documentation, as a whole, is pretty comprehensive with lots of examples and a mix of API and narrative style docs. Thank you for a great addition to pyOpenSci!

In documentation: some functions do not have parameters documented. eg, set_global_defaults_rendered has the parameter rendered_format_options but it is not shown in the API docs
Not all the functions have examples documented in the API docs. Some might be more useful than others, but at least maybe add one for merging FormatOptions?
For usability of the FSML, I think a few annotated examples would be useful, particularly one mapping each format specification to its name/use case. Right now, it is a little difficult to mentally parse which pieces of the example print(f'{SciNum(123456.654321):_,_.4}') cause the output of 123_456,654_3.
It is recommended to have a CODE_OF_CONDUCT.md file. You can grab a CoC you like from another project since they tend to be pretty standard.
It is recommended to have a CONTRIBUTING.md file.
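As an example of the kind of annotated mapping suggested above, the sketch below reproduces the quoted output 123_456,654_3 with a plain standard-library function. It assumes the spec `_,_.4` reads as upper separator `_`, decimal separator `,`, lower separator `_`, and 4 digits after the decimal point. That reading is inferred from the output shown and is an assumption, and `group_digits` is a hypothetical helper, not part of sciform.

```python
from decimal import Decimal


def group_digits(value, upper_sep="_", decimal_sep=",", lower_sep="_", ndigits=4):
    """Hypothetical helper: round to `ndigits` places and group digits in threes."""
    num = Decimal(str(value)).quantize(Decimal(1).scaleb(-ndigits))
    int_part, _, frac_part = f"{abs(num):f}".partition(".")
    # Integer digits are grouped from the right (upper separator).
    int_grouped = f"{int(int_part):,}".replace(",", upper_sep)
    # Fractional digits are grouped from the left (lower separator).
    frac_grouped = lower_sep.join(
        frac_part[i:i + 3] for i in range(0, len(frac_part), 3)
    )
    sign = "-" if num < 0 else ""
    if not frac_part:
        return f"{sign}{int_grouped}"
    return f"{sign}{int_grouped}{decimal_sep}{frac_grouped}"


print(group_digits(123456.654321))  # 123_456,654_3
```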

Response to a few maintainer questions

I've answered a handful of your questions that hadn't had a response yet and that I have thoughts on, but I'm happy to discuss any other points you're interested in chatting about! 😄

By default value/uncertainty pairs are formatted as 123 +/- 42. But using the unicode_pm=True option it is possible to format this as 123 ± 42. Should the unicode_pm option be True by default? Or even further, should unicode_pm=True just be the only behavior and the option is dispensed with entirely??

I think setting unicode_pm=True as a default is the right move! I think it is a delight sciform offers to users to not have to type ± :smile: But, there are a lot of configuration parameters and it could be missed, so this makes the out-of-the-box experience feel a little extra polished.

Is top_dig_place an ok name?

I wonder if you could somehow work padding into the name? My thought process behind that: top_dig_place looks to be used in the general purpose Formatter where arguments have a variety of use cases (padding, decimal places, separators), so its use for padding might not be apparent. (These are not very good suggestions, but something like pad_to or pad_to_exp?)

Is superscript_exp an ok name? Maybe just superscript?

I have a slight preference to superscript.

Is bracket_unc an ok name?

For bracket_unc and SciNumUnc alike, my brain did not go to uncertainty for unc. I do realize that other arguments build off this name (bracket_unc_remove_seps), so brevity is important. (bracket_uncertain + bracket_uncertain_seps maybe? You would have to flip the boolean for the separators then, since you drop the negative.)

Is SciNum an ok name?

I like SciNum.

Is SciNumUnc an ok name?

It was not immediately apparent to me that SciNumUnc was for uncertainty, but once I made that connection, it made a lot of sense. SciNumUncertain is a bit longer, but more clear. I do think Uncertainty alone isn't verbose enough since you still need your scientific number. FWIW I ended up typing SciNumU and then auto-completing the rest, so I'm not too afraid of longer names.

I wonder if the two binary modes should instead be accessed via a separate option indicating the exponent base (10 or 2 at the moment). I'm also curious to query interest levels in the binary formatting. I personally have no interest in the binary formatting since my work is mostly in physical sciences and I use decimals and SI.

For binary formatting, maybe open an issue/discussion about it and ask people to :thumbsup: if they would like to see it implemented? That way you're not building things you don't have interest in. However, if you already have the plumbing in place, and it's just a matter of exposing an argument, and you feel compelled to do so, then that seems like a good interim move!

Should I include changelog even for minor version bumps?

I think yes! For every release, it is super useful to see the changes. If it feels like too much effort to go back and write everything manually, even something small like using the Generate Release Notes button when creating a new release is sufficient. :smile:

Right now on my github PRs I have github actions to run tests and linting on different python versions. Is there a way for me to run these tests all on my local system with github? Or is this type of automation typically done with some third party build/automation tool?

Something you can do locally is install and use pre-commit. You can have pre-commit hooks do a variety of things, but some common use cases are running black and flake. You can also configure testing to run on pre-commit, but this is less common to do since sometimes tests can take a while to run.

Optional nits:

To make documentation even easier to find, you can add it as a URL in the repository settings so that it shows up here ⬇️ (you should see a ⚙️)
It would be useful to give install instructions for a development version of the package (eg, pip install git+https://github.com/jagerber48/sciform for the latest changes)
For more readable code, you might want to add black or some other formatting standard. To help automate this, running the linter can be added in a pre-commit hook or in CI.

Batalex commented 1 year ago

Thank you kindly for this thorough review!

machow commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
[ ] Installation instructions: for the development version of the package and any non-standard dependencies in README.
[x] Vignette(s) demonstrating major functionality that runs successfully locally.
[X] Function Documentation: for all user-facing functions.
[X] Examples for all user-facing functions.
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements The package meets the readme requirements below:

[x] Package has a README.md file in the root directory.

📝 NOTE that the package uses a README.rst, but maybe this meets the requirement?

The README should include, from top to bottom:

[x] The package name
[x] Badges for:
- [x] Continuous integration and test coverage,
- [x] Docs building (if you have a documentation website),
- [x] A repostatus.org badge,
- [x] Python versions supported,
- [x] Current package version (on PyPI / Conda).

[x] Short description of package goals.
[x] Package installation instructions
[x] Any additional setup required to use the package (authentication tokens, etc.)
[x] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
- [x] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
[x] Link to your documentation website.
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information

Usability

[x] Package documentation is clear and easy to find and use.
[x] The need for the package is clear
[x] All functions have documentation and associated examples for use
[x] The package is easy to install

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests:
- [x] All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
- [x] Tests cover essential functions of the package and a reasonable range of inputs and conditions.
[x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
[x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
- [x] Package supports modern versions of Python and not End of life versions.
- [x] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

[x] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

[x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2.5

Review Comments

Suggested git branching model, or modifications to my existing model. Right now I branch off of main for a feature branch, I work on that, then I PR it into main using github. However, I often sneak 1 or 2 additional features in addition to the main feature into these feature branches. Is this a bad practice? What should I do instead?

I think that--as the package maintainer--if bundling multiple features into a PR is working for you, then it seems okay. It seems more important to limit the scope of PRs to other people's repositories, so that 1 feature doesn't get held up waiting for feedback / discussion on others.

Right now when I merge a PR I have to manually tag the branch on my local repo, push up the tag, then build the code into a sdist/whl and then upload to pypi. I guess I should automate these steps? Are there examples of good ways to do this?

You can use the github action https://github.com/pypa/gh-action-pypi-publish to publish to PyPI whenever you make a github release. Here's an example in a tool I maintain called quartodoc, in case it's useful!

Right now on my github PRs I have github actions to run tests and linting on different python versions. Is there a way for me to run these tests all on my local system with github? Or is this type of automation typically done with some third party build/automation tool?

You can use tox, though I just push to github and let the github actions run the tests. There's also act, for running github actions locally (though I've never used it).

SciNum and SciNumUnc used to be sfloat and vufloat before the code was refactored to use Decimal over float. I'm not really in deep love with any of these names.

I have no good opinions on the exact names, but like that they use camel case, so feel more class-y.

Adding non-core features. Importantly there is one important feature I want to add which is pre-defined FormatOptions and corresponding Formatter classes.

This seems super handy!

Enums. Right now when configuring a FormatOptions object the user has to import various enum objects like the ExpMode or ExpFormat enum to configure certain settings. Is it too much of a pain to have to import all of these objects to choose settings? What would be a better alternative?

Tools like pydantic can take Literals or Enums, and give nice validation / feedback if an invalid value is passed. If pydantic feels too heavy duty, I like the suggestion of FormatOptions using Literal, and doing some validation. At a certain point, many options mapping to their own Enums starts to feel a bit cumbersome to type.

Formatter and FormatOptions. Right now creating a Formatter object is a two step process which first involves creating FormatOptions object then passing it to the Formatter constructor.

If Formatter can only be initialized using a single FormatOptions instance--and FormatOptions is all data no methods--then, it seems like they can be a single class. (It seems like the combined class of these two things could still be used to configure the global options).

By default value/uncertainty pairs are formatted as 123 +/- 42 ... Should the unicode_pm option be True by default?

I'm not in love with some of the names chosen for various objects and options.

Originally I wasn't planning support for binary formatting modes but

Not helpful opinions here :/. I think that the package is so useful for this kind of problem that I'd be motivated as a user to figure out what all the names meant, though!

Right now there's an issue with the sciform FSML where the separator options... I'm leaning towards just dropping the ability to configure separators from the FSML. Objections to this approach?

The FSML seems really neat, but reading through the documentation / trying the package, it feels easier to specify FormatOptions (and with literals or some approach that makes seeing options in autocomplete / code completion in eg VS Code would be very fast to initialize!). I always have to look up the options in the built-in string formatting mini-language, but initializing a class gives a lot of nice hinting / auto complete options for helping folks.

(I could see the FSML being nice for quick, common formatting though!)

Overall, it seems like a really helpful library!

Batalex commented 1 year ago

Thanks a bunch, @isabelizimm and @machow, for the reviews!

@jagerber48 I also took a few notes (without the suitable format, editor's privilege 🐈‍⬛ )

Project documentation

README

Instructions are clear, and the statement of need as well. Lacking a few elements from our guide, nothing too serious. Python version compatibility needs to be clarified.

Most of those issues have been solved in the meantime.

Contributing

Missing as of now.

License

Ok.

Changelog

Ok.

Installation

Installation resolves correctly.

pyproject.toml

The classifiers are lacking (python version). Nice setup for the dynamic version & readme.

More classifiers in recent releases.

General remarks and QoC

I would advise using a code formatter, such as black.
I would also strongly advise sorting imports using isort or ruff (❤️ ). (Maybe isort since the linter used is flake8, but I sure won't stop you from migrating to Ruff)
Is there any reason for the comments to start with #:?
A few type hints are wrong; a parameter cannot take a default value of None if its type does not allow it.
There are a lot of exceptions raised throughout the code base. We could simplify it by removing those checks in private functions.
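The type hint remark in the list above can be illustrated with a minimal sketch. The function names and the `width` parameter below are hypothetical and not taken from sciform; the point is only that an annotation of `int` does not admit a `None` default, which type checkers such as mypy flag in their default configuration, whereas `Optional[int]` states the intent explicitly.

```python
from typing import Optional


# Flagged by type checkers: the annotation says `int`, but the default is None.
def pad_implicit(text: str, width: int = None) -> str:
    return text if width is None else text.rjust(width)


# Consistent: the annotation explicitly admits None, matching the default.
def pad_explicit(text: str, width: Optional[int] = None) -> str:
    return text if width is None else text.rjust(width)


print(repr(pad_explicit("1.5")))     # no padding requested
print(repr(pad_explicit("1.5", 6)))  # right-justified to width 6
```

Both functions behave identically at runtime; the difference only shows up in static analysis.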

`modes.py`

FillMode: the exception raised is unneeded. Since enums can only take a finite set of values, we cannot have anything other than space or zero.
GroupingSeparator: same as above.

I am on the fence concerning using Literal type hints with the values.
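As a rough illustration of the FillMode remark, here is a sketch with a hypothetical `FillMode` enum (the member names and the `pad_left` helper are assumptions, not sciform's actual code). Because every member carries its own fill character, no `else: raise` branch is needed, and constructing the enum from an unknown value already raises `ValueError` on its own.

```python
from enum import Enum


class FillMode(Enum):  # hypothetical members, for illustration only
    SPACE = " "
    ZERO = "0"


def pad_left(text: str, width: int, fill_mode: FillMode) -> str:
    # Every FillMode member maps to a valid fill character,
    # so there is no "unknown mode" case to guard against here.
    return text.rjust(width, fill_mode.value)


print(pad_left("42", 5, FillMode.ZERO))  # 00042

# Invalid raw values are rejected by the Enum machinery itself.
try:
    FillMode("x")
except ValueError as exc:
    print(f"rejected: {exc}")
```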

`format_utils`

Maybe use a type alias for Number
A few type hints are missing (get_top_digit)
get_mantissa_exp_base seems a little too complex. Maybe it can be simplified.
- get_round_digit. The two innermost conditions lead to the same result.

API design

I want to propose something regarding the use of the sciform. Currently, the return types for the format functions are strings, which are acceptable for the project's current goal.
I would like to propose making a FormattedValue class inheriting from str. The reason is that we could then include methods such as _repr_html_ to use html syntax in notebooks, or other mime types (LaTeX export to use with quarto?).
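To make the proposal more concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for illustration: the markup choices are placeholders, and only the `_repr_html_` / `_repr_latex_` hook names come from IPython's rich display protocol.

```python
import html


class FormattedValue(str):
    """Hypothetical str subclass carrying extra rich-display hooks."""

    def _repr_html_(self) -> str:
        # Looked up by Jupyter/IPython when rendering a cell result as HTML.
        return f"<span>{html.escape(self)}</span>"

    def _repr_latex_(self) -> str:
        # Very naive conversion, only meant to show where LaTeX output would go.
        return "$" + self.replace("+/-", r"\pm") + "$"


value = FormattedValue("123 +/- 42")
print(value)                 # still behaves like the plain string
print(value._repr_html_())
print(value._repr_latex_())
```

One design caveat: most `str` methods (`upper()`, slicing, concatenation) return a plain `str`, so the extra methods are lost as soon as the value is manipulated.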

jagerber48 commented 1 year ago

@isabelizimm @machow @Batalex thank you so much for the time you've taken to review this package and all the great feedback you've given me! I'll spend some time looking at this feedback and implementing the changes. Some of the changes seem easy, but some of them may require structural considerations. My first step is going to be to try to list which changes I plan to implement and how. I'll likely post that here before I get to work on them.

@Batalex how does the review work from this point forward? Are there any timelines or anything I should be aware of?

Batalex commented 1 year ago

We are now entering a phase of "back and forth", where you would make changes according to the feedback the reviewers offered, and then they would check the items on their list. Just like your typical PR, only a bit longer :D

Considering the scope of the changes proposed, I think we can work on a similar timeline as for the review: a few weeks to implement changes and a few weeks to validate them, and so on.

If you disagree with something in the reviews, that is okay. We just need to get the reviewer's approval at the end, so communication is key to resolving those issues.

machow commented 1 year ago

I've been chewing on the question of SciformOptions and Enums more. In case it's helpful, here's a 1-minute screencast showing some of the nice autocomplete Literals can provide!

https://www.loom.com/share/3ba1eadec7e84689b20d39406c27fa1e

jagerber48 commented 1 year ago

@isabelizimm Thanks so much for your thorough feedback! Responding here and using this as a todo list.

[x] I see that set_global_defaults is not documented properly in the API docs. I wonder why that is? Those are being auto generated by sphinx. I will look into it.
[x] I can try to add examples in the API docs for FormatOptions and the FormatOptions.merge() method. I haven't put much effort into the API docs yet, I've focused more on some of the other docs. This is a good suggestion to take a closer look at the API docs.
[x] You mention annotated examples about the FSML. Yes, I agree that would be very helpful. _,_.4 is a super complicated formatting specification. I have a number of thoughts here. The first is a comment: This package started only as a format specification mini language to extend the python built in FSML, motivated by the uncertainties package and this python discourse discussion. But then a suggestion came up to use a class or functional approach to doing the formatting. This seemed interesting and lifted some of the shackles of trying to work within the constraints of an FSML. The class-based approach could handle many more options and complicated option combinations. The FSML grew to reflect some of this complexity but not all of it. Looking back I have two ideas on this. One is to strip down the sciform FSML, specifically excluding the separator control from the FSML. The other is to just remove the FSML completely. It seems just as easy to pass a number into a Formatter as it is to convert a number to SciNum or SciNumUnc and then use a format string. There are some trade offs that I think I'll discuss below. But, I agree that some annotated FSML examples would be very helpful.
[x] I can add a CODE_OF_CONDUCT file. Thank you for the links!
[x] I can add a CONTRIBUTING file! Again thank you for the link!
[x] Thanks for the input that unicode_pm=True can be default. I agree that unicode_pm=True seems fine and won't cause people issues. I wonder if I should go even further and remove it as an option and only bring it back if I hear a complaint that someone can't display unicode characters and wants an option to display it as +/-?
[x] top_dig_place: Yes, I agree with you that some name about padding would be much clearer. Not sure exactly what it should be. In many words: This parameter specifies the digits place (left of the decimal point) to which either empty spaces or zeros should be padded to a number. pad_digit, pad_digit_place, left_pad_digit.
[x] Thanks for the preference on superscript
[x] Ok regarding the "unc" options. This issue has made me realize there are issues with trying to use the same set of FormatOptions for both single values and for value/uncertainty pairs. So I'm considering something like having a separate data class for value/uncertainty pair formatting options and this may open the door to better named options. That aside, I could see renaming bracket_unc to bracket_uncertainty. Not sure what to do about bracket_unc_remove_seps, I find that to just be a terrible name.
[x] Thanks for the feedback on SciNum.
[x] Thanks also for the feedback here. One idea would be SciNumErr, but I think, scientifically/statistically, we are really talking about uncertainty and not error. Maybe I can make it more obvious in the documentation that SciNumUnc is modelling a SCIentific NUMber along with UNCertainty.
[x] Thanks for the tip to make an issue/discussion on binary formatting. I think that's exactly what I'll do.
[x] Ok, I think I'll keep doing a changelog for all version bumps. I like recording the changes in the changelog as well. I've found it to be a useful history for myself.
[x] Ok, I've started dabbling with pre-commit. It does look like I could use this to run tests and to do formatting.
[x] I'll add the documentation url on github.
[x] I can add development install instructions. That may go well along with the contributing instructions.
[x] I'll look into black. I did use flake8 for the first time on this project. I'm personally not yet comfortable with a formatter actually making changes to my code, but I do like following code styles. I use pycharm and follow its style guidelines/warnings, but I did find that some stuff that got past pycharm didn't get past flake8. And it sounds like some stuff that got past pycharm and flake8 doesn't get past black?

jagerber48 commented 1 year ago

@machow here's my response + todo list for your review, thanks again for taking the time to do this review!

[x] Ok thanks for the feedback on git branching
[x] I think since asking these questions I have set up some CI around this... Let me recall.. Yes, I overhauled my versioning and it helped with this issue. So now I use setuptools_scm to help manage the package version. There are no longer any instances of the package version appearing in the source code (except for in the changelog, but that is handled specially, see below). I make PRs into the main branch and update the CHANGELOG.rst in an unreleased changes section (so I don't yet need to commit to a version number). Then once enough changes are collected I decide (in my brain) to make a new release. I make a PR whose only main job is to set the version appropriately in the changelog, though other cleanup could in principle happen in that PR. Then when that changelog version bump PR is merged in, I make a github release with the same version number as was put in the changelog. This creates a tag which is then picked up by setuptools_scm to control the version number. There is a github action that publishes to pypi when a release like this happens (during the build setuptools_scm picks up the version from the tag) and readthedocs also picks up on the new version tag and automatically builds a new version of the docs.
[x] I'll checkout tox and act, thanks for the recs!
[x] thanks for feedback on names
[x] Enums and Literal, I saw you posted a video about this, I will check that out soon and give more of a response. Thank you for helping me think about this, it's been giving me trouble and I think it is very important for usability. Yes, it feels like a burden having the user import like 2-4 different Enums for standard use cases. I asked about stuff like this on StackExchange but kept getting answers along the lines of "it's not a burden on the user to have to import Enums to configure options", but I think they may have been imagining importing a single enum, and not like 4.
[x] Formatter and FormatOptions. Yeah, I appreciate the feedback that Formatter and FormatOptions can be one class. Or at least one class from the user's perspective. Originally it was combined and I split it out. But I think the split didn't happen exactly in the right way. I'll consider how I can merge this functionality. I think it would be a big improvement in usability, but may be a moderate to major change.
[x] Thanks for more thoughts on names!
[x] Yeah thanks for the feedback on FSML. See my comment above to isabelizimm. I agree that the access via FormatOptions seems to be easier and more user-friendly. Considering at least slimming down the FSML, or even possibly dropping it entirely.

Thanks again for the great feedback!

jagerber48 commented 1 year ago

@Batalex ok, and my response to your comments:

[x] will add contributing information
[x] will add python version compatibility. Note: right now I think python 3.8 is not supported because of some typing feature I use. Debating if I want to make it compatible with 3.8 or not. Seems typing features are a major culprit for breaking compatibility with even very recent versions of python.
[x] I'll look into black
[x] I'll look into import sorting
[x] Those comments that start #: have something to do with sphinx adding comments in as documentation for Enum member or something. So yes, there is a reason, I don't remember the exact details off hand.
[x] I will look into incorrect type hints. I think I got some mixed signals. I saw something that I interpreted to mean that a default value of None would coerce the type to be unioned with None, but then I saw that that was maybe an old behavior that was removed. I did have one round of adding Optional to lots of types like this, but may not have hit them all.
[x] I'm interested in what you say about removing checks in private functions. Maybe you or I could find an example and we could discuss. I asked specifically about this kind of thing on python discourse. One of these exact exceptions actually ended up catching this bug.
[x] FillMode and GroupingSeparator, maybe I already changed this? I don't see exceptions in those classes in my current source code.
[x] type alias for Number is probably a good idea
[x] I'll check type hints, thanks!
[x] I can look at simplifying get_mantissa_exp_base and get_round_digit. These functions are part of the meat of the package that is doing the actual formatting algorithm. I was hoping to improve upon existing codebases approaches to formatting which I found to be pretty complex and hard to follow, but in the end, I think sciform's functions can also be complex and hard to follow. But I think you're right that improvements can probably be made.
[ ] API design suggestion for having a FormattedValue that could be converted to html or latex. Thanks for this suggestion. I only partially understand the use case and the implications but it seems like it could be very useful for broadening the scope of sciform beyond jupyter notebooks and terminal printouts. Would it make sense to open a discussion idea topic on the sciform page with this idea and we could discuss in more detail there? I'm interested in this, but this sounds like something further out that won't get completed within the timeline of this review (or in other words, I'd like to prioritize the other items before looking into this possible new feature).

jagerber48 commented 1 year ago

I've been chewing on the question of SciformOptions and Enums more. In case it's helpful, here's a 1-minute screencast showing some of the nice autocomplete Literals can provide!

https://www.loom.com/share/3ba1eadec7e84689b20d39406c27fa1e

@machow thank you so much for putting together this screencast. Ok, that was a pretty compelling comparison between Enum and Literal. I've been searching online for comparisons just like this between Literal and Enum in python but haven't found much yet. I guess this is because Literal is somewhat new. But it seems like, at least for this use case, Literal might just be better than Enum. The Enum feel like some sort of "proper" coding, but I can just imagine myself and users being like, why can't I just pass in a string? The Literal input fixes all of the shortcomings of using "magic" strings as inputs while still being able to accept string inputs.

I will definitely at least mock up a branch of sciform that uses literals instead of enums and see how it goes.

Batalex commented 1 year ago

Thank you for your very neat checklists, which will significantly reduce the mental charge of everyone here. I see that you use a PR-based workflow even though you are the only contributor to the project. That's great! Feel free to request reviews directly on the PRs to avoid cluttering this issue.

In addition to Michael's loom on the subject of Enum / Literal, since you are targeting Python 3.8+, it might be worth using Literal instead of Enum, because you have access to the get_args function in the typing module. Here is an example of how you could use it.

```python
from typing import get_args, Literal, TypeAlias  # TypeAlias requires Python 3.10+

DecimalSep: TypeAlias = Literal[".", ","]

def some_function(sep: DecimalSep):
    # Runtime check against the values listed in the Literal
    if sep not in get_args(DecimalSep):
        raise ValueError()

some_function(sep=",")  # OK
some_function(sep=";")  # raises ValueError
```

I think this approach is interesting, because we both have the ease of use of the string argument instead of a verbose Enum, as well as a runtime check.

jagerber48 commented 1 year ago

@machow, @Batalex to be clear, from a user API perspective, the shift to Literal from Enum will involve ONLY accepting Literal inputs to configure options and no longer accepting Enums, correct? It would be a bad choice to accept either a Literal or the corresponding Enum from the user?

Batalex commented 1 year ago

Per PEP 586, there is no way to build a Literal from an expression. Accepting both a Literal and an Enum would require that you duplicate the values throughout the code base.

```python
from enum import Enum
from typing import get_args, Literal, TypeAlias

class DecimalEnum(str, Enum):
    period = "."
    comma = ","

DecimalLit = Literal[".", ","]
DecimalSep: TypeAlias = DecimalLit | DecimalEnum

def some_function(sep: DecimalSep):
    if sep not in get_args(DecimalLit):
        raise ValueError()

some_function(sep=",")  # OK
some_function(sep=DecimalEnum.comma)  # OK
```

That is a valid possibility, I am just concerned about the confusion caused by the three symbols we need.

jagerber48 commented 1 year ago

Working on code of conduct and contributing guidelines: https://github.com/jagerber48/sciform/pull/74. I was able to easily copy & paste a code of conduct that seems good. I don't see an obvious contributing guidelines for a simple small open source project. Looks like I may need to put more thought in and craft something together from other examples that are around.

jagerber48 commented 1 year ago

@Batalex re: your last comment about an Enum + Literal approach. Thank you for putting together that example. So yes, the string representations would need to be duplicated in two spots. I also have a feeling the context type hint provided by IDE code completion would be long and less helpful in this case. It feels like Enum was a nice controlled strategy for this use case prior to Literal, but I think it's just going to be easier for users if they can pass in strings. It's not like it's unusual at all in scientific programming to use strings to select function options. scipy and numpy do that sort of thing all over the place. I think it's very rare that I've used a package that actually uses Enum for options. Just some people on stack exchange say it's a best practice. Now with Literal, it seems a lot of the downsides of using strings as inputs have gone away. Specifically, while a format string like engineering (for engineering exponent mode) would have been a "magic" string in the past, using Literal['engineering', ...] makes it no longer as much of a magic string. I will hash out an example that only uses Literal. I'll make a decision while I'm coding whether I want to convert the Literal string user input into an Enum for use by the backend code.
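For reference, the "Literal in the public interface, Enum in the backend" option mentioned at the end of this comment could be sketched roughly as follows. The names `ExpModeLit`, `ExpModeEnum` and `resolve_exp_mode`, and the particular option strings, are hypothetical and not sciform's actual code.

```python
from enum import Enum
from typing import Literal

# Hypothetical public-facing type: users pass plain strings.
ExpModeLit = Literal["fixed_point", "scientific", "engineering"]


class ExpModeEnum(Enum):
    # Hypothetical backend-facing enum with matching string values.
    FIXED_POINT = "fixed_point"
    SCIENTIFIC = "scientific"
    ENGINEERING = "engineering"


def resolve_exp_mode(exp_mode: ExpModeLit) -> ExpModeEnum:
    """Convert the user's string into an Enum once, at the API boundary."""
    try:
        return ExpModeEnum(exp_mode)
    except ValueError:
        valid = [member.value for member in ExpModeEnum]
        raise ValueError(
            f"exp_mode must be one of {valid}, got {exp_mode!r}"
        ) from None


print(resolve_exp_mode("engineering"))
```

The string values still appear twice (once in the Literal, once in the Enum), which is the duplication discussed above; the conversion just keeps it confined to one place.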

jagerber48 commented 1 year ago

I'm working on contributing guidelines for sciform. I haven't made many if any PRs on other people's repos on github. Trying to figure out how PRs from other people would work on sciform. My basic understanding is that someone would make a fork of sciform that lives in their user space, they would branch off main and make their changes. Then I think they PR their forked version into the main version in my user space. Questions:

I can't tell if this would work for sciform for any user or if I would have to grant specific users the ability to make a PR from a fork.
How would this process look in the git history for sciform? If the PR is approved/merged in would it just appear as a new commit on the main branch? Since the contributors feature branch lives in their space instead of in the main sciform repo in my user space?
Do I need to describe this process in the contributing guidelines if I want to welcome PRs from other devs?

NickleDave commented 1 year ago

I can't tell if this would work for sciform for any user or if I would have to grant specific users the ability to make a PR from a fork.

No, you don't need to grant anyone permissions for them to be able to fork and make a PR. If only you could 😇

How would this process look in the git history for sciform? If the PR is approved/merged in would it just appear as a new commit on the main branch? Since the contributors feature branch lives in their space instead of in the main sciform repo in my user space?

Basically: yes, a new commit on main if you "squash and merge", or all the commits if you merge the entire branch.

More detail: it depends on how you merge https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/about-merge-methods-on-github

I recommend letting contributors do whatever in a branch and then just squash and merge as one commit: https://jacobtomlinson.dev/posts/2022/dont-prematurely-squash/rebase-and-force-push-your-prs/

Do I need to describe this process in the contributing guidelines if I want to welcome PRs from other devs?

Ideally, although you can probably point to one of many existing resources and say "we do this", e.g. "we use a standard PR workflow similar to NumPy and scikit-learn"

You might find this example from the PyGMT docs helpful: https://github.com/GenericMappingTools/pygmt/blob/main/CONTRIBUTING.md Note how they split out a separate guide that lives in the rendered docs: https://github.com/GenericMappingTools/pygmt/blob/main/doc/contributing.md

jagerber48 commented 1 year ago

I want to give an update. I've checked off a couple of the minor changes, but I'm now delving into an experiment on a major refactor. This refactor involves two things that I think are major improvements to the UI, but, for now, seem to make developing on the back end a little more of a headache. That said, my goal is to release a version 1.0 of this project and my main criterion for doing that is I want to have a nice user interface. "Easiness of maintainability" is not a must-have for releasing 1.0, so I've made the decision, at the moment, to explore what things would look like if I make the UI as nice as possible and then just work around that on the back end. The changes are:

Move from using Enums for users to specify options to using Literals for users to specify options. My sense is that people naively using Python would just naturally use literal strings to specify different options, especially since this is a pattern in packages like numpy, scipy, pandas, etc. But the "proper" way to do this (if you look it up) is to use Enums. This fixes some maintainability issues around using magic strings, but it is really an annoyance for the user to have to import more things into their namespace and probably to even have to learn what Enums are and how to use them if they don't already know. Literal fixes a lot of the shortcomings of both the simple magic string strategy and the Enum strategy. I think it will just be a lot more straightforward for users. There are downsides, however. My IDE, Pycharm, doesn't have autocompletions for Literal yet, but there is an open issue about it and I'm optimistic it will come at some point. Also, Literal doesn't pass type checking if you first do something like exp_mode = 'engineering' and then pass exp_mode as a kwarg into the formatter constructor unless you explicitly specify exp_mode as a Literal in the right way.
The second major change is I am eliminating FormatOptions from the user interface. Instead users just pass kwargs directly into either the Formatter constructor or the various global options settings functions. I introduced the FormatOptions approach previously because it meant the full list of options could be repeated many fewer times while still keeping everything nicely statically typed. See the conversation here: https://pyopensci.discourse.group/t/how-to-avoid-repeating-a-long-list-of-keyword-options-throughout-a-package/331/15. However, the FormatOptions object really doesn't provide any value to the user interface. In fact, it just further pollutes the user's namespace. So for now I've reverted to a proliferation of kwargs all over the source code. However, when PEP 692 (kwargs unpacking type annotations) is supported on pycharm (it's probably already supported by other typing tools) I can move to just using **kwargs everywhere which will compress all those repetitions of all the options while still having good static type annotations.
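
The Literal trade-off described in the first item can be sketched as follows. This is only an illustration, not sciform's actual code: `ExpMode`, `format_number`, and the option strings other than 'engineering' are hypothetical names chosen for the example.

```python
from typing import Literal, get_args

# Hypothetical option type; the strings are illustrative, not sciform's real option set.
ExpMode = Literal["fixed_point", "scientific", "engineering"]


def format_number(value: float, exp_mode: ExpMode = "scientific") -> str:
    """Toy formatter that only demonstrates Literal-typed options."""
    # Literal is a static-typing construct, so validate at runtime as well.
    if exp_mode not in get_args(ExpMode):
        raise ValueError(f"Unknown exp_mode: {exp_mode!r}")
    if exp_mode == "fixed_point":
        return f"{value:f}"
    if exp_mode == "scientific":
        return f"{value:e}"
    # Engineering notation: shift the scientific exponent down to a multiple of 3.
    mantissa_str, exp_str = f"{value:e}".split("e")
    exp = int(exp_str)
    eng_exp = 3 * (exp // 3)
    mantissa = float(mantissa_str) * 10 ** (exp - eng_exp)
    return f"{mantissa:g}e{eng_exp:+03d}"


# Users pass a plain string; nothing extra needs to be imported.
print(format_number(12345.0, exp_mode="engineering"))

# A type checker such as mypy infers `str` for this variable, so it may reject
# the call below even though it works at runtime.
mode = "engineering"
print(format_number(12345.0, exp_mode=mode))

# Annotating the variable with the Literal type satisfies the checker.
typed_mode: ExpMode = "engineering"
print(format_number(12345.0, exp_mode=typed_mode))
```

The runtime check with `get_args` is there because a Literal annotation alone does nothing to stop a misspelled string when the code actually runs.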

So basically, I had made a few decisions for better static typing and maintainability on the back end, but they came at a cost to the user interface. I've now decided the user interface is more important and I'm refactoring the code to reflect this change. Once these changes are more finalized I'll link to a PR in this thread. I would appreciate getting feedback from folks here on (1) the changes in user-friendliness and (2) the changes to the source code.

jagerber48 commented 1 year ago

https://github.com/jagerber48/sciform/pull/82 here's the PR. As of right now the source code has been re-worked and the tests in the tests directory have been re-written and are passing on my system. The main task ahead is rewriting a bit of documentation and the doctests.

jagerber48 commented 1 year ago

Ok I would say the PR is mostly complete now. All tests are passing and documentation and changelog have been updated to reflect (1) the removal of FormatOptions from the public interface along with (2) the replacement of Enum options by string Literal options. This is a big enough change that I would appreciate another set of eyes on it before merging into main. Based on the comments so far I think this is the most significant change I'll be making as part of this review. I would appreciate it if anyone could let me know whether they would or wouldn't be able to have a look at this in, say, the next 2-3 weeks?

https://github.com/jagerber48/sciform/pull/82

jagerber48 commented 1 year ago

Ok, I released version 0.30.0 which implements the major interface changes discussed above (then had to immediately release 0.30.1 because I messed up the changelog).

I'm now continuing to work through the other topics on the checklists above.

@isabelizimm @Batalex, you both suggested I use black for automated formatting. I'm curious to learn more about this. I currently use Pycharm as my IDE. Pycharm has pretty powerful code inspections that I work to abide by. Also, for this project, I started using flake8 so that I can run some linting/format checking in my CI. I'm curious why you suggest I would want to use black in addition to these two linters.

Actually just played with some sciform code in the black playground. It looks like black is just even more strict about formatting (things like skipping lines, how long chunks of code get broken). These are a few of the things I do still find myself making conscious decisions about, so maybe it could be useful. Still curious for your opinions.

edit: Ok, I've done more research, yes ruff seems really nice, but I'm particular enough about my linting and formatting that I'm going to need to spend some time finding settings I like. I'll probably move sciform to use ruff once I get it figured out.

Batalex commented 1 year ago

I apologize for sounding somewhat pedantic; I just love explaining things in detail.

What is a code formatter, and why do we need it?

Ultimately, the goals of a code formatter fall into two categories: (1) reducing the cognitive load of reading code and (2) asserting that code diffs add value to the code base.

As you might know, the PEP 8 style guide is prevalent throughout the Python ecosystem. Its specifications are not precise enough, so there are some ambiguities, or some rules are not as relevant as others. Hence, multiple implementations.
Nonetheless, using a code format based on PEP8 lowers the entry barrier for contributors since the code is easier to read.

We suggested black because it can address some of the caveats of PyCharm's integrated formatter.

First, it makes it easier for potential contributors to work on your project. PyCharm is quite popular, but relying on its integrated formatter means that VSCode, (neo)vim, helix, etc., users cannot easily format their code contributions with the format you use.
The good news is since PyCharm 2023.2, you can use black as the default formatter. Using a CLI-based tool means you can format the code (or check if the code is properly formatted) outside the IDE, i.e., in a GH action workflow. flake8 checks are pretty limited regarding the format. Finally, the black code format aims to produce smaller diffs.

black is pretty opinionated. One of the things that people do not like about black is that it uses double quotes for strings. You probably have to add a few exceptions to your linter configuration so that it plays nicely with black.

jagerber48 commented 1 year ago

@Batalex thanks for the explanation!

As you might know, the PEP 8 style guide is prevalent throughout the Python ecosystem. Its specifications are not precise enough, so there are some ambiguities, or some rules are not as relevant as others.

This seems like the key. It seems like most people agree that we should follow PEP 8, but the point is PEP 8 still leaves a LOT of freedom for how things should be formatted. I guess what you're saying is that linters and formatters impose tighter constraints on what is considered appropriately formatted. I can understand this. I'm in favor of more tightly restricted formatting (In my work, unrelated to sciform, I use Pycharm AND try to follow its inspections, which is a way more opinionated strategy than either not following the inspections or however many of my colleagues have vscode set up).

And then yes, third party linter/formatters should be preferred over Pycharm's linter/formatting because (1) anyone has access to them regardless of their development environment and (2) they seem to be better documented than Pycharm's "inspection" properties.

I saw your suggestion to use ruff and I've been looking into it the past day. It seems really nice but since I'm particular about linting/formatting it's going to take me a while to figure out how I want to get it configured. I'll probably do a few passes on the sciform source code trying different settings. For some reason I decided I like single quotes more than double quotes, but given how widespread black is (and ruff follows black formatting), it seems like I should switch over to double quotes. Also, I was able to get a ruff plugin for pycharm so that its inspections correspond to ruff output and so that you can apply ruff fixes and formatting!

Anyways, I could have a really long discussion about opinions about linting and formatting! I think I'll save that away for a discussion on slack or discord and try to post my end-results here.

Batalex commented 11 months ago

Hey @jagerber48, how are we moving forward with this review? Feel free to give us a shout if you need some help / additional information

jagerber48 commented 11 months ago

@Batalex thanks for the ping. So third party linter/formatting is something I've been needing in my life. I'm currently working on a PR getting sciform in line with the ruff formatter: https://github.com/jagerber48/sciform/pull/88. Hopefully I'll have it merged in a few days.

I think with the completion of this PR I will have checked off almost everything on the list except for opinions on naming convention. I do think I still may be seeking stronger opinions on naming conventions so I'm open to stronger opinions there and also suggestions for how to get finalized on the names (one suggestion can be: "don't worry about it so much, just go with the names here"). I'm worried because I'm a bit of a perfectionist and I often do things one way thinking it's good, but then a few months later realize there was a better way. If I'm in a situation with sciform where I feel like I can't change the names because users are relying on the stable api, then I'll be frustrated.

jagerber48 commented 11 months ago

Ok, merged a big commit into main branch (not released yet though) where I lint and reformat using ruff: https://github.com/jagerber48/sciform/pull/88. Most of it I'm happy with. I was a little annoyed to move to double quotes from single quotes since it takes an extra keystroke to type double quotes. Guess I can type single quotes and let the tool format to double later. Ruff supports single quotes for some functionality but not all, and Black has made the opinionated choice (that I disagree with) to use double quotes, so there is just more tooling support for double quotes. Once Ruff gets support for single quotes on more features I might move back to them.

Trying to wrap up some more checkboxes. @Batalex you suggest removing exception raising in private functions. Most of the exceptions in sciform arise as else clauses in if... elif... else... exhaustive checks. See this discussion. See this block of code:

from enum import Enum

class Options(Enum):
    OPTION_1 = 1
    OPTION_2 = 2

def foo(option: Options):
    if option is Options.OPTION_1:
        res = 1
    elif option is Options.OPTION_2:
        res = 2
    return res

Pycharm gives me a warning on the last line saying Local variable 'res' might be referenced before assignment.. This motivated a lot of the else clauses with exceptions. Do you think this is a bad practice? I should check Ruff's behavior on this. Ruff might be more intelligent about static type checking exhaustiveness checks.
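
The else-with-exception pattern mentioned above can be sketched against the same Options enum. This is only an illustration; the exception type and message are assumptions, not taken from sciform's source.

```python
from enum import Enum


class Options(Enum):
    OPTION_1 = 1
    OPTION_2 = 2


def foo(option: Options) -> int:
    if option is Options.OPTION_1:
        res = 1
    elif option is Options.OPTION_2:
        res = 2
    else:
        # Every branch now either assigns `res` or raises, so `res` is always
        # bound by the time the return statement is reached.
        raise ValueError(f"Unhandled option: {option!r}")
    return res
```

The else branch also guards at runtime against a value that is not an Options member at all, which static annotations alone cannot rule out.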

Batalex commented 11 months ago

I am pretty sure that black's double quotes will grow on you! That was the case for pretty much everyone I know who used it. No pressure, though, as you said

Guess I can type single quotes and let the tool format to double later.

I hope that with both black and ruff, you will soon be more productive and confident than ever in your code base.

Unfortunately, in your case, there might not be a perfect answer to fix all the issues with exhaustiveness checks. We are better off forgetting about this comment of mine altogether.

My recommendation would be to use match statements, as they communicate your intent the most clearly.

def foo(option: Options) -> int:
    match option:
        case Options.OPTION_1:
            res = 1
        case Options.OPTION_2:
            res = 2

    return res

However, (1) sciform targets python 3.9+, while match statements require python 3.10, and (2) type checkers do not enforce exhaustiveness of match statements as they should. That means that Pycharm's internal linter would still give a warning "referenced before assignment."

Using python 3.11's typing.assert_never would be the second choice, but it is again incompatible with python 3.9.

If I were to nitpick, we could drop those else clauses with an exception in all nonpublic facing functions, as we would already have validated at runtime that option is indeed an Options. But this adds little value to the code base, so I think we should keep things as they are.
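
For reference, a common workaround on Python versions without typing.assert_never is a small helper annotated with NoReturn. The sketch below is an illustration, not a recommendation to change sciform; `assert_unreachable` is a hypothetical name, and whether an unhandled member is flagged statically depends on the type checker narrowing enum members (mypy and pyright do).

```python
from enum import Enum
from typing import NoReturn


class Options(Enum):
    OPTION_1 = 1
    OPTION_2 = 2


def assert_unreachable(value: NoReturn) -> NoReturn:
    """Hypothetical stand-in for typing.assert_never on Python < 3.11."""
    raise AssertionError(f"Unhandled value: {value!r}")


def foo(option: Options) -> int:
    if option is Options.OPTION_1:
        return 1
    elif option is Options.OPTION_2:
        return 2
    else:
        # If a new member were added to Options without a branch here, a type
        # checker would report that `option` is not of type NoReturn.
        assert_unreachable(option)
```

At runtime the helper still raises, so the behavior matches the existing else clauses with exceptions.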

jagerber48 commented 11 months ago

Ok, thanks for the response. In that case, the list is indeed getting short. I'm going to collect the remaining items I have

[x] Annotated FSML examples. These would be useful. They'll take some time to get in. I think the separator configurations, and the exponent value forcing settings are probably the most new/confusing.
[x] Rename top_dig_place to something like left_pad_digit or left_pad_to_digit
[x] Rename superscript_exp to superscript
[x] Think about clarity of the unc abbreviation. Consider revising the interface so that options specific to value/uncertainty formatting sit in a separate place than the rest of the options. This might allow for some improved options names
[x] Think about SciNum/SciNumUnc naming.
[x] Consider using precommit hooks for linting/formatting/tests (in lieu of running github actions locally)
[x] Add development install instructions. @isabelizimm, you were the one who suggested I add development install instructions, curious for your and others' thoughts on the following. I'm curious what is typical here. My first guess would be to recommend devs fork and git clone the repo then pip install -r requirements-dev.txt or something. I know I could configure pyproject.toml to allow python -m pip install sciform[dev] to work. But I think I'd prefer the former approach so they already have it set up as a git repo and are able to push and make pull requests. And then what should dev requirements be? sciform itself has no requirements for installation and use. For simple dev I guess devs would want ruff installed. For editing docs I need a couple sphinx and sphinx utility packages. For the examples I need the heavyweight numpy/scipy/matplotlib requirements. Should I include ALL of these as dev requirements? Should I have separate requirements files for each of these different dev use-cases? I'm leaning towards this latter approach.
[x] Research tox or act for running github actions locally. I think the main challenge here is spinning up environments that use all the different versions of python I test against the way github actions do. But I wonder if this is what act can do. Looks like act uses docker which sounds like a pain on Windows. Maybe I could run act on a Linux VM or something. But I think I'll likely conclude this is not worth the effort. I can just use github action on github. The worst case is I end up with some throwaway commits in PRs whose only purpose was to test some github action behavior.
[ ] Consider the FormattedValue suggestion to support converting formatted values to a number of output formats including, e.g. string literal, Latex formatted string, html formatted string.

isabelizimm commented 11 months ago

These changes and updates look great! If you would like second eyes on any changes going in, feel free to tag me in PRs. I tend to be best summoned by an @ 😄

Re: development instructions:

Minimum: add a line in installation to show people how to get latest version, eg

pip install git+https://github.com/jagerber48/sciform.git

This will show people how to get latest development changes, even if they're not released yet. (example)

Slightly more effort: some documentation, either in a README or CONTRIBUTING file to show mainly 1) how to get to a local install, 2) how to build docs and 3) how to run tests. One example is the CONTRIBUTING.md guide in the package pins.

How exactly you want to configure those is up to you. My experience is that requirements.txt files tend to get stale, so I generally lean towards sciform[dev] via pyproject.toml route. However, the sciform package itself doesn't have any dependencies, so I think that using requirements files is a perfectly valid move in this scenario if it works better for you! If you want to make it different requirements files, splitting it out to requirements-test/requirements-docs makes sense, but probably no more modular than that.

jagerber48 commented 11 months ago

@isabelizimm Thanks for all that info. Ok, it sounds like your main suggestion is to include instructions for how people can install the latest development/pre-release version of the code to e.g. test it out. That is easy to do. The other part that we've been discussing is to include instructions for people who want to develop sciform. These two things have different requirements.

In any case, I have a PR with both of them I would appreciate you having a look at if you're able to: https://github.com/jagerber48/sciform/pull/90. I think it's relatively straight forward.

I decided to go with the pip install -e .[dev] approach, and also to not break things down any further than this. The numpy/scipy/matplotlib dev requirements really aren't that burdensome, and so few people are going to be developing sciform I think it's just not worth any added overhead.

jagerber48 commented 11 months ago

Unless there are strong opinions, I don't want to set up pre-commit hooks or tox/act to do any sort of CI locally. I'm fine just letting it run in the PR and doing the local steps manually. In the future I may set up pre-commit hooks to run tests and do formatting as I become more comfortable with the tools. The main downside I see to setting up the pre-commit hooks is that it seems like it would be challenging to submit a commit that doesn't perfectly conform to everything. I could see this being annoying if I'm trying to set up some quick and dirty code that might fail some tests, or that I don't want to bother to format perfectly and I want to commit it to keep it around. Maybe there's an easy way to bypass pre-commit hooks for these sorts of situations?

jagerber48 commented 11 months ago

I'm going to rename top_dig_place to left_pad_dec_place. This is consistent with a renaming of the rounding mode that rounds by "decimal place" (similar to "precision" for the python built-in FSML) to dec_place. "Digits place" is not a standard terminology for this concept, "decimal place" is the standard terminology.

Here's an idea on SciNum and SciNumUnc. I think SciNum is an ok name for a number that is going to be converted to a formatted string. I don't like SciNumUnc though. I think I could remove the SciNumUnc object and just let the SciNum __init__ method take an optional uncertainty input parameter and then perform the formatting accordingly.
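
The single-class idea can be sketched as follows. This is a hypothetical illustration, not sciform's implementation: `SciNumSketch` is an invented name, and it delegates to the built-in float format specification rather than sciform's own FSML.

```python
from typing import Optional


class SciNumSketch:
    """Hypothetical single class covering values with or without uncertainty."""

    def __init__(self, value: float, uncertainty: Optional[float] = None) -> None:
        self.value = value
        self.uncertainty = uncertainty

    def __format__(self, spec: str) -> str:
        # Delegate to built-in float formatting; sciform's real FSML is richer.
        value_str = format(self.value, spec)
        if self.uncertainty is None:
            return value_str
        # With an uncertainty present, format both parts with the same spec.
        return f"{value_str} +/- {format(self.uncertainty, spec)}"


print(f"{SciNumSketch(123.456):.2f}")
print(f"{SciNumSketch(123.456, 0.789):.2f}")
```

The formatting branch is chosen from whether an uncertainty was supplied, so users only need to learn one class name.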

isabelizimm commented 11 months ago

re: development docs: I reviewed that PR, a few small comments but looks good to me! It does really help users to get started contributing when they don't have to guess about some of the going-through-the-motions pieces of development, so I think this is a great addition. Plus, everything ran on my laptop right out of the box, which is always exciting➕ 😄

re: CI/pre-commit: I personally don't use act/tox/etc. I've tried a few, but no project of mine has been heavyweight enough that I found it a net-win to my workflow. The one thing I do use, though, is pre-commit hooks. It might be nice to start using this tool locally to make sure all new code is compliant, and integrate it into CI as you see fit.

A few pre-commit moves that might make it more useful to you:

You can pass --no-verify to git commit (git commit --no-verify) to bypass these hooks, if needed
You can set up CI for pre-commit, or just have it part of a local workflow (still super useful, imo)
If you check in the pre-commit hooks into a repository, it can be part of the development install/docs, just like how you have ruff currently. Because pre-commit runs, well, pre-commit, people won't be able to make non-compliant commits.

re: naming: These both seem like good moves! The digits place -> decimal place move immediately clicked in my brain, and having uncertainty as a parameter seems reasonable enough to me.

jagerber48 commented 11 months ago

@isabelizimm ok yes, git commit --no-verify was the trick to make me comfortable setting up pre-commit and adding pre-commit instructions. Fortunately git kraken (the tool I use for git) has an easy button to do the --no-verify option, so this is all very convenient for me.

Ok I'm working on another release (0.31.0) that addresses all of the remaining concerns in the review except the lazy formatting FormattedValue suggestion. Here's the PR for the release https://github.com/jagerber48/sciform/pull/96. There are a number of notable changes you can find in the changelog, but the highlights are:

Remove SciNumUnc class, now SciNum class accepts optional second input for value/uncertainty formatting
No longer possible to configure separators using FSML, this just made the FSML too complicated. This made it easier to write annotated FSML examples.
Many options were renamed. Notably the unintuitive and ugly unc abbreviation has been eliminated. See changelog for all the changes.
Linted/reformatted code using ruff. This makes the diff huge

Anyways, if any of you are inclined, @Batalex @machow @isabelizimm I would appreciate if any of you have comments on this jump from 0.30.1 to 0.31.0 before I release it. Otherwise, I'm curious about what next steps on the review look like.

There are a couple improvements I see that can be made, but I consider them to be typical package improvements that should happen over time and I don't know that they have bearing on the review.

https://github.com/jagerber48/sciform/issues/73 Resolving this issue would improve the interface
I have a todo item to review the behaviors of the options/functions to add e.g. the "centi-" prefix.
Look at implementing the FormattedValue lazy formatting suggestion
Make Formatter accept SciNum input
Add functionality to print out options from the Formatter object.

Batalex commented 10 months ago

IMO we are at the end of the review process. Thank you kindly for bearing with our demands!

Before I ask @machow and @isabelizimm for their final approval, I would like to ask you if you feel that the API is stable enough so that you can release 1.0 in the same time frame as the pyopensci "seal of quality"?

This is not an obligation by any means, of course.