Closed jagerber48 closed 10 months ago
Welcome @jagerber48 and thanks so much for your detailed pre-submission inquiry.
After a first read-through, sciform looks like it could be in scope for a pyOpenSci review, but we'd like to ask you for a little more information.
I see above you marked the package as pangeo affiliated -- can you confirm whether that's the case, or was it a mistake? Maybe something about our form wasn't clear there?
The goals for sciform seem clear, but can you please say more about who the target audience is, and what the use cases are? We very much welcome newer packages for review, so it's fine if you feel like you are currently the only user, but who do you intend to use it more broadly? Along the same lines: the docs have a lot of great detail about how to use sciform, but I'm not finding a lot about who would use it, where and when. Should I use it to format papers or reports? Should I use it to make numerical experiments more replicable? I have a feeling you already have some use cases in mind, but maybe you are so focused on development right now that you just haven't written a lot about those yet. Some vignettes in the documentation with walkthroughs of use cases would really help, something like "here's how you'd use sciform to do x". Please let us know more about who you're developing sciform for and when you see them using it.
Hello @NickleDave, thanks for your response!
Yes, the Pangeo affiliation was a misunderstanding on my part. I thought pyopensci had a requirement to be conformant with Pangeo in addition to pyopensci, so I was indicating willingness to conform there as well. I've unchecked that box.
Good question. I will think about this but here are my off-the-cuff thoughts.
sciform outputs are python print or logger functions. Sometimes I make a table using the tabulate package, fill it with formatted strings, and print that to the terminal.
I can include the tabulate and matplotlib use cases in the documentation. I think those would be illustrative use cases people could look at.
hey @jagerber48 !! 👋 welcome! I just had a question not related to this specific review. What on that form would make it more clear that pangeo is an optional thing? We have an affiliated partner program, and that check just allows someone to ALSO become pangeo affiliated. But it's not a requirement. How could we make that more clear, as you are not the first person to be confused by that!!
Also I'm wondering then if this tool would really be a support tool for reproducible reports (which is important to our open science goals)? If it's really about printing and output. Does that type of application (reproducible reports, jupyter notebook output, etc.) resonate with your goals for the tool?
@lwasser Thank you for the welcome and your questions/comments!
About the Pangeo option from my perspective: Something like "You may optionally choose to affiliate your package with additional communities by checking the boxes below. These affiliations may come with XYZ benefits/additional requirements". Even just an "(optional)" flag may have cleared it up for me. "If your package fits into an existing community please check below:" is a challenging sentence because I don't know what these communities are and I didn't want to learn it at the time. So yeah, replacing this with something like "If you would like to affiliate your package with an existing community, please check below" would have helped me, I think.
Also I'm wondering then if this tool would really be a support tool for reproducible reports (which is important to our open science goals)? If it's really about printing and output. Does that type of application (reproducible reports, jupyter notebook output, etc.) resonate with your goals for the tool?
"would really be a support tool for reproducible reports". What are "reproducible reports"? The tool takes python floats or float pairs and converts them to formatted (hopefully human readable) strings. There are many ways these strings could be used, it sounds like "reproducible reports" is definitely a use case that this tool could support. You mention Jupyter notebook output, that's definitely something I use it for, so I would say this does resonate with my goals for the tool.
@NickleDave
I've updated the documentation to include my prototypical use case: https://sciform.readthedocs.io/en/stable/examples.html. Here I am doing two visualization tasks. I have x, y data which I am fitting and extracting best fit parameters for. The first visualization task is plotting the data. The second visualization task is displaying the best fit parameters (and their uncertainties from the fit routine) in a table.
sciform helps with the first plotting task by making it relatively straightforward (though with some admittedly not 100% straightforward helper functions) to convert the tick labels into SI prefix format.
sciform helps with the second table task by making it easy to format value/uncertainty pairs together for easy reading and order of magnitude comparison.
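To make the two tasks above concrete, here is a minimal sketch using only the Python standard library. It is not sciform's API: the helper names (round_to_sig_figs, si_prefix_label, format_value_uncertainty), the prefix table, and the default "+/-" separator are assumptions chosen for illustration. The first helper turns a number into an SI-prefixed label of the kind you might put on a plot tick, and the second rounds a value to the same decimal place as its uncertainty so the pair is easy to compare by eye.

```python
import math

# Hypothetical helpers for illustration only; this is not sciform's API.
SI_PREFIXES = {-12: "p", -9: "n", -6: "μ", -3: "m", 0: "",
               3: "k", 6: "M", 9: "G", 12: "T"}


def round_to_sig_figs(value: float, sig_figs: int) -> float:
    """Round value to the requested number of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - exponent)


def si_prefix_label(value: float, unit: str = "", sig_figs: int = 3) -> str:
    """Format value with its power-of-1000 exponent replaced by an SI prefix."""
    rounded = round_to_sig_figs(value, sig_figs)
    if rounded == 0:
        return f"0 {unit}".strip()
    exponent = math.floor(math.log10(abs(rounded)))
    # Engineering exponent: nearest multiple of 3 at or below the exponent,
    # clamped to the range covered by the prefix table above.
    eng_exponent = max(-12, min(12, 3 * (exponent // 3)))
    mantissa = rounded / 10.0**eng_exponent
    decimals = max(sig_figs - 1 - (exponent - eng_exponent), 0)
    return f"{mantissa:.{decimals}f} {SI_PREFIXES[eng_exponent]}{unit}".strip()


def format_value_uncertainty(value: float, uncertainty: float,
                             unc_sig_figs: int = 1, separator: str = "+/-") -> str:
    """Round value and uncertainty to the same decimal place and join them."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    rounded_unc = round_to_sig_figs(uncertainty, unc_sig_figs)
    # Recompute the exponent after rounding, since rounding can bump it up.
    unc_exponent = math.floor(math.log10(rounded_unc))
    digits = unc_sig_figs - 1 - unc_exponent
    rounded_val = round(value, digits)
    shown = max(digits, 0)
    return f"{rounded_val:.{shown}f} {separator} {rounded_unc:.{shown}f}"


print(si_prefix_label(12500.0, unit="Hz"))        # 12.5 kHz
print(si_prefix_label(0.0032, unit="V"))          # 3.20 mV
print(format_value_uncertainty(3.14159, 0.0271))  # 3.14 +/- 0.03
print(format_value_uncertainty(12345.6, 230.0))   # 12300 +/- 200
```

Strings produced this way can be passed to a table-printing package or to a plotting library's tick-label hooks. A dedicated package would cover many more options (grouping separators, exponent coercion, and so on) than this sketch does.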
I imagine sciform will typically be used in python scripts or notebooks after some data analysis has been done, and now the user wants to print analysis results to the terminal or notebook output. However, the result could also be saved into some sort of human-readable, text-based report which lives in memory or which is saved to the disk.
Instead of using sciform immediately at the conclusion of analysis, users could also use sciform while traversing a non-human readable data file to generate a rounded, human-readable version or summary of that data file, for example if the data file contains numeric or value/uncertainty type data.
I imagine adding an option to format strings into a "pretty" format using unicode characters and also a "latex" format similar to the uncertainties and other float formatting packages I linked above. Especially the "latex" format will open up more use cases for plotting (matplotlib requires latex for some formatting tasks) and report generation.
@NickleDave I'm curious what next steps are for this. It seems like the package is likely in scope for pyopensci. Does that mean the next step is to actually submit the package and work towards meeting those requirements?
Hi @jagerber48 thank you for your patience--we wanted to get input from other community members about whether this package was in scope.
Thank you also for updating the documentation with a use case. That is exactly the kind of concrete example that really helps users understand what you are trying to do for them.
We have decided that, yes, we will proceed with a review.
Please go ahead and make a full submission. Be sure to mention this issue by number when you do so ("as discussed in #114") and please be sure to complete the pre-review survey when you do make the submission. Appreciate it!
Once you have opened that issue referencing this one, I will close this. We will then put out a call for an editor and reviewers.
@NickleDave ok great, thank you for your response! I'll be going on a two week vacation starting this weekend and I haven't had time to make the full submission yet. I will work on it, as per all your instructions, when I return.
Thanks for letting me know @jagerber48 -- no rush. Have a good vacation!
@NickleDave I've made the full submission at https://github.com/pyOpenSci/software-submission/issues/121.
One question before next steps: I have a few high-level and lower-level design questions about the package. Some are about the overall architecture of the code and some are about "should I include this feature or this requirement". I'm curious if these types of questions are in scope for the code review, or if the code review should be thought of as reviewing the quality of the code and giving general advice based on the code at one snapshot in time (at one version number). I may as well mention some specific questions I have here and then you can better inform me about their appropriateness for discussion. These are the questions I have that I'm not sure are in scope for review. I also have some questions that I'm more sure are in scope for review (like should I add more unit tests, how can I improve continuous integration).
FormatOptions object. However, these options need to be repeated in full many times throughout the code in function signatures and bodies for a few reasons. What can be done to mitigate this repetition? Specifically, this repetition means that a lot of (somewhat error-prone) work needs to be done if I ever want to add new options.
sfloat (sciform formattable float objects) or (2) formatting lists of numbers? Should sciform get into more involved formatting involving units?
sciform functions over sequences or arrays is using np.vectorize. But is it worth making sciform depend on numpy for this?
sfloat class is supported should an sDecimal class be supported?
"+/-". But given unicode is prolific now, should the default be "±"?
Hi @jagerber48, happy to help.
These are all good questions to ask yourself as a developer, and I have definitely found myself pondering similar questions before.
However, I can't give you a detailed answer here, because I would feel like I'm starting to review.
In fact, some of these questions start to be about scope, and ideally we should not run a review just for the purpose of figuring out scope. That's something that should be determined ahead of time.
We do want to help you though.
Let's do the following in this case:
A related practice that I find helpful is to keep a "dev diary". I write down questions like this each day I do dev work, and I also prioritize my to-dos. If the same questions or ideas keep popping up, then it helps me know that I really need to prioritize working on them. I also include links to other code, papers, etc., that give me concrete examples--if I can't find anyone else who is doing what I have in mind, then that tells me something.
Hope that's somewhat helpful--I'm only telling you because I wish I had gotten into this practice much sooner, along with using project management tools like GitHub Projects.
Please ask these questions on our forum and let's take it from there. Let's time box that process--say, two weeks max--and then we'll start the review.
@NickleDave thank you very much for the response, that is the sort of stuff I was looking for and is very helpful! The dev diary would definitely be helpful for me and I will look into GitHub Projects. Thank you for these pieces of advice.
I asked my question about the formatting options proliferation here. That is one spot I hope to improve the code. Perhaps this specific question about code organization/repetition is actually in scope for the review process?
After typing out but not posting a new topic on the scope questions (especially the list and arithmetic features) I've decided to start with the most conservative approach. So the package will be strictly for formatting individual numbers or pairs of numbers with a lot of possible formatting options. No arithmetic, no sequence/array handling and no numpy dependency. The inclusion or exclusion of these features doesn't change the core functionality of the package and I can structure (and have structured) the code so that these can be added as additional features at any time. So I'll go forward with a review without these features for now. Regarding the sfloat/sDecimal question: right now I just have one class, SciNum, that doesn't provide arithmetic; it just stores a single number and can format it. Only if I want to support arithmetic in the future will I need to re-address this question.
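As an illustration of the "stores a single number and can format it, no arithmetic" design described above, here is a minimal sketch. FormattableNumber is a hypothetical name and this is not how SciNum is implemented; it simply delegates to Python's built-in format specification mini language and deliberately defines no arithmetic operators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormattableNumber:
    """Hypothetical wrapper: holds one float and knows how to format it."""

    value: float

    def __format__(self, spec: str) -> str:
        # Delegate to the built-in mini language; the ".3g" fallback for an
        # empty spec is an arbitrary choice made for this sketch.
        return format(self.value, spec or ".3g")


x = FormattableNumber(12345.678)
print(f"{x:.2e}")  # 1.23e+04
print(f"{x}")      # 1.23e+04

# No arithmetic methods are defined, so operators raise TypeError.
try:
    x + 1
except TypeError:
    print("arithmetic is intentionally unsupported")
```

Keeping the class this narrow means arithmetic support (and with it the float versus Decimal question) can be added later without changing how formatting works.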
The "±" question still stands but is very minor and also doesn't block review. However, it may block releasing version 1.0.0, but I think I can discuss that separately, independent of the review.
That's perfect, thank you @jagerber48.
The question on the forum is very well stated and I think you will get good feedback.
I think you are exactly right to take a more conservative approach for now. One thing I see happen is that developers get excited about adding new features and solving the related programming problems. There's nothing wrong with that, of course. (It's one of the reasons we like doing this stuff!) But it can take time away from "road-testing" the existing functionality out in the real world. My sense is that you'll get more out of focusing on that for now.
Perhaps this specific question about code organization/repetition is actually in scope for the review process?
Yes. Let's do the following:
I'm going to close this presubmission issue since we have the submission open. Let's continue discussion there.
Submitting Author: Justin Gerber (@jagerber48)
Package Name: sciform
One-Line Description of Package: Provides extended functionality for formatting floats into strings according to scientific standards
Repository Link (if existing): https://github.com/jagerber48/sciform
Code of Conduct & Commitment to Maintain Package
Description
Community Partnerships
We partner with communities to support peer review with an additional layer of checks that satisfy community requirements. If your package fits into an existing community please check below:
Scope
Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
There are no existing community partnerships for this project, though there may be opportunities for education around significant figures and uncertainty.
Who is the target audience and what are the scientific applications of this package?
Any scientist who uses python is in the potential target audience for this package, but especially those who are concerned with displaying data values in a way that is commensurate with the corresponding uncertainties. Most scientists likely use the python built-in string formatting for this purpose, but there are some shortcomings to python built-in formatting. Scientists who seek more formatting features could consider sciform.
Are there other Python packages that accomplish similar things? If so, how does yours differ?
Yes, there are similar packages.
sciform includes its own string formatting mini language closely based on the built-in one, but with some differences. Notably, sciform includes well-controlled significant figure formatting, engineering notation, binary formatting, SI/IEC prefix substitution, digit grouping and decimal symbol options (helpful for a diversity of locales), exponent value coercion, as well as value +/- uncertainty formatting functionality.
sciform was heavily motivated by the uncertainties package. This package has sophisticated statistical handling of value +/- uncertainty pairs, handling error propagation and simulation under the hood. In addition, it has its own extension of the mini language for formatting value +/- uncertainty pairs.
sciform has more formatting functionality than the uncertainties package including, especially, engineering notation, grouping separator controls, and prefix substitution.
sciform is also a much lighter weight requirement than the uncertainties package. This may be desirable when a user wants to format strings, but they don't need the rest of the full statistical machinery of the uncertainties package.
sciform was also motivated by the prefixed package. This package provides a sort of engineering notation where exponents are rounded to multiples of 3, and then exponents are always replaced with their corresponding SI exponent.
The prefixed package is a more conservative extension of the built-in formatting language.
sciform includes more functionality including engineering notation without prefix substitution and more grouping/decimal symbol control.
sciform also includes global configuration options for handling optional SI prefixes such as c, d, da, and h.
The sigfig package has similar functionality to sciform including sig fig rounding, separator control, and value +/- uncertainty formatting, including some features that are only forthcoming in sciform. sigfig does not currently support binary formatting. sigfig also does not provide a format specification mini language for formatting floats. Rather, floats are formatted using an overload of the built-in round function, which I find to be slightly awkward compared to a Formatter object or function.
Any other questions or issues we should be aware of: Much of the code is still a work in progress. I'm still working on documenting the existing features, more unit tests are necessary for existing features, and the value +/- uncertainty features are still young and not thoroughly tested. I have important ideas in mind for more value +/- uncertainty formatting features. But I would say the core of the package is in place. One glaring gap for this package is support for Decimal numbers rather than float numbers. I would like to add that functionality after the functionality for formatting floats is stable.
This package is very new and has 1 user so far. Me. But I've been kicking around code for this sort of formatting for quite some time now and think many others would find it useful. Having a small authoritative package for this sort of formatting could be useful for the scientific community. There is also some interest in getting some of these features into the python built-in string formatting feature set, which would be very useful. Having a package like this could be a stepping stone towards that. See https://discuss.python.org/t/new-format-specifiers-for-string-formatting-of-floats-with-si-and-iec-prefixes/26914/46. Though I do note that the format specification mini language is intentionally not 100% backwards compatible with the built-in format specification mini language, so it would not be a top candidate for that role.
I'm also not very experienced when it comes to contributing to open source software. This is one of my first forays into that world, so I am learning as I go.
P.S. Have feedback/comments about our review process? Leave a comment here