lmfit / uncertainties

Transparent calculations with uncertainties on the quantities involved (aka "error propagation"); calculation of derivatives.
http://uncertainties.readthedocs.io/

package maintenance #180

Open newville opened 8 months ago

newville commented 8 months ago

@lebigot You have called for a maintainer and offered to help with the transition to a new maintainer.

I have multiple projects that depend on this package and would like to make sure this project is maintained. By age alone, I might be closer to retirement than you ;), but I might be willing to fork this to the lmfit Github organization (which can have multiple projects and members), and maintain this package from there. I would add you as a member of that if you are willing.

If that is acceptable to you, we could then figure out how to push to PyPI and to the conda-based "feedstocks" and packages.

wshanks commented 8 months ago

I also offered to help with maintenance and @lebigot gave me commit access but I haven't been able to make any maintenance updates yet. I mainly work with uncertainties through working on https://github.com/Qiskit-Extensions/qiskit-experiments (which also uses lmfit) but you have a bigger stake in uncertainties than I do @newville. I had just been planning some simple clean up and leaving publishing new releases to @lebigot until he told me otherwise. I am happy with any hierarchy of maintainership @lebigot prefers. I don't want to slow anything down.

newville commented 8 months ago

@wshanks if you or anyone else is willing, I would happily add you to the lmfit organization to participate in any projects there.

newville commented 8 months ago

@lebigot @wshanks I admit that it has been a while since I looked at the code, existing Issues, and PR for this package. I think that there are a few different things that could be improved:

  1. drop support for Python <= 3.7.
  2. migrate to more modern packaging, testing with pytest, and CI with GitHub Actions.
  3. work on merging existing PRs and trying to address some of the existing issues.
  4. build on #47 and references to Issues and PRs therein to give a modern interface for numpy and pandas.

Now that I see these and the explicit call for a maintainer from @lebigot, I think these all need to be done.

A crude analysis shows 216K downloads/month for this project (https://www.pepy.tech/projects/uncertainties) and 177K downloads/month for lmfit-py (https://www.pepy.tech/projects/lmfit), which has uncertainties as a required dependency. I interpret that to mean that lmfit is almost certainly the largest customer for this code, and that probably more than 1/2 of uncertainties users are using it only through lmfit.
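
The arithmetic behind that estimate can be sketched as follows. The counts are the ones quoted above; treating each lmfit download as exactly one uncertainties download is an assumption made here for illustration, so the result is a rough upper bound rather than a measurement of users.

```python
# Monthly download counts quoted in the comment above.
uncertainties_downloads = 216_000
lmfit_downloads = 177_000

# Assumption: each lmfit download also triggers one uncertainties download.
# Under that assumption, this is the share of uncertainties downloads
# that could be attributed to lmfit.
share_via_lmfit = lmfit_downloads / uncertainties_downloads

print(f"Share of downloads possibly via lmfit: {share_via_lmfit:.0%}")
```

A ratio of roughly four fifths is consistent with the "more than 1/2" reading, though download counts also include CI runs and mirrors.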

I propose that this project move to the lmfit organization and work be done on it at https://github.com/lmfit/uncertainties. I have invited @lebigot to join that organization and would be happy to add anyone interested. Alternatively, we could rename the fork at lmfit/uncertainties to some other package name (not sure what) and have lmfit-py use that renamed package. I think keeping the name uncertainties and keeping the clarity of @lebigot as the original author is vastly preferable. I hope that @lebigot will agree. But I also think that we sort of need to be able to move forward and address some of these outstanding issues.

Does anyone have any preferences for how this is done, or interest in participating?

wshanks commented 8 months ago

@newville I am interested in participating. The things you listed were the simple clean up I had in mind (updating deprecated code including dropping support for old versions of Python, replacing the deprecated nose tests with pytest, replacing the out of fashion Travis CI with GitHub, moving from setup.py to pyproject.toml, etc.).

I am also interested in the array interface work, but others might be more motivated to do that than I am. I am mainly interested from a performance perspective, which I think is a more involved project (operating on arrays of values in a way that could be compatible with jax or tensorflow).

I don't have a strong opinion about how the project is organized. I will let you and @lebigot decide that.

newville commented 8 months ago

@wshanks Thanks - I agree with all of that! I suspect that a smoother array interface would benefit many folks who aren't making use of this library but probably should. Like, if calculations on pandas Datasets could propagate uncertainties properly, that could be very useful.

andrewgsavage commented 7 months ago

I started writing an UArray class, getting it to the point where it could perform sin:

arr = np.array([ufloat(1, 0.01), ufloat(2, 0.1)])
uarr = UArray(arr)
np.sin(uarr)
<UArray [0.8414709848078965+/-0.005403023058681398
 0.9092974268256817+/-0.04161468365471424] >

However, when I operate on a UFloat to get something to compare against to test the UArray, I can't use numpy:

[np.sin(x) for x in arr]

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
AttributeError: 'Variable' object has no attribute 'sin'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
Cell In[30], line 1
----> 1 [np.sin(x) for x in arr]

Cell In[30], line 1, in <listcomp>(.0)
----> 1 [np.sin(x) for x in arr]

TypeError: loop of ufunc does not support argument 0 of type Variable which has no callable sin method

This makes it difficult to write tests; having to use umath or unumpy is clunky and makes the module difficult to use.

I'd suggest adding functions to Variable so umath is not needed and writing pytest tests as you go. Then create a UArray to remove the need for unumpy, again writing tests as you go.
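
The TypeError above says the element "has no callable sin method": for object-dtype arrays, NumPy's ufunc loop falls back to calling a method of the same name on each element. A minimal sketch of what "adding functions to Variable" could look like, using a hypothetical stand-in class `UValue` (not the library's actual `Variable`) and plain first-order error propagation:

```python
import math


class UValue:
    """Hypothetical stand-in for an uncertain number (not uncertainties.Variable)."""

    def __init__(self, nominal, std_dev):
        self.nominal = nominal
        self.std_dev = std_dev

    def sin(self):
        # First-order propagation: sigma_f = |df/dx| * sigma_x, with d(sin x)/dx = cos x.
        return UValue(math.sin(self.nominal),
                      abs(math.cos(self.nominal)) * self.std_dev)

    def __repr__(self):
        return f"{self.nominal}+/-{self.std_dev}"


# Same inputs as the UArray example above.
values = [UValue(1, 0.01), UValue(2, 0.1)]
results = [v.sin() for v in values]
print(results)
```

With a method like this on each element, `np.sin` applied to an object-dtype array of such values should dispatch to it, which would make the list comprehension in the failing test usable. Whether the real class should grow such methods is the design question under discussion here.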

newville commented 7 months ago

@andrewgsavage Hm, yeah I think I see what you mean - and maybe this needs a deeper re-design for both a float-like object and then ndarray of floats built on that (that I think might also make it easier for pint, pandas, xarray, ...). That's a more radical change than I was expecting, but I think it might be for the best.

In the context of this PR about "maintenance", it seems that we're seeing a fair number of issues and PRs that are not getting much attention, and many of those are either "very old Python2-like" issues, or related to better wrappings for other numerical objects (pint, pandas, xarray, ...)

I would prefer to keep the history, precedence, credit, and name of "uncertainties". I do not quite see that progressing very quickly. At some point, it might be better to start afresh, taking what we can from this project, but leaving behind outdated legacy. I want to say that we are not at that point yet, but I am becoming afraid that we might actually be there.

Do you have an opinion on that?

andrewgsavage commented 7 months ago

I think it may be doable without a deep re-design by implementing numpy functions for Variable.

I agree, the old Python2 like issues and codebase hinder further development of the module.

It would be great to keep the name and history of uncertainties. The lack of response to your earlier question to @lebigot, the review in your PR asking for Python 2 compatibility, and the other long-standing PRs without a response make it feel like it will be difficult and time-consuming to make progress. Your lmfit/uncertainties fork seems a good way to move forward.

wshanks commented 7 months ago

@lebigot added me as a collaborator so I could merge some of the PRs here, but that happened shortly before this issue was opened. So I have held off doing any maintenance here until there is more consensus of how the project will be maintained (or forked and maintained) going forward. I can't change repo settings or publish to PyPI, so we still need more agreement with @lebigot (or to start a new project if there is not agreement).

I think the main question is whether to keep the project here or move it under the lmfit GitHub organization. Given that @lebigot added me, I expect he would add you two as collaborators as well, but you may want more of a stake in shared maintainership than that.

newville commented 7 months ago

@wshanks @andrewgsavage Thanks. I think that it would be much better to move to a GitHub "organization", where there can be multiple people with ownership and management roles. The lmfit organization seems like a reasonable choice (as above, I believe lmfit may be the largest "user" of uncertainties), but it would be OK to use another one.

As for retaining the name "uncertainties", I think the main issue is whether or how new maintainers could push updates to PyPI. I think there is already a reasonable mechanism for changing who pushes to conda-forge feedstocks.

I guess that we could start thinking about a new potential name for a "friendly fork" in case that is not possible. Again, I think that would not be anyone's first choice, but it seems like @lebigot may not have time or inclination to make any merges or to be able to make that move happen.

I suggest that the three of us (@wshanks, @andrewgsavage, @newville, but we could certainly add others who are interested) decide on a time frame for deciding whether we need a new project with a new name. I suggest "March 1st", but I am not in a rush or committed to that timeline. Any other thoughts?

andrewgsavage commented 7 months ago

I would also rather move to lmfit for the reasons @newville gave. Could we start making changes to that fork?

I'm not fussed as to what ownership/management/stake I have - happy to just contribute or to review PRs and such.

March 1st sounds reasonable

newville commented 7 months ago

@andrewgsavage Yes, we can start making changes at lmfit/uncertainties.

I see a few issues:

  1. to be able to push releases to PyPI, either @lebigot would have to add someone here as a PyPI "maintainer" or "owner" of uncertainties, or we would have to go through the process at https://peps.python.org/pep-0541/#how-to-request-a-name-transfer

  2. right now GitHub describes lmfit/uncertainties as a fork of lebigot/uncertainties. To make lmfit/uncertainties be the "origin", @lebigot would have to transfer the repository to lmfit. He is able to do that, but we would first have to rename the current lmfit/uncertainties (we could reproduce any changes made in the interim). The existing PR (#182) would be closed. If we're moving on anyway, that is not a big deal.

It would be preferable to have a smooth transition. I think that means that @lebigot would have to move the package to the lmfit organization and add maintainers to PyPI.

I hope he is willing and able to do this, but it does not seem clear if this will happen. If not, we could just fork and work from there. We could either request a PyPI transfer or pick another name for the package.

jagerber48 commented 7 months ago

I have used uncertainties for some time and looked under the hood a little bit. I would be interested in helping out with some cleanup of uncertainties. Possibly contributing and helping out with PRs. Don't know if I would have the time for, or be qualified for a larger role!

I guess there are two issues being discussed in this issue. One is how maintenance of uncertainties should be passed on, and the second is what maintenance should happen. I agree with all of the maintenance suggestions, but I wonder if there should be a separate issue, either in lebigot/uncertainties now or in whatever home uncertainties finds in the next few months, where there can be more targeted discussions about exactly what maintenance should be done.

That said, the two above issues are a little bit intermingled if the idea of changing the name and releasing a heavily refactored or new package is on the table.

I hope he is willing and able to do this, but it does not seem clear if this will happen. If not, we could just fork and work from there. We could either request a PyPI transfer or pick another name for the package.

Requesting a PyPI transfer seems possible and preferable. In the event he can't manage even those transfers, it would be a shame for the "official" package to be the fork, but I guess so be it? Even if the code changes are massive, it seems like the uncertainties name should be retained. I imagine the functionality, usages, and interface would remain largely intact even if there are major changes under the hood. I wonder if GitHub supports any sort of ownership transfer request similar to PyPI's in the event of inactivity from the repo owner? Of course, I think the likelihood that any of this will need to be resorted to is low. From the call for maintainers and previous history I assume @lebigot is not totally absent at this point; I'm guessing he just rarely has time to look at uncertainties, but when he gets time he'll be able to help resolve a path forward.


I'm going to throw in a little bit of a personal plug here: one of the main uses I've had for uncertainties is to get clean formatting of value/uncertainty pairs for printing to terminals, tables, plots etc. However, while the formatting available is very nice and full-featured, I came across a couple of shortcomings, including no support for engineering notation and poor handling of edge cases involving non-finite values/uncertainties. Inspired by uncertainties and some other packages for formatting numbers, I wrote sciform. I think sciform provides a superset of the formatting features that uncertainties does, though it does so with a different format specification mini-language (FSML) that is no longer a strict extension of the built-in FSML. In any case, it had been on my radar to discuss whether sciform could be of use to uncertainties: either taking over the formatting entirely (probably would be backwards incompatible, though I'd be open to making changes to make sciform backwards compatible with uncertainties) or at least acting as a dedicated formatting backend, rather than uncertainties itself having its own formatting code in addition to the core error propagation functionality (the former of which I see as a nice add-on, the latter of which I see as the truly core functionality and value provided by uncertainties). In any case, if major work/rework of uncertainties is being planned along with at least one major version bump, I wanted to bring this up to see what others think.

newville commented 7 months ago

@jagerber48 thanks! I think it will be very good to have more interested parties involved. Yes, I agree that there are two separate topics:

  1. how should maintenance be handled? I think we mostly agree that getting help from Eric to migrate the project to lmfit/uncertainties would be preferred - using an organization that can have multiple owners and maintainers. We are waiting and hoping that this can happen by March 1.
  2. what are the maintenance and perhaps even development priorities?

I think there is sort of an immediate-ish need to support Python 3.12 and probably drop support for Python 2 (and no need to support Python < 3.7 or so), and then there are several ideas for feature improvements. There are several open PRs, some of them related and overlapping. I think an incomplete list (in no particular order) would be

I would be happy to see that list expanded (if I missed a favorite PR, I apologize, and let's add it!). I think we could start discussing priorities and order of operations for those. Just to be clear: I am willing to be engaged and help, but do not feel in any way that I should be "in charge". If anyone is willing to take the lead here, I would be happy to be a contributor and helper.

On sciform: nice! I can see myself using that.
In fact, over at lmfit, we have a function called gformat (https://github.com/lmfit/lmfit-py/blob/master/lmfit/printfuncs.py#L33) that emulates a "g" formatting, but guarantees to obey the user-selected (or default) length of the output string and works hard to maximize the number of significant digits reported. This is very useful when making pure-ASCII reports, (e.g., https://lmfit.github.io/lmfit-py/examples/example_fit_with_bounds.html). I sometimes use this in other projects and have wondered if this should be in some other package.

If you think it might be appropriate for sciform, I would be happy to have it moved there and then use it from that package.
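
To illustrate the idea of a "g"-style formatter that never exceeds a requested width, here is a minimal sketch. The function name `fixed_width_g` and its logic are hypothetical and are not lmfit's gformat implementation; it simply tries decreasing precisions until the text fits.

```python
import math


def fixed_width_g(value, length=11):
    """Format value with as many significant digits as fit in `length` characters.

    Hypothetical illustration only; not the algorithm used by lmfit's gformat.
    """
    for precision in range(length, 0, -1):
        text = f"{value:.{precision}g}"
        if len(text) <= length:
            # Pad on the left so every result has exactly `length` characters.
            return text.rjust(length)
    raise ValueError(f"cannot format {value!r} in {length} characters")


for number in (math.pi, 1.23456789e-20, 42.0):
    print(repr(fixed_width_g(number)))
```

The fixed output width is what keeps columns aligned in a pure-ASCII report, while the loop keeps as many significant digits as that width allows.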

andrewgsavage commented 7 months ago

perhaps (maybe my own opinion) flip UFloat as an alias for AffineScalarFunc to the other way around.

I put my thoughts on this in #185

perhaps (maybe my own opinion) move much of the "very long docstrings" to RST documentation and keep docstrings shorter and about "how to run".

Yes this would be nicer. The information in the docstrings is really useful, just a little wordy.

I'd prioritise issues like so, with no particular order in each bucket:

Highest bucket

  1. Drop support for python 2.7, get a CI going green with a python 3.x version. (Will also need to migrate to pytest here if using >py3.5. Other than the test1to2 (which would be deleted here), I think the test files work with pytest, so it's not too big a task to migrate.)

Next highest:

  2a. Remove python2 specific code/comments
  2b. modernize installation tools, pyproject.toml, etc.
  2c. move test scripts to a directory outside of the source tree
  2d. reorganising core into smaller files / subclassing #188
  2e. (if not already) Migrate to pytest
  2f. Test matrix with different versions (different pythons, with/without/different versions of numpy)

Then improvements:

  3a. better Numpy support (for pandas, xarray etc)
  3b. printing/formatting improvements
  3c. docstrings
  3d. Renaming objects eg Ufloat
  3e. pandas support

That's my ideal ordering to make it easier for us to review PRs and to reduce files moving around, but any PRs are appreciated, and it shouldn't stop anyone working on what they want. As are suggestions to add to the list or reorder it if I've missed anything.

andrewgsavage commented 7 months ago

might be worth noting that hgrecco is in the process of rewriting pint's formatting section and asking for feedback here https://github.com/hgrecco/pint/issues/1913#issuecomment-1899830982

newville commented 7 months ago

@andrewgsavage Thanks! I agree with all of that. And, yes, it would be nice to agree with the pint folks on formatting. I think that a separate project like sciform that we all agree to use is probably the right approach.

wshanks commented 6 months ago

We could either request a PyPI transfer or pick another name for the package.

Ideally, @lebigot will give his opinion on this. He added the call for maintainers to the README.md two months ago and has responded to an issue just over a month ago in https://github.com/lebigot/uncertainties/issues/181#issuecomment-1860157391, so I wouldn't say it would be in the spirit of PEP541 to classify the project as abandoned and transfer the name without hearing from him.

newville commented 6 months ago

@wshanks I agree. It would be much better to transfer the main repo and the ability to upload releases to PyPI with help from @lebigot. It would also be great to continue to get his insight on developments and maintenance.

I hope we can make the transition as easy as possible.

jagerber48 commented 6 months ago

Two questions:

(1) I just want to clarify if we can begin working on the forked version of this repo in the lmfit organization. Should we open issues and PRs there? Will changes there be able to merge into a branch in the main uncertainties repo if that's the direction we end up going? Or should we just be on hold until the March deadline?

(2) What about linting/formatting? I use ruff as a linter/formatter. Unless told otherwise I would probably use it locally. I think it would be especially helpful for "modernizing" uncertainties. If we agree on a linter/formatter then we could build checks into the CI.

newville commented 6 months ago

@jagerber48

(1) I just want to clarify if we can begin working on the forked version of this repo in the lmfit organization. Should we open issues and PRs there?

I think so.

Will changes there be able to merge into a branch in the main uncertainties repo if that's the direction we end up going? Or should we just be on hold until the March deadline?

Sure. We're hoping that the current repo gets migrated, but then we'll be able to merge those.

Many of the outstanding PRs probably need some attention and maybe some need discussion.

(2) What about linting/formatting?

I think this repo currently has neither. It would probably need discussion, perhaps using something like pre-commit. But I would recommend focusing attention on getting code working for supported versions of Python.

I use ruff as a linter/formatter. Unless told otherwise I would probably use it locally. I think it would be especially helpful for "modernizing" uncertainties. If we agree on a linter/formatter then we could build checks into the CI.

All I know about ruff is that it boasts about being a fast linter. Do we care about linter speed? If so, why? To me, this suggests that ruff might prioritize other things that I would also personally not care about. But, I don't know anything else about it. How would ruff "modernize" the code?

For linter/formatter, I would probably prefer to rely on PEP-8 and common sense. My bias is that discussions about formatting Python code that go beyond "we follow PEP-8, but have the good sense to allow bending it on occasion" seem to come from people looking to pick a fight where none is needed. ;).

jagerber48 commented 6 months ago

Many of the outstanding PRs probably need some attention and maybe some need discussion.

I'll familiarize myself more with those. I was eyeing the list andrewgsavage posted above, with item 1 being getting CI and tests going green.

All I know about ruff is that it boasts about being a fast linter.

Yes, ruff boasts about being a fast linter, but I don't care about its speed. I like that it supports the same linting/formatting as black but allows you to enable/disable rules as the team wishes (it is less opinionated than black). It would help modernize code just because the linter can quickly flag anything that is not PEP-8 compliant (and anything not compliant with the other enabled linter rules). It can also automatically format the code to resolve many issues. You can also tell the linter which versions you want to support, so it would very quickly flag all lines that have legacy stuff remaining in support of e.g. Python < 3.x.

Obviously there are other ways to do this, and we don't need consensus for e.g. me to use the linter/formatter locally and then push the results to the PR (the PR is the step that needs approval, doesn't matter how the code got there.)

For linter/formatter, I would probably prefer to rely on PEP-8 and common sense.

That's reasonable. I just wanted to bring up the linter/formatter question early because if there is a linter/formatter that we would prefer to use, I think it would be easier to implement it earlier, before a large bout of code changes, rather than later.

To be clear I would personally prefer uncertainties to have a linter/formatter, but I'm not pushing hard for it. Just bringing up the possibility.


What versions of python should be supported? I'll just throw into the ring the idea of supporting Python 3.8+, since older versions are at Python end-of-life. That said, I wouldn't see a need for dropping support for older python versions once they reach end-of-life. That is, 3.8 would be supported indefinitely until something comes up that gives pressure to drop it. (Like Python 3 gives pressure to drop Python 2 after many years.)

I very much do not like non-f-string formatting so I would be pretty frustrated to support anything < Python 3.6.

I know there are a lot of typing changes from 3.8 through 3.10 or 3.11 at least. Otherwise I'm not personally aware of any important python features that would encourage dropping support for any specific versions.

Again, not trying to advocate hard for any specific support (other than >=3.6...), just trying to put some ideas out there.

andrewgsavage commented 6 months ago

I find the linter/formatter helpful for me as it means I don't need to think about formatting, the CI will do it for me when I run pre-commit.

There are some code patterns in uncertainties that I haven't seen in other libraries which I think would be picked up by a linter, which would help in modernizing the code. eg from .core import *

Having PEP-8 and other rules (we don't need to use everything in ruff) validated by a CI means reviewers don't need to spend time on PEP-8/formatting etc.

I too would like uncertainties to have a linter. I'll leave it up to whoever sets it up to decide which :p

That's reasonable. I just wanted to bring up the linter/formatter question early because if there is a linter/formatter that we would prefer to use, I think it would be easier to implement it earlier, before a large bout of code changes, rather than later.

I thought it would be easier to get it running when there's no/few open PRs, as you end up with a load of merge conflicts in them.

me to use the linter/formatter locally and then push the results to the PR

I think this is a reason to set one up as a CI - if you do this you'll end up with formatting changes unrelated to the PR in some files, which makes reviewing slightly more difficult.

What versions of python should be supported?

The numpy __array_function__ is enabled by default in numpy 1.17 onwards, which supports py3.5, so anything above that is fine with me.

If someone in the future wants to set up typing then that would be a reason to go to a newer version, but I'm happy to leave that for the time being.

andrewgsavage commented 6 months ago

Will changes there be able to merge into a branch in the main uncertainties repo if that's the direction we end up going? Or should we just be on hold until the March deadline?

I figure there's no harm in submitting changes here for review now. Then they can be ready to merge in March.

newville commented 6 months ago

@jagerber48 @andrewgsavage Thanks -- I agree with all of that.

Python 3.8+ or even Python 3.9+ would be acceptable to me. Specifying a version of Numpy (maybe even later than 1.17) would be OK with me.

Standardizing on f-strings for most things would be fine with me, though I'll also say that sometimes string.format() might be preferred.

In general, my basic view is that while linters and formatting conventions are useful, valid Python is already very readable, and forbidding valid Python should be done sparingly. That said, setting up automated use of a linter, with pre-commit scripts would be fine with me.

I would vote for "runs on Python 3.8 to 3.12" to be the first priority.

jagerber48 commented 6 months ago

I thought it would be easier to get it running when there's no/few open PRs, as you end up with a load of merge conflicts in them.

This is a very good point. Better to clean up the backlog of important PRs then do a major reformatting. I wouldn't want the linter/formatter to slow down progress on more important functionality.

forbidding valid Python should be done sparingly.

fair point. In general the inclusion of automated linting/formatting will add overhead in some places (setting it up at first, possibly new contributors familiarizing themselves with how it works) even though it removes it in other places (contributors don't have to worry about little details like e.g. hanging indents). We can see how it looks once we get through the more important issues.

Standardizing on f-strings for most things would be fine with me, though I'll also say that sometimes string.format() might be preferred.

Yes, especially when working with custom formatting like that provided by uncertainties. I think it's mostly the odd/historic %s, %d etc. formatting I would like to avoid.
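
A small sketch of the three styles being compared, using only built-in float formatting. The numbers and the template string are made up for illustration.

```python
value, error = 1.2346, 0.0321

# Legacy printf-style formatting, the style the comment above would like to avoid.
legacy = "%.3f +/- %.3f" % (value, error)

# f-string: the expression and its format spec sit together at the call site.
modern = f"{value:.3f} +/- {error:.3f}"

# str.format: still handy when the template is stored or chosen
# separately from the values it will be applied to.
template = "{:.3f} +/- {:.3f}"
deferred = template.format(value, error)

print(legacy, modern, deferred, sep="\n")
```

All three produce the same text here; the difference is mainly where the format specification lives, which is why `str.format()` can remain the better fit for reusable or user-supplied templates.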

I would vote for "runs on Python 3.8 to 3.12" to be the first priority.

Seems like a good place to code against at first, and then the cost of supporting lower versions (3.7, 3.6, maybe 3.5) can be assessed.

jagerber48 commented 6 months ago

If someone in the future wants to set up typing then that would be a reason to go to a newer version, but I'm happy to leave that for the time being.

I'm interested in setting up typing, at the very least for the public API but again, happy to leave this very low on the priority list for now, including below supporting 3.8-3.12.

newville commented 5 months ago

@lebigot March 1st is coming up at the end of next week. We had (somewhat arbitrarily but I think not unreasonably) set this as a target deadline for a transition from this repo to one within the lmfit organization.

I think everyone interested here would prefer that the name and the full history of the uncertainties repo be preserved and that pushing releases to PyPI could be done from the new organization. Ideally, you would transfer your repo here to the lmfit organization. I think you may also need to give some of us the information for pushing uncertainties releases to PyPI.

Is this feasible for you to do by March 1st? If not, can you provide some guidance on your wishes for this project? For example, is retaining the name "uncertainties" important?

Let us know if there is anything we can do to help. Thanks, and hope you are doing well.

jagerber48 commented 5 months ago

I think there are ways, by contacting PyPI and GitHub support, to have ownership of "abandoned" projects migrated to new owners. However, this project isn't abandoned by @lebigot; he has been occasionally responding to issues etc. over the past months, so there shouldn't be a request for transfer along those lines. He simply hasn't addressed this thread.

I don't think it makes sense to make a major fork of this package while the original project is not abandoned. Obviously this stalls work until @lebigot can respond. So two questions:

(1) Maybe we can pursue reaching out to him via an avenue different than this issue?

(2) How pressing is it for anyone to get upgrades to uncertainties out? E.g. are compatibility issues blocking lmfit in some way @newville? That might change my thoughts on a patient vs. pushed timeline. For me personally there is nothing time urgent. I just think uncertainties is a very important package and we see improvements that can be made.

newville commented 5 months ago

@jagerber48 Thanks. I don't disagree with any of that sentiment. I do not have another contact for @lebigot. I am not sure what to make of the single-line commit made a few weeks ago and the lack of response to this or other issues. I do hope that all is well.

From the lmfit perspective, I think it would be perfectly reasonable to conclude that this package is no longer maintained. We are sad that it is not being maintained, and we still need the functionality. If it cannot be maintained, then it would be reasonable for lmfit to fork, rename, maintain, and use an alternate package that very closely matches the functionality of uncertainties. Whether this becomes the de facto replacement for uncertainties is sort of a secondary consideration. But I also think there is interest from others in this functionality, and we would be happy to help with that, either within the lmfit Github organization or somewhere else.

I think it is also reasonable to say that we did reach out, made multiple offers to help, and have been patient in responding to @lebigot's request for a maintainer.

We can hold out hope that @lebigot will transfer the project (and I am most definitely not in a rush to "take over" this code!). But I think it is time to start considering that a smooth transfer may not happen and that we should start thinking of alternate names for this code in case that does not happen. Some ideas:

Any other suggestions?

I am in no rush, and am fine with delaying doing this: we said March 1st, but March 15 or April 1 are not unreasonable. Any thoughts on how much longer to wait?

jagerber48 commented 5 months ago

@newville maybe you can reach out via your other contact, and @wshanks, maybe you could reach out by whatever contact you used to get the commit privileges? Both of these to point him to this thread and try to get a response? Maybe we give two weeks after those other contact messages and if we hear nothing we move forward with the fork?? We could work on the fork and then if we do hear from Lebigot in the future we could probably somehow merge the changes back into uncertainties, even if it requires mangling some git history. Worst case scenario would be we do a lot of dev on a fork with a different name, then Lebigot DOES respond and he doesn't like what has been done and doesn't want it merged back into uncertainties in any way. Then all we've done is create a modern competitor to uncertainties.

I also wonder if a maintenance overhaul at the scale of what is being discussed here is beyond what Lebigot had in mind. Maybe he was hoping to just give a few people commit privileges but he would continue to e.g. output releases?

What exactly do commit privileges mean? Does that mean you can merge PRs into the master branch? But maybe not able to make releases?

If all those contacts fail, and we don't want to take the do nothing route I guess the only option is new names.

I don't really like uncertainties2 or uncertainties4. It feels like asking for trouble down the road to bake the historical version number into the package name. ufloat or vloat kind of preclude the package from supporting a Decimal backend for more precise calculations if users ever require that (precision metrology experiments push 15-16 sig figs which is at the limit of what double floats can handle, and those types of experiments care more about uncertainty than anyone). That said, of the above, I think uncertainties2 or ufloat would have my vote.
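To make the precision point concrete, here is a small sketch using only the Python standard library (nothing from uncertainties). It assumes a value that differs from 1 in the 17th decimal place: a 64-bit float cannot represent the difference, while Decimal with a sufficiently large context precision can.

```python
from decimal import Decimal, getcontext

# A 64-bit float carries roughly 15-17 significant decimal digits,
# so a change in the 17th decimal place next to 1 is rounded away.
a = float("1.00000000000000001")
float_distinguishes = a != 1.0

# Decimal keeps the digits it is given; the context precision
# controls how many digits arithmetic results retain.
getcontext().prec = 30
b = Decimal("1.00000000000000001")
decimal_distinguishes = b != Decimal(1)
decimal_difference = b - Decimal(1)

print(float_distinguishes, decimal_distinguishes, decimal_difference)
```

Whether any users actually need this is an open question; the sketch only shows why a float-specific package name could become limiting.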

Some other ideas:

I'm a bit fond of errorprop. The current description for uncertainties is

Transparent calculations with uncertainties on the quantities involved (aka "error propagation"); calculation of derivatives.

randvar is nice, but it's a bit mathematically abstract for scientists who may be pretty experimentally focused.

jagerber48 commented 5 months ago

I am in no rush, and am fine with delaying doing this: we said March 1st, but March 15 or April 1 are not unreasonable. Any thoughts on how much longer to wait?

I didn't answer this above. I guess if we reach out to Lebigot via those other contact methods in the next week we could give him until March 15 to respond? I'm also in no rush.

newville commented 5 months ago

@jagerber48 I do not have a contact for @lebigot outside of GitHub. He gives Twitter and LinkedIn links on his Github page, but I do not use either. I have an old (10-year-old) email address that no longer works. He asked for maintainers in November through the README on the main GitHub page, and I would suggest doing this in the open. Maybe private conversations are happening -- I don't know.

The license is (effectively) BSD/MIT and we can fork it without further input from @lebigot. We have proposed a migration concept and are trying to be polite, respectful, and cooperative. If @lebigot wants to do something else or have other people involved (or even decide who is "in charge" -- I think no one here is looking for that!), the time to do that would be now.

We were not in a rush in mid-January when we said "How about March 1?". I don't object to saying "OK, how about March 15?", but there was no suggestion of "Well, I think we need a little more time". But, I think if we change the date to March 15 or April 1 or whatever, then we really ought to mean it. ;)

For names, I appreciate that "float" implies "64-bit Floating Point", which might be the dominant use-case but limiting. Still, "UFloat" or "VFloat" seems okay to me, but perhaps "UValue" or "uvalue" would be more general.

I like "errorprop" (bonus: not already taken in PyPI!). Maybe

from errorprop import UValue

(I would suggest supporting UFloat for back compatibility, and)

I would be -0.5 on "randvar" as "random" typically implies "stochastic, not predictable, as with a random number generator" in the numpy/scipy world, not "continuously varying quantity that can take any value" as in statistics. That is, the values are not "unpredictable", but rather have a nominal (most likely) value and a quantifiable uncertainty.

jagerber48 commented 5 months ago

Ah, I see. I wasn't suggesting any critical conversation go on in private, just attempts to urge him to give us some direction in this issue if he's available. If we had already exhausted contacts then I feel no need to extend the date further beyond March 1 that we came up with earlier.

Maybe UFloat could be a subclass of UValue in such a way that the error propagation logic is mostly handled in the UValue but UFloat holds two floats (a value and an uncertainty) and UValue hooks into the underlying float __add__ etc. methods to do error propagation. Then if someone wants a UDecimal in the future that could be implemented. Alternatively, stick with UFloat for now because it has already proven to be very useful, and extend as I described above only if demand arises for something like UDecimal.
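A rough sketch of what that layering could look like. Everything here is hypothetical: the class names UValue, UFloat, and UDecimal follow the discussion above, the number_type attribute and _sqrt helper are invented for illustration, and operands are assumed to be uncorrelated, which the real uncertainties package does not assume.

```python
import math
from decimal import Decimal


class UValue:
    """Hypothetical base class: a nominal value plus a standard deviation.

    Assumption: operands are uncorrelated. The real package tracks
    correlations; this sketch deliberately does not.
    """

    number_type = float  # subclasses choose the numeric backend

    def __init__(self, nominal, std_dev):
        self.nominal = self.number_type(nominal)
        self.std_dev = self.number_type(std_dev)

    def _sqrt(self, x):
        return self.number_type(math.sqrt(x))

    def __add__(self, other):
        # For independent x and y: sigma(x + y)^2 = sigma_x^2 + sigma_y^2.
        # The nominal values use the backend type's own __add__.
        variance = self.std_dev ** 2 + other.std_dev ** 2
        return type(self)(self.nominal + other.nominal, self._sqrt(variance))


class UFloat(UValue):
    number_type = float


class UDecimal(UValue):
    number_type = Decimal

    def _sqrt(self, x):
        return x.sqrt()  # Decimal.sqrt() stays in Decimal arithmetic


f = UFloat(1.0, 0.3) + UFloat(2.0, 0.4)
d = UDecimal("1.0", "0.3") + UDecimal("2.0", "0.4")
print(f.nominal, f.std_dev, d.nominal, d.std_dev)
```

The propagation rule lives in the base class and only the numeric backend differs per subclass, so a UDecimal could be added later without touching UFloat.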

I like UFloat and I don't think we should get rid of it; I was just raising a con for that being the name of the overall package. Though I will have to get used to UFloat instead of ufloat. Need to think about how I feel about VFloat.

andrewgsavage commented 5 months ago

I like the name errorprop too.

I don't object to extending the Mar 1 date, but also don't see any reason to extend it. Could we start making changes to lmfit-uncertainties_forked while still being open to a response from @lebigot ?

Could we open a pypi topic now? I presume it could take some time to get a response on what the next steps here are, given the ambiguity.

lebigot commented 5 months ago

I just read all the messages in this issue. I have been busy with some other important matters, for the last few months.

Thank you all for the interesting discussion and the support.

I'll respond to all the points tomorrow.

wshanks commented 5 months ago

The way I contacted @lebigot was via email. I used the address given in author_email in setup.py. I emailed him just now again and he responded here faster than I could :slightly_smiling_face:

Just to comment on a couple other things above:

The access I have right now is maintainer access, the ability to merge PRs and probably a few other things like editing someone's PR before merge or closing someone else's issue. What I don't have is the ability to change repository settings (add maintainers or change CI settings), publish to pypi, or update the docs. I think @newville was hoping for more sharing of those roles as well so that the project truly does not depend on one person for anything. That makes sense to me. If others hadn't stepped forward here, I would have tried the model of pushing package updates and prodding @lebigot to publish new releases.

If there is a fork (hopefully it won't be necessary), I like the name errorprop or some variation of that. I think it would be good to keep backwards compatibility in mind to start out, meaning that perhaps all a consumer would need to do is change import uncertainties to import errorprop, but it would be a good time to consider other breaking changes while there is a free pass to break things (user is choosing to change packages rather than just updating to a new version after the change). I just would be cautious about making the barrier too high for someone to migrate. The package version could be reset to 0.1.0 and not bumped to 1.0 until breaking changes were completed if some were identified as desirable but not ready at the outset of the fork.

newville commented 5 months ago

@lebigot @wshanks Thanks to both and great to hear from you. I think we are actually in no real rush and would much rather migrate the repository with care and with general (ideally even "universal") agreement on how to proceed.

lebigot commented 5 months ago

I am very happy to see that this package has reasons, and the means to continue to live! Thank you to all of you.

Passing the torch

I am fully ready to have the full stewardship of uncertainties be transferred, and to assist with the operation (as I only used the package myself once or twice, and that was more than 10 years ago!)

I am also ready to help with its evolution; realistically, this will be when I feel that I can quickly give a short answer to a question where my experience with the package can have some real added value.

The consensus is that lmfit is a good GitHub Organization for uncertainties' new home, and I am sensitive to the quantitative argument about lmfit being the main single source of usage for uncertainties.

I understand from this thread that keeping the name uncertainties is a fine option. I like this, as this will make it easier for existing code bases. (I also must say that I like seeing my brainchild keep a more obvious link with its heritage.)

Making lmfit/uncertainties the main repository

I'm not yet fully sure how to make lmfit/uncertainties the new reference repository, but here are some thoughts.

lmfit/uncertainties is currently a fork of this repository. Again, I would like it if the origin of uncertainties is obvious (having lmfit/uncertainties be the official fork would make this clear). How important is it to change this? I was imagining that lmfit/uncertainties could be the official living place of the living version of uncertainties; I would point to it in the README.

Now, I'm not sure how issues attached to this repository can be managed conveniently in the context of switching to lmfit/uncertainties as the place where the package evolves. I'd be happy to hear what you know about this, so that we do a smooth transition.

Now, there is the question of package distributions, as mentioned earlier:

  1. PyPI: I'm ready to hand over the publication of uncertainties to PyPI. @andrewgsavage, you were saying that I can add people as maintainer or owner, right? Which people should be added?
  2. Conda: I don't have any specific rights associated with this, so I'm glad that you (@newville) seem to be confident that connecting with Conda should be manageable. My understanding has always been that the people responsible for the feedstocks monitor releases of uncertainties. With a move to lmfit, such a monitoring would break.
  3. Other package distributions (Python(x,y)…). I'm thinking that creating an official "last release" in this repository, with a message to package/distribution maintainers mentioning uncertainties' new home could be a way of raising awareness?

Planned evolutions

The lists of next steps in this thread seem good to me (first list, second list, ordered list).

I would add updating the documentation to Python 3 (print(), but also f-strings…). Notwithstanding new features, this is the most user-visible part of the iceberg, so I would put this relatively high (but of course I'll let you choose your priorities).

Adding some information for contributors in the documentation (how to run tests, what conventions are used in the code, etc.) would also help: these are currently missing.

Since I was working alone, I was my own, human PEP linter. I'm thinking that when working with many people, a linter at commit time is a good idea: this allows the code to follow some standards in a relatively painless way.

I would add that it's also a good idea to follow PEP 257 (the docstring PEP)!

The code should be mostly PEP 8 and PEP 257 compliant, already. Now, black or ruff might have different opinions, but hopefully not too many.

User experience and features

As for which versions of Python to support, I would suggest to be as useful as possible (i.e. to support as many versions as possible) as long as it doesn't really complicate or slow down the code, or, more importantly, preclude some features from being implemented (like transparent NumPy array support requiring Python 3.5+).

As for sciform, I'm all for adding the engineering notation to uncertainties (or more printing options). Now, I strongly believe that users are better off if they can directly format numbers with uncertainties through f-strings (as is already the case with uncertainties), without having to call any ad hoc function.

On this subject, now that terminals handle special characters better, making "±" the default might be a good idea (but it's a breaking change).

I like the argument in favor of using UValue instead of UFloat (it's indeed more general).

Technical points

The reason for AffineScalarFunc to exist as a class name is that it emphasizes the fact that it mostly represents simple affine functions (that can be combined together). Now, it also contains methods for managing uncertainties, which in principle could be separate. It may thus be cleaner to separate the two and have UValue inherit from AffineScalarFunc while adding in UValue methods for managing uncertainties. Now, this is not something urgent or even super important, but it may inform the decision to have one of the classes be an alias for the other.
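As a minimal sketch of that separation (hypothetical code, not the package's actual implementation): AffineScalarFunc below only stores a nominal value and the linear coefficients with respect to independent variables, and UValue adds the uncertainty-related part on top. The Variable class and the make_uvalue helper are invented for this illustration.

```python
import math


class Variable:
    """Hypothetical independent random variable, identified by object identity."""

    def __init__(self, std_dev):
        self.std_dev = std_dev


class AffineScalarFunc:
    """Affine part only: nominal value plus coefficients d(self)/d(variable)."""

    def __init__(self, nominal_value, derivatives):
        self.nominal_value = nominal_value
        self.derivatives = derivatives  # maps Variable -> coefficient

    def __add__(self, other):
        combined = dict(self.derivatives)
        for var, coeff in other.derivatives.items():
            combined[var] = combined.get(var, 0.0) + coeff
        return type(self)(self.nominal_value + other.nominal_value, combined)


class UValue(AffineScalarFunc):
    """Adds uncertainty handling on top of the affine bookkeeping."""

    @property
    def std_dev(self):
        # Linear error propagation over independent variables.
        return math.sqrt(
            sum((c * v.std_dev) ** 2 for v, c in self.derivatives.items())
        )


def make_uvalue(nominal_value, std_dev):
    """Create a UValue that depends on one fresh independent Variable."""
    return UValue(nominal_value, {Variable(std_dev): 1.0})


x = make_uvalue(1.0, 0.1)
y = make_uvalue(2.0, 0.1)
print((x + x).std_dev, (x + y).std_dev)
```

Because the coefficients are kept per variable, x + x is treated as fully correlated (uncertainty 0.2), while x + y combines two independent uncertainties in quadrature.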

Just out of curiosity, @jagerber48: how would supporting Python <3.6 impact anything in the string formatting code?

Thank you!

I am very grateful for everybody's involvement in this project! Let's make it step into a second life together.

jagerber48 commented 5 months ago

@lebigot Thanks for your thorough response! I definitely have a breath of relief that we get some guidance from you and we'll be able to keep uncertainties close to its roots!

Re lmfit/uncertainties: I believe the lmfit/uncertainties fork was always a contingency in case we weren't able to get sufficient privileges to maintain this repo directly. Now that you've responded and are happy to help out transferring stewardship my opinion is that this repository should just continue to be the main repository with the main change being a likely major release to version 4.0. That way all of the history, issues, discussion, releases etc. will stay intact. Curious if others agree with this.

Some responses on sciform. I'm going to put these responses here, but I don't want to de-rail this thread too much with technical details since I think this thread is more importantly about the transfer of stewardship of uncertainties. I think I'll open up a discussion topic about the possibility of moving to sciform as a formatting backend for uncertainties where details can really be discussed by anyone interested.

As for sciform, I'm all for adding the engineering notation to uncertainties (or more printing options). Now, I strongly believe that users are better off if they can directly format numbers with uncertainties through f-strings (as is already the case with uncertainties), without having to call any ad hoc function.

Yes, I love the f-string formatting in uncertainties. I was blown away by the idea of custom formatting when I first used ufloat and saw I could format it like normal floats but with more options. The uncertainties f-string formatting heavily motivated/inspired me to write sciform. I agree that UFloat or UValue should continue to support f-string formatting. sciform will be able to support this with no trouble, but sciform's format specification mini-language has differences from both python's built-in and the uncertainties format specification mini-languages that need to be discussed.

On this subject, now that terminals handle special characters better, making "±" the default might be a good idea (but it's a breaking change).

Yes, I thought about this for sciform. See https://github.com/jagerber48/sciform/discussions/10. I actually decided to totally drop the ASCII +/- symbol in favor of the unicode ± symbol and even dropped the option to switch between the two, at least until such time as someone requests an option for the ASCII +/- symbol. (There is still a post-facto way to convert sciform output to ASCII-compatible characters only if someone really has a terminal that can't display unicode)
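For readers unfamiliar with the mechanism, f-string support comes from the __format__ hook. The sketch below is a hypothetical stand-in (the Measurement class and to_ascii helper are invented here, and the format handling is far simpler than what uncertainties or sciform actually implement). It applies the built-in float format spec to both parts, uses "±" by default, and shows an after-the-fact conversion to ASCII.

```python
class Measurement:
    """Hypothetical stand-in for a number with uncertainty (not the real UFloat)."""

    def __init__(self, nominal, std_dev):
        self.nominal = nominal
        self.std_dev = std_dev

    def __format__(self, spec):
        # Reuse the built-in float format spec for both parts;
        # fall back to two decimals when no spec is given.
        spec = spec or ".2f"
        return f"{self.nominal:{spec}}±{self.std_dev:{spec}}"


def to_ascii(text):
    """Replace '±' after formatting, for terminals that cannot display it."""
    return text.replace("±", "+/-")


x = Measurement(1.2346, 0.0678)
formatted = f"{x:.3f}"
default_formatted = f"{x}"
ascii_formatted = to_ascii(formatted)
print(formatted, default_formatted, ascii_formatted)
```

The point is only that users keep writing f"{x:...}" with no ad hoc function call; which symbol is the default and how the mini-language is defined are separate design decisions.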

Just out of curiosity, @jagerber48: how would supporting Python <3.6 impact anything in the string formatting code?

I don't think supporting Python < 3.6 would affect the formatting code that users have access to. The main issues I see are

I think it should also be explored what typing features might be precluded by trying to include python versions < 3.7 or 3.8 or so.

jagerber48 commented 5 months ago

https://github.com/lebigot/uncertainties/discussions/192 Here is a discussion dedicated to discussing adoption of sciform by uncertainties as a formatting backend.

andrewgsavage commented 5 months ago

I'm not yet fully sure how to make lmfit/uncertainties the new reference repository, but here are some thoughts.

lmfit/uncertainties is currently a fork of this repository. Again, I would like it if the origin of uncertainties is obvious (having lmfit/uncertainties be the official fork would make this clear). How important is it to change this? I was imagining that lmfit/uncertainties could be the official living place of the living version of uncertainties; I would point to it in the README.

Now, I'm not sure how issues attached to this repository can be managed conveniently in the context of switching to lmfit/uncertainties as the place where the package evolves. I'd be happy to hear what you know about this, so that we do a smooth transition.

There are instructions on transferring a repository at https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository

  1. I think @newville will first need to delete the lmfit/uncertainties_forked repo as I don't have privileges at https://github.com/lmfit/uncertainties_forked/settings
  2. @lebigot can then follow the steps to transfer the repo

Now, there is the question of package distributions, as mentioned earlier:

  1. PyPI: I'm ready to hand over the publication of uncertainties to PyPI. @andrewgsavage, you were saying that I can add people as maintainer or owner, right? Which people should be added?

From https://pypi.org/help/#collaborator-roles

Maintainer: Can upload releases for a package. Cannot add collaborators. Cannot delete files, releases, or the project.

Owner: Can upload releases. Can add other collaborators. Can delete files, releases, or the entire project.

I'd add @newville as owner, and add @andrewgsavage @wshanks, and @jagerber48 as maintainers for the time being

As for which versions of Python to support, I would suggest to be as useful as possible (i.e. to support as many versions as possible) as long as it doesn't really complicate or slow down the code, or, more importantly, preclude some features from being implemented (like transparent NumPy array support requiring Python 3.5+).

I think there is limited value to supporting python <3.8 as 3.7 is at end-of-life. I agree with @jagerber48's arguments for >=3.6. There was mention of typing in this issue, which requires more recent python versions (3.7?). Mostly I'd like to prioritise reducing development time, as there's plenty of improvements we'd like to do and ensuring compatibility with earlier versions hinders this. Therefore I propose supporting >=3.8 (as is tested in https://github.com/lebigot/uncertainties/pull/191 ), and if anyone raises an issue about needing earlier versions we can revisit this.

edited to add @wshanks

wshanks commented 5 months ago

Conda: I don't have any specific rights associated with this, so I'm glad that you (@newville) seem to be confident that connecting with Conda should be manageable (https://github.com/lebigot/uncertainties/issues/180#issuecomment-1889966372). My understanding has always been that the people responsible for the feedstocks monitor releases of uncertainties. With a move to lmfit, such a monitoring would break.

The conda-forge feedstock pulls from the pypi sdist rather than from GitHub so it won't break no matter what happens with the GitHub repo. Also, if the repo gets transferred into the lmfit organization, github.com/lebigot/uncertainties will redirect to github.com/lmfit/uncertainties as long as a new github.com/lebigot/uncertainties repo is not created. So anything pulling from the current repo url should keep working.

Transferring the repo to the lmfit organization like @andrewgsavage outlined would be the best course of action in my opinion for preserving history and maintaining url redirects like I mentioned above, while still moving the repo into an organization which is more conducive to shared ownership than keeping the repo under lebigot/uncertainties. One note for @lebigot -- you might want to fork the lmfit/uncertainties repo after the transfer in order to follow the standard practice of opening PRs against it from a personal fork, but you should be careful to change the name of your fork to something like lebigot/uncertainties-fork because if you use the default name of lebigot/uncertainties that will break the redirect from lebigot/uncertainties to lmfit/uncertainties.

I agree that I see value in only supporting Python versions that are supported by the CPython team itself. numpy for example stopped supporting 3.8 in 1.25.0 released in June 2023, so well before the end of life of Python 3.8 in October 2024. If you are using a version of Python released 6 years ago or more you probably don't expect the latest features and you can just keep using the last version released that did support the version of Python you are using. I could see an argument for backporting bug fixes to patch releases of the version of uncertainties that supported a particular version of Python if feasible, but the current code base has stayed useful despite no significant changes in the past two years, so this might be the kind of project that doesn't need much backporting of fixes. I would not break older versions unnecessarily but it is nice to be able to make use of new features in Python and numpy (f-strings, assignment expressions, match statements, typing, etc).

Regarding PyPI, similar to moving the repo under the lmfit GitHub organization, it may make sense to create an lmfit PyPI organization and put uncertainties under that. PyPI organizations are still in beta at the moment, although PyPI started accepting applications to try them almost a year ago. I think the PyPI access shouldn't matter too much because likely the best course is to set up publishing releases from GitHub CI, so that GitHub privileges for adding a tag would gate PyPI releases (though some people need PyPI access in case settings need to be changed).

newville commented 5 months ago

@lebigot Thanks! and thanks for the helpful replies from @wshanks @andrewgsavage and @jagerber48

As @andrewgsavage says, to transfer the repo, you would go into the "settings" for your repo, then down to the "Danger Zone" and then select Transfer. You should be able to select transferring it to lmfit/uncertainties -- if anything seems uncomfortable, wrong, or strange, let me know and we can try to work out what's going wrong. We removed lmfit/uncertainties (but do have lmfit/uncertainties_fork) so that this could be done seamlessly.

More about transferring a repository is at https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository

To be clear, PRs, stars and wiki are included in a transfer. The docs say that issues are included, but then also say that issues from people outside of the new organization get lost. I recommend that we copy all the Issues and Pull requests (including closed ones) for the historical record before transferring. I will try to do that this weekend and put them somewhere (maybe the wiki?). I think all the webhooks get moved too, but I think we'll want to revisit the CI tools anyway.

Yes to multiple people for PyPI. I like the suggestion of an lmfit organization, and will look into that.

I think all other package management systems will work themselves out. I have not looked at who is maintaining the conda-forge feedstock, but I think that if we can push to PyPI, anything else can use that or Github release.

For Python versions: In my opinion, f-strings and (maybe not vital here) dictionaries being ordered make supporting "Python < 3.7" painful. I agree with @andrewgsavage that we officially support (and test) only the supported Python versions (currently, 3.8 through 3.12). But we can have a policy that changes made that would intentionally break support for a recent end-of-life version (currently, 3.7) really ought to have a discussion. I don't see many recent features that are likely to impact this library.

I think it is also reasonable to try to add testing for new versions when they reach the beta stage. But, we can also be forgiving if we slip a bit behind, and rely on whatever CI we use (a fair topic for discussion). I suspect that a) new versions will generally work and b) adoption of the "latest Python version" is rather slow in scientific/data science work where lots of support libraries are probably needed and have bigger challenges than this library.

Thanks!

wshanks commented 5 months ago

You should be able to select transferring it to lmfit/uncertainties -- if anything seems uncomfortable, wrong, or strange, let me know and we can try to work out what's going wrong.

From the transfer documentation, it sounds like you might not be able to transfer to an account that has a fork of the repo being transferred, so lmfit_uncertainties_forked might need to be modified (deleted, transferred out, or unlinked as a fork of lebigot/uncertainties):

The target account must not have a repository with the same name, or a fork in the same network.

When I have worked with transferred repositories before, all the issues were transferred but it doesn't hurt to be careful.

f-strings and (maybe not vital here) dictionaries being ordered

For what it's worth, 3.6 has both of these (dictionary ordering was not officially documented in 3.6 but was in place), but it sounds like I am less concerned about supporting old versions than some others here (I don't mind supporting older versions, but I just don't feel it is worth too much extra effort). Personally I am still upgrading projects from 3.8 with the upcoming end of life, but for those still on 3.8 I don't have an expectation that I should be getting the latest features of the libraries I am using.

newville commented 5 months ago

@wshanks

The target account must not have a repository with the same name, or a fork in the same network. When I have worked with transferred repositories before, all the issues were transferred but it doesn't hurt to be careful.

Thanks -- that's a good point. There was nothing in lmfit_uncertainties_forked worth keeping, so I deleted it.

FWIW, I did apply for an organization on PyPI named "lmfit".

For Python versions: I am in favor of supporting and testing with only supported Python versions, currently Python 3.8 to 3.12. Let's plan to stop testing 3.8 later this year (I think Oct 2024).

What I would say though is that we may want to discuss and document when we intentionally break something for an old version. For example, we may someday decide that we want to use match statements, and then we can state that code from there on won't work with Python 3.9 or earlier. Maybe we can strive for a policy goal that doing something like adding match before "EOL for Python3.9 + 1 year" really ought to be discussed and documented. Would that be OK?
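To make the proposed "EOL + 1 year" rule concrete, here is a small sketch. The function names and the PYTHON_EOL table are hypothetical; the end-of-life months are October 2024 for 3.8 and October 2025 for 3.9, with the day of the month set to the 1st as a placeholder.

```python
from datetime import date

# End-of-life months per Python version; the day is a placeholder.
PYTHON_EOL = {
    (3, 8): date(2024, 10, 1),
    (3, 9): date(2025, 10, 1),
}


def discussion_deadline(version, grace_years=1):
    """Date until which breaking `version` should be discussed and documented."""
    eol = PYTHON_EOL[version]
    return eol.replace(year=eol.year + grace_years)


def needs_discussion(version, today):
    """True if intentionally breaking `version` on `today` falls under the policy."""
    return today < discussion_deadline(version)


# Example: adopting match statements would break Python 3.9 and earlier.
deadline_39 = discussion_deadline((3, 9))
print(deadline_39, needs_discussion((3, 9), date(2026, 3, 1)))
```

Under these assumptions, adding match statements before October 2026 would be a change that ought to be discussed and documented first.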

jagerber48 commented 5 months ago

I missed a point: What is the advantage to using the lmfit fork over continuing to use this current repo? Is the issue that this repo is in the lebigot namespace so that user will always have "special" privileges with respect to the repo, whereas in an organization multiple users can have privileges on more equal footing? Is it not possible for lebigot to extend important privileges (merging PRs, making releases, etc.) to other users in the current repo?

wshanks commented 5 months ago

I don't think anyone is advocating for using the fork now. We have been talking about moving this repo into the lmfit organization. It should keep the continuity of the issues/prs/urls but more clearly indicate that the project is maintained by the lmfit team now rather than by a single person. I think functionally it is possible to grant full ownership to other people to a personal repository but moving it under an organization makes it a little easier to manage shared access rights and makes the shared maintenance more clear. (I think GitHub makes it hard to tell who owns and has access to a repository intentionally for security reasons).

newville commented 5 months ago

@jagerber48 Yes, see https://docs.github.com/en/organizations

Quoting from https://docs.github.com/en/organizations/collaborating-with-groups-in-organizations/best-practices-for-organizations#assign-multiple-owners:

If an organization only has one owner, the organization's projects can become inaccessible if the owner is unreachable. To ensure that no one will lose access to a project, we recommend that at least two people within each organization have the owner role. For more information, see "Maintaining ownership continuity for your organization."

All the people in this conversation (@andrewgsavage @wshanks @jagerber48 @lebigot @newville) are already members of the lmfit organization. I am one of 4 Lmfit owners, and we can add anyone here to be an owner too.

jagerber48 commented 5 months ago

Got it, makes sense, thanks both for the clarification!