scikit-hep / pyBumpHunter

Python implementation of the BumpHunter algorithm used by HEP community.
BSD 3-Clause "New" or "Revised" License
17 stars 9 forks source link

Naming conventions, pep8 and a few ideas of improvements #3

Open jonas-eschle opened 3 years ago

jonas-eschle commented 3 years ago

Hi all, welcome to the Scikit-HEP world! Good to see this up now. I've noticed a few points, all fine but just to improve the repository and get it up to the standards.

I looked through it and I've noticed that many names do not comply with the Python conventions as they are in CamelCase instead of snakecase. Is there any reason to deviate from this convention?

Docstrings should comply to PEP257.

Also, the code does not quite seem to be PEP8 compliant, I would propose to format it with black. There are many things I've spotted, e.g: if(cond) is not Python, the syntax should rather be if cond:.

There seems to be tests missing for BumpHunter2D, it would be good to have at least a simple test for anything public.

Furthermore, I think it would be great to refactor the two classes in a way to share the methods or somehow the interface.

On the methods: I would make the plot and print methods more flexible

In general, I would suggest the mindset to give the user as much control (and work) as possible; we don't need to create "do everything at once" blackboxes with dozens of arguments. This includes things such as setting a global seed in numpy. Or make the scan method _scan, not __scan.

Overall I see that there is a lot of nice stuff here, but I also think that the API could have an overhaul with a few breaking changes.

Btw, it did not work out to put it into hepstats?

Very nice to see the contribution and I hope the comments above can help the package to evolve further

lovaslin commented 3 years ago

Hello,

Thank you for your advice. I plan to work on naming convention and code improvement in near future. I also plan to change the way data are managed in future versions. This will change a few things, like how much control the user have or the number of arguments.

I will work on that for the next release.

eduardo-rodrigues commented 3 years ago

+1 on this. Note that A LOT of guidelines is well documented at https://scikit-hep.org/developer. Even style is discussed ... and you can look at the other org packages as these days the packages have become more and more homogeneous is terms of conventions and tools used :-).

jonas-eschle commented 3 years ago

I plan to work on naming convention and code improvement in near future. I also plan to change the way data are managed in future versions. This will change a few things, like how much control the user have or the number of arguments.

I will work on that for the next release.

That sounds good! I would even propose to remove the current release (or is there any need for an immediate release now?) and rework the API so that users don't need to change the API a few weeks later. It's just better for the user to have a more stable API. What do you think?

Btw about hepstats, did that not work out to contribute it there? Just wondering, it seems from my naive point of view to be a good fit there.

lovaslin commented 3 years ago

About the last release, since some people are currently using the previous version of pyBumpHunter in their work, I prefer keeping it so they have the latest feature already available without changing their current workflow. I prefer release the new reworked API in a future major release (version 1.0.0?) since it will probably break retro-compatibility.

About hepstats, I was asked if I wanted to include pyBumpHunter there before making the first release, but I chose not to (at least for now). Maybe in the future, once things get mature enough, pyBumpHunter could be merged into hepstats

De: "Jonas Eschle" @.> À: "scikit-hep/pyBumpHunter" @.> Cc: "Louis VASLIN" @.>, "Comment" @.> Envoyé: Lundi 22 Mars 2021 13:58:30 Objet: Re: [scikit-hep/pyBumpHunter] Naming conventions, pep8 and a few ideas of improvements (#3)

I plan to work on naming convention and code improvement in near future. I also plan to change the way data are managed in future versions. This will change a few things, like how much control the user have or the number of arguments.

I will work on that for the next release.

That sounds good! I would even propose to remove the current release (or is there any need for an immediate release now?) and rework the API so that users don't need to change the API a few weeks later. It's just better for the user to have a more stable API. What do you think?

Btw about hepstats, did that not work out to contribute it there? Just wondering, it seems from my naive point of view to be a good fit there.

— You are receiving this because you commented. Reply to this email directly, [ https://github.com/scikit-hep/pyBumpHunter/issues/3#issuecomment-804040565 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/APPWEUZDPLLYZ4PCCM4VWOLTE45HNANCNFSM4ZS6YV6Q | unsubscribe ] .

jonas-eschle commented 3 years ago

About the last release, since some people are currently using the previous version of pyBumpHunter in their work, I prefer keeping it so they have the latest feature already available without changing their current workflow.

I see, that makes sense!

I prefer release the new reworked API in a future major release (version 1.0.0?) since it will probably break retro-compatibility.

I would propose to do a middleway: add the new API ASAP and deprecate the old API. Then, on 1.0, the old one can be removed. This includes renaming variables etc. the earlier the better. The problem with doing it later is that new users now do get used to it otherwise.

About hepstats, I was asked if I wanted to include pyBumpHunter there before making the first release, but I chose not to (at least for now). Maybe in the future, once things get mature enough, pyBumpHunter could be merged into hepstats

I see, it is just more work to have an independent package (as you see), but glad if you're up for it.

For the changes, I would prefer to have them rather earlier than later, in order to adhere to the Scikit-HEP standard. It is then also easier for us other devs to help with things.

If you are fine with it, I will make a PR that contains some of the changes mentioned above to get you started, okay?

eduardo-rodrigues commented 3 years ago

FWIW I agree with these comments :-).

lovaslin commented 3 years ago

Your help is very welcome, thank you ! Feel free to make your PR whenever you want/can. I will also work on my side on improving my code.

De: "Jonas Eschle" @.> À: "scikit-hep/pyBumpHunter" @.> Cc: "Louis VASLIN" @.>, "Comment" @.> Envoyé: Lundi 22 Mars 2021 14:56:41 Objet: Re: [scikit-hep/pyBumpHunter] Naming conventions, pep8 and a few ideas of improvements (#3)

About the last release, since some people are currently using the previous version of pyBumpHunter in their work, I prefer keeping it so they have the latest feature already available without changing their current workflow.

I see, that makes sense! BQ_BEGIN

I prefer release the new reworked API in a future major release (version 1.0.0?) since it will probably break retro-compatibility. BQ_END

I would propose to do a middleway: add the new API ASAP and deprecate the old API. Then, on 1.0, the old one can be removed. This includes renaming variables etc. the earlier the better. The problem with doing it later is that new users now do get used to it otherwise. BQ_BEGIN

About hepstats, I was asked if I wanted to include pyBumpHunter there before making the first release, but I chose not to (at least for now). Maybe in the future, once things get mature enough, pyBumpHunter could be merged into hepstats BQ_END

I see, it is just more work to have an independent package (as you see), but glad if you're up for it.

For the changes, I would prefer to have them rather earlier than later, in order to adhere to the Scikit-HEP standard. It is then also easier for us other devs to help with things.

If you are fine with it, I will make a PR that contains some of the changes mentioned above to get you started, okay?

— You are receiving this because you commented. Reply to this email directly, [ https://github.com/scikit-hep/pyBumpHunter/issues/3#issuecomment-804081325 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/APPWEU4SYUILQWCCIUOBW73TE5EBTANCNFSM4ZS6YV6Q | unsubscribe ] .

henryiii commented 3 years ago

Great to see this in!

If you need help, feel free to poke me.

A few suggestions: if you install cookiecutter, you can do the following in a temporary directory cookiecutter gh:scikit-hep/cookie, then answer the questions. Then compare the new project it creates with yours - you should be able to copy in parts, like the .pre-commit-config.yaml file and more. It’s a fast way to get a “scikit-hep developer guidelines” following project going.

I’d also make the minimum Python version 3.6, as 3.5 has been dead for a little while and is really rare, especially in scientific computing, and 3.6 gives you some really nice features like f-strings and variable annotations. In fact, since the rest of the scientific stack has dropped 3.6, maybe 3.7+ would be fine for you (see the Python statement on scikit-hep.org http://scikit-hep.org/). 3.7+ makes static typing drastically easier with from future import annotations (which basically gives you all future improvements in 3.x to typing for almost free) and PEP 560 support (which is bugging me on boost-histogram, which is 3.6+ but I could really use get_origin / get_args…).

I think it would be a good idea if pybumphunter supported the PlottableHistogram Protocol histograms from boost-histogram/uproot4/hist and specified in UHI. We don’t really have a fitting-protocol yet, but the Plottable Protocol probably covers everything you need? (And if not, I’d like to hear about it!)

On API changes, 0.x -> 0.x+1 is considered a major change, so I think it’s fine to have a one or two version deprecation period, and then drop the old API without forcing a 1.0 release. Once you hit 1.0, then you shouldn’t (intentionally) break old code until bumping the major version (for normal SemVer). A lot of projects (Python, NumPy, etc) use a modified SemVer were minor releases can drop items deprecated for the last N minor releases. (N is 3 for NumPy, I believe).

Cheers, Henry

On Mar 22, 2021, at 10:05 AM, Louis VASLIN @.***> wrote:

Your help is very welcome, thank you ! Feel free to make your PR whenever you want/can. I will also work on my side on improving my code.

De: "Jonas Eschle" @.> À: "scikit-hep/pyBumpHunter" @.> Cc: "Louis VASLIN" @.>, "Comment" @.> Envoyé: Lundi 22 Mars 2021 14:56:41 Objet: Re: [scikit-hep/pyBumpHunter] Naming conventions, pep8 and a few ideas of improvements (#3)

About the last release, since some people are currently using the previous version of pyBumpHunter in their work, I prefer keeping it so they have the latest feature already available without changing their current workflow.

I see, that makes sense! BQ_BEGIN

I prefer release the new reworked API in a future major release (version 1.0.0?) since it will probably break retro-compatibility. BQ_END

I would propose to do a middleway: add the new API ASAP and deprecate the old API. Then, on 1.0, the old one can be removed. This includes renaming variables etc. the earlier the better. The problem with doing it later is that new users now do get used to it otherwise. BQ_BEGIN

About hepstats, I was asked if I wanted to include pyBumpHunter there before making the first release, but I chose not to (at least for now). Maybe in the future, once things get mature enough, pyBumpHunter could be merged into hepstats BQ_END

I see, it is just more work to have an independent package (as you see), but glad if you're up for it.

For the changes, I would prefer to have them rather earlier than later, in order to adhere to the Scikit-HEP standard. It is then also easier for us other devs to help with things.

If you are fine with it, I will make a PR that contains some of the changes mentioned above to get you started, okay?

— You are receiving this because you commented. Reply to this email directly, [ https://github.com/scikit-hep/pyBumpHunter/issues/3#issuecomment-804081325 | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/APPWEU4SYUILQWCCIUOBW73TE5EBTANCNFSM4ZS6YV6Q | unsubscribe ] .

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/scikit-hep/pyBumpHunter/issues/3#issuecomment-804089029, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDHFSQBTLZG4CRDSCZZG5TTE5FENANCNFSM4ZS6YV6Q.