matplo / pyjetty

some work on jets
3 stars 14 forks source link

Automatic determination of reg param #17

Closed ezradlesser closed 4 years ago

ezradlesser commented 4 years ago

Has not been tested with grooming -- FYI there may be some inconsistencies with obs_setting and obs_label vars in run_analysis.py which should be tested further

ezradlesser commented 4 years ago

@jdmulligan

  • I think we should keep the ability to specify the regularization parameter manually -- e.g. take the manual value if one is specified in the config, otherwise determine it automatically (or any other implementation you prefer)

I guess this is reasonable -- though I'm not sure why someone would want to do this. There isn't really a slow-down, but this can probably be changed so that a flag is passed or something in the case that someone wants to pass regularization parameters manually.

  • There are a huge amount of whitespace changes and line breaks, etc. I'm not sure if this is intentional or not -- make sure to check the "Files changed" tab on your PR before making it and check through all of the files (the shared files at least) to make sure they include only what you need. If you in fact intentionally made all these changes, you should rather propose a formatting style you prefer and we can run a code-format commit (but it is not good to mix these with other changes).

Yes, these are intentional. In general there are many lines of code in these files which are much too long. In many code editors this means that the reader either has to scroll horizontally to read the code, or the lines break at the end of the screen which causes somewhat random rollovers to the other edge of the screen, in either case making the code very difficult to read. I think we should follow the PEP8 guidelines here: https://www.python.org/dev/peps/pep-0008/#maximum-line-length

Another issue is that most empty lines were filled with trailing white space, which I corrected here. We should also follow the PEP8 guidelines regarding 4-space indentations instead of 2 (which are much easier to follow & read), but I have not corrected this here. https://www.python.org/dev/peps/pep-0008/#indentation

There are probably some other whitespace issues which I made / didn't fix, as I was mostly just trying to make the code readable.

jdmulligan commented 4 years ago

Hi @ezradlesser

The reason to keep a manual option is twofold: (1) In general when adding new features that haven't been extensively tested it is best to keep flexibility with the old option, since there might arise cases where the automated algorithm does undesirable things (2) As you discussed the other day, it is not clear yet what will happen if e.g. there is a spike in uncertainty in a particular bin or two -- we may want to have manual control in some cases. It should take just a few lines of code to keep that option, since the functionality is already there to read the reg param from the config file (an easy way is to default to the automatic method, but check if a manual value is given in the config file and use that if so -- I think that is really <5 lines). So I would propose that you add that backwards compatibility, then I'll merge the PR.

Regarding the code formatting, indeed it would be good to have some agreed standards (some you mention are pretty much universally agreed on, like line-length, others are more subjective, like indentation spacing). Thanks for some initiative on this -- but we should go about it differently than mixing in a large amount of format changes in other PRs. In particular: We should use a code formatting tool (e.g. black) and do a universal commit across all of the files. And importantly: we should agree beforehand to adopt (and maintain) a certain convention -- code formatting is 90% communication. For this time it is ok but going forward if you want to make more than one or two formatting changes, we should do it as above.

ezradlesser commented 4 years ago

@jdmulligan One should now be able to bypass the max_reg_param implementation and use the previous implementation by just not having the max_reg_param in the config file.

jdmulligan commented 4 years ago

Great, thanks!!