scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Sort contributions to release annotated tags #1848

Open matthewfeickert opened 2 years ago

matthewfeickert commented 2 years ago

Following up from PR #1846, @kratsg has suggested that we sort the contributions that show up in the annotated tags for releases. This seems like a pretty good idea for getting a clear, high-level view pretty fast.

He has the following snippet

sort < log.txt | gsed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | gsed 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g'

— that should be applied to the output that you get following

https://github.com/scikit-hep/pyhf/blob/6452cc62985dfb0d18e997e05211cfcba6c9d210/.github/workflows/bump-version.yml#L219-L222

— which can take input like

Full input:

```
- docs: Add general use citation from Survey of Open Data Concepts paper (#1844)
- docs: Add general use citation from End-to-End Optimization paper (#1843)
- docs: Add general use citation from SimpleAnalysis ATLAS PUB note (#1842)
- fix: Override error on filterwarnings to pass notebook tests (#1841)
- chore: [pre-commit.ci] pre-commit autoupdate (#1839)
- revert: Remove Jinja2 restrictions given nbconvert v6.4.5 (#1837)
- ci: Add concurrency group to HEAD of dependencies workflow (#1836)
- ci: Add matplotlib nightly wheels to HEAD of dependencies testing (#1835)
- ci: Update GitHub Actions to next stable version (#1833)
- fix: Add filterwarnings ignore for Pillow DeprecationWarning (#1834)
- ci: Use actions/setup-python v3 (#1828)
- fix: bump black to 22.3.0 due to click 8.1 release (#1827)
- docs: Add JupyterLite REPL for interactive pyhf in docs (#1820)
- fix: Disallow Jinja2 v3.1.0 to avoid nbsphinx triggering attribute error (#1824)
- test: Assert exported StatError has no name attribute (#1821)
- feat: Add contextlib support to pyhf.schema API (#1818)
- fix: writexml handles missing parameter configs for normfactor (#1819)
- ci: Report coverage for oldest and newest Python tested (#1817)
- feat: Alternative Schema Locations (#1753)
- docs: Update JAX backend normal docstring to jax v0.3.2 returns (#1813)
- docs: Add use citation from neos paper (#1812)
- build: Add support for Python 3.10 across all backends (#1809)
- ci: Add CPython 3.10 to testing (#1808)
- refactor: Clarify exception message applies only to profile likelihood ratio (#1806)
- chore: [pre-commit.ci] pre-commit autoupdate (#1805)
- docs: Fix tiny typo in MC Stat Error documentation (#1803)
- docs: Add use citation from ATLAS UEH displaced jets CalRatio paper (#1802)
- docs: Correct Giordon's affiliation to SCIPP in CITATON.cff (#1801)
- docs: Add use citation from ATLAS UEH MS displaced jet paper (#1800)
- docs: Update Lukas's affiliation to Technical University of Munich (#1798)
- docs: Add general citation from MadJAX paper (#1799)
- feat: Use tbump over bump2version (#1790)
- docs: Add section for tutorial and docs to README (#1789)
- build: Remove wheel and attrs from build-system requires (#1788)
- build: Require setuptools v42.0.0+ for stability (#1783)
- docs: Add citation from 'HL-LHC Computing Review Stage 2' paper (#1779)
- docs: Fix link to TRExFitter documentation (#1777)
- test: Consolidate and update pytest options in pyproject.toml (#1773)
- test: Add html coverage reports from pytest (#1771)
- chore: [pre-commit.ci] pre-commit autoupdate (#1765)
- docs: Update scipy intersphinx url to drop 'reference' (#1767)
- ci: Add macos-latest to dependency release candidates testing (#1760)
- test: Avoid tensorflow macOS floating point deviation with pytest.approx (#1761)
- chore: Update black to first stable release v22.1.0 (#1754)
- ci: Launch tmate session if pytest fails on workflow dispatch run (#1748)
- fix: Accept ValueError for JAX backend `tolist` fallback (#1746)
- docs: Add use citation from revisiting mono-tau tails at the LHC paper (#1744)
- ci: Update gh-action-pypi-publish to use print_hash (#1743)
- ci: Limit concurrent workflow jobs to one per workflow per branch (#1632)
- docs: Note shapesys and staterror modifier set to 1 for modifier data of 0 (#1740)
- chore: [pre-commit.ci] pre-commit autoupdate (#1741)
- fix: Disallow nbsphinx v0.8.8 to avoid empty "raw" directive bug (#1742)
- docs: Add milestone for 1000 project commits to README (#1739)
- docs: Update citation references publication status (#1738)
- docs: Add GitHub Release Radar check to release checklist (#1733)
- docs: Truncate floating point docstring examples to 8 decimal places (#1726)
- ci: Publish to TestPyPI on tag or by workflow dispatch trigger (#1727)
- test: Make fail_backend markers add pytest.mark.xfail and remove fail_jax marker on percentile tests (#1730)
- feat: Allow skipping validation when constructing workspaces (#1710)
- docs: Fix download method of probability models archive in pull plot notebook (#1724)
- chore: [pre-commit.ci] pre-commit autoupdate (#1723)
- docs: Fix download method of probability models archive in impact plot notebook (#1721)
- chore: Use constraints.txt for lower bound testing (#1713)
- fix: Skip doctest of `pyhf.contrib.utils.download` (#1715)
- test: Use scikit-hep-testdata to provide probability models for regression tests (#1711)
- feat: Catch unexpected keyword arguments in workspace construction (#1709)
- feat: Raise exception if bin-wise modifier data length doesn't match sample data (#1708)
- ci: Quote GitHub Action python-version number as YAML strings (#1707)
- fix: Ensure `_ModelConfig.suggested_fixed` list contains only booleans for all modifiers (#1706)
- fix: Accept tar and zip headers in contrib.utils.download requests (#1704)
- refactor: Make contrib.utils.download robust to archive file types (#1697)
- docs: Ensure docstring examples are contiguous (#1703)
- build: Set only lower bounds on backend dependencies (#1698)
- test: Use xfail for tests that fail for upstream problems (#1702)
- refactor: Use tensorlib.percentile in calculators (#1694)
- refactor: Use jax.numpy for JAX backend tensorlib.tolist (#1138)
- feat: Add transpose function to tensorlib (#1696)
- docs: only lists are accepted when specifying objects to prune (#1692)
- feat: Add percentile function to tensorlib (#817)
- fix: Speed-up readxml by caching key lookup instead of using try/except (#1691)
- docs: Add ATLAS third-generation scalar leptoquarks search statistical model record (#1682)
- docs: Add ATLAS top group probability model records through June 2021 (#1681)
- fix: Use https protocol as unauthenticated git protocol is no longer supported (#1680)
- refactor: Pass `Accept` header to requests in `contrib.utils.download` (#1673)
- chore: [pre-commit.ci] pre-commit autoupdate (#1679)
- docs: Update 2021 published ATLAS probability models (#1671)
- feat: Remove pyhf.simplemodels.hepdata_like from API (#1670)
- build: Update lower bound on jax to v0.2.10 (#1666)
- build: Set lower bound of scipy v1.1.0 (#1661)
- ci: Turn off PyPI release tests on pull requests (#1664)
- refactor: Simplified parameters (#1639)
- feat: Configurable default backend (#1646)
- ci: Add release candidates to HEAD of dependencies workflow (#1660)
- feat: Allow zero rate Poisson (#1657)
- test: Remove 'src' from pytest test testpaths to allow for non-editable install in CI (#1467)
- feat: Add support for arrayful JSON (#1647)
- test: Use netlocs that are known to not exist or give known return (#1651)
- fix: custom modifier / new parameter support and test (#1644)
- docs: Add and apply codespell as a pre-commit hook (#1645)
- feat: Add POI-less specification support (#1638)
- fix: Fix bug in impact plot visualization (#1642)
- feat: Allow POI-less models via Workspace.model (#1636)
- feat: Add setup for custom modifiers (#1625)
- ci: Add absolufy-imports pre-commit hook (#1635)
- feat: Expose fitted parameter values of implicit fits in test statistic calls (#1554)
- feat: Add hypotest kwargs to pyhf.infer.intervals.upperlimit (#1613)
- ci: Report coverage to Codecov without token (#1628)
- ci: Update codecov-action to v2 API (#1623)
- ci: Allow reporting of coverage on PRs from forks (#1622)
- docs: Use sphinx-copybutton prompt regex to fully capture examples (#1617)
- chore: [pre-commit.ci] pre-commit autoupdate (#1616)
- docs: Use sphinxcontrib-bibtex style 'unsrt' to sort citations in reverse chronological order (#1615)
- docs: Add use citation from simplified likelihoods ATLAS PUB note (#1614)
- fix: Use MLEs of NPs to create sampling distributions in ToyCalculator (#1610)
- docs: Add use citation from collider signatures of coannihilating dark matter paper (#1604)
- docs: Add `uproot4` writing speedup to v0.6.3 release notes (#1601)
- docs: Add use citation from publishing statistical models white paper (#1600)
- ci: Use `jupyter-black` pre-commit hook over `nbqa-black` (#1598)
- docs: Update maintainer release checklist with v0.6.3 notes (#1597)
- ci: Add Python 3.9 to 'Current Release' workflow tests (#1596)
- docs: Correct v0.6.3 release notes to note `pyhf.pdf._ModelConfig.channels` is a list (#1592)
- chore: [pre-commit.ci] pre-commit autoupdate (#1593)
- ci: Skip doctest for 'Minimum supported dependencies' workflow (#1589)
- fix: Update notebooks to use `include_auxdata` kwarg for `pyhf.Workspace.data` (#1588)
```

and output

chore:
    [pre-commit.ci] pre-commit autoupdate (#1839)
ci: 
    Add concurrency group to HEAD of dependencies workflow (#1836)
    Add matplotlib nightly wheels to HEAD of dependencies testing (#1835)
    Update GitHub Actions to next stable version (#1833)
    Use actions/setup-python v3 (#1828)
docs: 
    Add JupyterLite REPL for interactive pyhf in docs (#1820)
    Add general use citation from End-to-End Optimization paper (#1843)
    Add general use citation from SimpleAnalysis ATLAS PUB note (#1842)
    Add general use citation from Survey of Open Data Concepts paper (#1844)
fix: 
    Add filterwarnings ignore for Pillow DeprecationWarning (#1834)
    Override error on filterwarnings to pass notebook tests (#1841)
    bump black to 22.3.0 due to click 8.1 release (#1827)
revert:
    Remove Jinja2 restrictions given nbconvert v6.4.5 (#1837)

It is probably worth splitting this out across multiple lines in .github/workflows/bump-version.yml to make it more understandable and easier to maintain, but having it as a one-liner is some good code golf! :)

His summary is:

the first sed here is doing something roughly...

  • bookmark the start of a line that matches (action): (stuff)\n(action): (other stuff)
  • replace the matched lines with the single line (action): \t(stuff)\t\t(other stuff)
  • go back to the bookmark and repeat

the second sed here is doing something like

  • replace (action): (stuff) on the same line with (action):\n\t(stuff)
  • replace each run of tabs (\o11) with \n\t
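For reference, the first sed stage summarized above can be spread over several lines with a comment per command. This is only a sketch: it assumes GNU sed (for `\+` and the `\o11` octal escape, which is a tab character), and the function name `merge_same_type` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the first sed stage from the snippet, one command per line.
# Assumes GNU sed; merge_same_type is a hypothetical name.
merge_same_type() {
    sed '
        # Label "a": the loop restarts here (the "bookmark").
        :a
        # Append the next input line to the pattern space, joined by a newline.
        N
        # If both lines start with the same "type: " prefix, merge them into
        # "type: <tab>first<tab><tab>second". The tabs mark the later split points.
        s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/
        # If the substitution happened, go back to the label and try the next line.
        ta
        # Otherwise print the first line of the pattern space ...
        P
        # ... delete it, and restart the cycle with what is left, without reading input.
        D
        # Kept from the one-liner; D always ends the cycle, so this is not reached.
        ba
    '
}

# Example: the two "ci" entries end up on one tab-separated line.
printf '%s\n' \
    'ci: Add CPython 3.10 to testing (#1808)' \
    'ci: Use actions/setup-python v3 (#1828)' \
    'docs: Add use citation from neos paper (#1812)' |
    merge_same_type
```

The second stage then only has to turn the tab markers back into newlines, which is why the input must already be sorted so that entries of the same type are adjacent.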
matthewfeickert commented 2 years ago

@kratsg can you also lengthen this into something that is more understandable and maintainable? There are also differences in behavior between gsed and sed (with respect to newlines), so if you can use the attached file, log.txt, in a Docker container or somewhere where you can just use sed that would be great.

I know it can be difficult to "simplify" regular expression syntax by splitting it out into shorter bits, but I think trying to make this as readable and maintainable as possible is important.

sort < log.txt | \
    sed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | \
    sed 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g'

is not the worst regex I've seen, but also not something I want to parse in my head at 02:00.

kratsg commented 2 years ago

gsed on my mac is POSIX sed (should be identical behavior). That is actually already split up. Can't really go with shorter bits. The first one is a single regex and the second is the clean-up substitutions. There's no way to really make the first one easier to understand.

matthewfeickert commented 2 years ago

> gsed on my mac is POSIX sed (should be identical behavior).

Yeah, I remember running into this in the past. The differences are infuriating as even giving --posix to sed won't replicate behavior

sort < log.txt | \
    sed --posix ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | \
    sed --posix 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g'

I know that you're using gsed here and not BSD sed, but as the differences are related to newlines, it seems that there are similar problems to things like this and this. Not sure why though.

To give an explicit example of what I mean about newlines:

$ docker run --rm -ti -v $PWD:/tmp:ro ubuntu:20.04
root@c481437777cc:/# sort < /tmp/log.txt | \
>     sed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | \
>     sed 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g' | head -n 20
  - build:

    Add support for Python 3.10 across all backends (#1809)
    Remove wheel and attrs from build-system requires (#1788)
    Require setuptools v42.0.0+ for stability (#1783)
    Set lower bound of scipy v1.1.0 (#1661)
    Set only lower bounds on backend dependencies (#1698)
    Update lower bound on jax to v0.2.10 (#1666)
  - chore:

    Update black to first stable release v22.1.0 (#1754)
    Use constraints.txt for lower bound testing (#1713)
    [pre-commit.ci] pre-commit autoupdate (#1593)
    [pre-commit.ci] pre-commit autoupdate (#1616)
    [pre-commit.ci] pre-commit autoupdate (#1679)
    [pre-commit.ci] pre-commit autoupdate (#1723)
    [pre-commit.ci] pre-commit autoupdate (#1741)
    [pre-commit.ci] pre-commit autoupdate (#1765)
    [pre-commit.ci] pre-commit autoupdate (#1805)
    [pre-commit.ci] pre-commit autoupdate (#1839)
root@c481437777cc:/# 

The POSIX nature of things unfortunately doesn't come into play too much either, as this is being run on an Ubuntu runner. So we should write for that (though I would also be down to write this all as a Bash script that we then just have the runner run, making it easier for us to test locally on any maintainer's machine).
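As a rough idea of what such a script could look like, here is a sketch that does the grouping with `sort` and `awk` instead of GNU-specific sed escapes, so it should behave the same on an Ubuntu runner and on a maintainer's machine. It is not the snippet from the workflow: the function name `group_changelog` is hypothetical, and it assumes one `- type: message` entry per line, as in log.txt.

```shell
#!/usr/bin/env bash
# Sketch of a standalone helper; group_changelog is a hypothetical name.
# Assumes one "- type: message" entry per line, as in log.txt.
group_changelog() {
    # Sort the file given as $1 (or stdin if omitted) so equal types are adjacent.
    LC_ALL=C sort "${1:--}" | awk '
        {
            sub(/^[ \t]*-[ \t]*/, "")       # drop the leading "- " list marker
            colon = index($0, ": ")         # the first ": " ends the type
            if (colon == 0) next            # skip lines without a type prefix
            kind = substr($0, 1, colon - 1)
            if (kind != previous) {         # a new type starts a new heading
                print kind ":"
                previous = kind
            }
            print "\t" substr($0, colon + 2)
        }
    '
}

# Example with a few entries from the thread, fed on stdin.
printf '%s\n' \
    '- docs: Add use citation from neos paper (#1812)' \
    '- fix: Fix bug in impact plot visualization (#1642)' \
    '- ci: Add CPython 3.10 to testing (#1808)' \
    '- docs: Fix link to TRExFitter documentation (#1777)' |
    group_changelog
```

In the workflow the runner would then only need something like `group_changelog log.txt`, and the same call can be run locally against the attached file.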

> That is actually already split up. Can't really go with shorter bits. The first one is a single regex and the second is the clean-up substitutions. There's no way to really make the first one easier to understand.

Hm. I'll work on writing lots of comments I guess then that break things down bit by bit.

kratsg commented 2 years ago

Maybe this version is a bit cleaner?

$ sort < log.txt | sed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \2 :: \3/;ta;P;D;ba' | sed 's/: \(.\+\)$/:\n      * \1/g; s/ :: /\n      * /g' | sed 's/^\( \+\*\)\([^(]*\)\(.*\)$/\1 \3\2/g'
  - build:
      * (#1809) Add support for Python 3.10 across all backends 
      * (#1788) Remove wheel and attrs from build-system requires 
      * (#1783) Require setuptools v42.0.0+ for stability 
      * (#1661) Set lower bound of scipy v1.1.0 
      * (#1698) Set only lower bounds on backend dependencies 
      * (#1666) Update lower bound on jax to v0.2.10 
  - chore:
      * (#1593) [pre-commit.ci] pre-commit autoupdate 
      * (#1616) [pre-commit.ci] pre-commit autoupdate 
      * (#1679) [pre-commit.ci] pre-commit autoupdate 
      * (#1723) [pre-commit.ci] pre-commit autoupdate 
      * (#1741) [pre-commit.ci] pre-commit autoupdate 
      * (#1765) [pre-commit.ci] pre-commit autoupdate 
matthewfeickert commented 2 years ago

So one small quibble with this is that the sort causes the order within the groups to get changed to alphabetical order of the commit message:

Sorted log.txt:

```
  - build:
      * (#1809) Add support for Python 3.10 across all backends
      * (#1788) Remove wheel and attrs from build-system requires
      * (#1783) Require setuptools v42.0.0+ for stability
      * (#1661) Set lower bound of scipy v1.1.0
      * (#1698) Set only lower bounds on backend dependencies
      * (#1666) Update lower bound on jax to v0.2.10
  - chore:
      * (#1593) [pre-commit.ci] pre-commit autoupdate
      * (#1616) [pre-commit.ci] pre-commit autoupdate
      * (#1679) [pre-commit.ci] pre-commit autoupdate
      * (#1723) [pre-commit.ci] pre-commit autoupdate
      * (#1741) [pre-commit.ci] pre-commit autoupdate
      * (#1765) [pre-commit.ci] pre-commit autoupdate
      * (#1805) [pre-commit.ci] pre-commit autoupdate
      * (#1839) [pre-commit.ci] pre-commit autoupdate
      * (#1754) Update black to first stable release v22.1.0
      * (#1713) Use constraints.txt for lower bound testing
  - ci:
      * (#1635) Add absolufy-imports pre-commit hook
      * (#1846) Add bump version workflow for release tags
      * (#1836) Add concurrency group to HEAD of dependencies workflow
      * (#1808) Add CPython 3.10 to testing
      * (#1760) Add macos-latest to dependency release candidates testing
      * (#1835) Add matplotlib nightly wheels to HEAD of dependencies testing
      * (#1596) Add Python 3.9 to 'Current Release' workflow tests
      * (#1660) Add release candidates to HEAD of dependencies workflow
      * (#1622) Allow reporting of coverage on PRs from forks
      * (#1748) Launch tmate session if pytest fails on workflow dispatch run
      * (#1632) Limit concurrent workflow jobs to one per workflow per branch
      * (#1727) Publish to TestPyPI on tag or by workflow dispatch trigger
      * (#1707) Quote GitHub Action python-version number as YAML strings
      * (#1817) Report coverage for oldest and newest Python tested
      * (#1628) Report coverage to Codecov without token
      * (#1589) Skip doctest for 'Minimum supported dependencies' workflow
      * (#1664) Turn off PyPI release tests on pull requests
      * (#1623) Update codecov-action to v2 API
      * (#1743) Update gh-action-pypi-publish to use print_hash
      * (#1833) Update GitHub Actions to next stable version
      * (#1828) Use actions/setup-python v3
      * (#1598) Use `jupyter-black` pre-commit hook over `nbqa-black`
  - docs:
      * (#1645) Add and apply codespell as a pre-commit hook
      * (#1682) Add ATLAS third-generation scalar leptoquarks search statistical model record
      * (#1681) Add ATLAS top group probability model records through June 2021
      * (#1779) Add citation from 'HL-LHC Computing Review Stage 2' paper
      * (#1799) Add general citation from MadJAX paper
      * (#1843) Add general use citation from End-to-End Optimization paper
      * (#1842) Add general use citation from SimpleAnalysis ATLAS PUB note
      * (#1844) Add general use citation from Survey of Open Data Concepts paper
      * (#1733) Add GitHub Release Radar check to release checklist
      * (#1820) Add JupyterLite REPL for interactive pyhf in docs
      * (#1739) Add milestone for 1000 project commits to README
      * (#1789) Add section for tutorial and docs to README
      * (#1601) Add `uproot4` writing speedup to v0.6.3 release notes
      * (#1802) Add use citation from ATLAS UEH displaced jets CalRatio paper
      * (#1800) Add use citation from ATLAS UEH MS displaced jet paper
      * (#1604) Add use citation from collider signatures of coannihilating dark matter paper
      * (#1812) Add use citation from neos paper
      * (#1600) Add use citation from publishing statistical models white paper
      * (#1744) Add use citation from revisiting mono-tau tails at the LHC paper
      * (#1614) Add use citation from simplified likelihoods ATLAS PUB note
      * (#1801) Correct Giordon's affiliation to SCIPP in CITATON.cff
      * (#1592) Correct v0.6.3 release notes to note `pyhf.pdf._ModelConfig.channels` is a list
      * (#1703) Ensure docstring examples are contiguous
      * (#1721) Fix download method of probability models archive in impact plot notebook
      * (#1724) Fix download method of probability models archive in pull plot notebook
      * (#1777) Fix link to TRExFitter documentation
      * (#1803) Fix tiny typo in MC Stat Error documentation
      * (#1740) Note shapesys and staterror modifier set to 1 for modifier data of 0
      * (#1692) only lists are accepted when specifying objects to prune
      * (#1726) Truncate floating point docstring examples to 8 decimal places
      * (#1671) Update 2021 published ATLAS probability models
      * (#1738) Update citation references publication status
      * (#1813) Update JAX backend normal docstring to jax v0.3.2 returns
      * (#1798) Update Lukas's affiliation to Technical University of Munich
      * (#1597) Update maintainer release checklist with v0.6.3 notes
      * (#1767) Update scipy intersphinx url to drop 'reference'
      * (#1615) Use sphinxcontrib-bibtex style 'unsrt' to sort citations in reverse chronological order
      * (#1617) Use sphinx-copybutton prompt regex to fully capture examples
  - feat:
      * (#1818) Add contextlib support to pyhf.schema API
      * (#1613) Add hypotest kwargs to pyhf.infer.intervals.upperlimit
      * (#817) Add percentile function to tensorlib
      * (#1638) Add POI-less specification support
      * (#1625) Add setup for custom modifiers
      * (#1647) Add support for arrayful JSON
      * (#1696) Add transpose function to tensorlib
      * (#1636) Allow POI-less models via Workspace.model
      * (#1710) Allow skipping validation when constructing workspaces
      * (#1657) Allow zero rate Poisson
      * (#1753) Alternative Schema Locations
      * (#1709) Catch unexpected keyword arguments in workspace construction
      * (#1646) Configurable default backend
      * (#1554) Expose fitted parameter values of implicit fits in test statistic calls
      * (#1708) Raise exception if bin-wise modifier data length doesn't match sample data
      * (#1670) Remove pyhf.simplemodels.hepdata_like from API
      * (#1790) Use tbump over bump2version
  - fix:
      * (#1704) Accept tar and zip headers in contrib.utils.download requests
      * (#1746) Accept ValueError for JAX backend `tolist` fallback
      * (#1834) Add filterwarnings ignore for Pillow DeprecationWarning
      * (#1827) bump black to 22.3.0 due to click 8.1 release
      * (#1644) custom modifier / new parameter support and test
      * (#1824) Disallow Jinja2 v3.1.0 to avoid nbsphinx triggering attribute error
      * (#1742) Disallow nbsphinx v0.8.8 to avoid empty "raw" directive bug
      * (#1706) Ensure `_ModelConfig.suggested_fixed` list contains only booleans for all modifiers
      * (#1642) Fix bug in impact plot visualization
      * (#1841) Override error on filterwarnings to pass notebook tests
      * (#1715) Skip doctest of `pyhf.contrib.utils.download`
      * (#1691) Speed-up readxml by caching key lookup instead of using try/except
      * (#1588) Update notebooks to use `include_auxdata` kwarg for `pyhf.Workspace.data`
      * (#1680) Use https protocol as unauthenticated git protocol is no longer supported
      * (#1610) Use MLEs of NPs to create sampling distributions in ToyCalculator
      * (#1819) writexml handles missing parameter configs for normfactor
  - refactor:
      * (#1806) Clarify exception message applies only to profile likelihood ratio
      * (#1697) Make contrib.utils.download robust to archive file types
      * (#1673) Pass `Accept` header to requests in `contrib.utils.download`
      * (#1639) Simplified parameters
      * (#1138) Use jax.numpy for JAX backend tensorlib.tolist
      * (#1694) Use tensorlib.percentile in calculators
  - revert:
      * (#1837) Remove Jinja2 restrictions given nbconvert v6.4.5
  - test:
      * (#1771) Add html coverage reports from pytest
      * (#1821) Assert exported StatError has no name attribute
      * (#1761) Avoid tensorflow macOS floating point deviation with pytest.approx
      * (#1773) Consolidate and update pytest options in pyproject.toml
      * (#1730) Make fail_backend markers add pytest.mark.xfail and remove fail_jax marker on percentile tests
      * (#1467) Remove 'src' from pytest test testpaths to allow for non-editable install in CI
      * (#1651) Use netlocs that are known to not exist or give known return
      * (#1711) Use scikit-hep-testdata to provide probability models for regression tests
      * (#1702) Use xfail for tests that fail for upstream problems
```

There is some benefit to reading through a list of fixes chronologically as a developer, so it might be nice to see if we could preserve that with some additional tweaking. :thinking:

kratsg commented 2 years ago

Just use sort -n I think.

Ahh no, you need to do some pre-editing to move the numerical entries (the PR numbers) to the front before sorting... that would keep the chronological sort order.
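One possible form of that pre-editing is a decorate, sort, undecorate pass. The sketch below makes a different choice of key than the PR number: it prefixes each entry with its type and its original line number, sorts on those two keys, and strips them again, so entries inside a group stay in the order the log produced them. It assumes the input is already in the desired chronological order and that messages contain no tab characters; `group_keep_order` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: group by type while keeping the original log order inside each group.
# Assumes "- type: message" lines without tabs; group_keep_order is hypothetical.
group_keep_order() {
    tab="$(printf '\t')"
    # Decorate: "type<TAB>original line number<TAB>message"
    awk -v OFS='\t' '
        {
            line = $0
            sub(/^[ \t]*-[ \t]*/, "", line)   # drop the "- " list marker
            colon = index(line, ": ")
            if (colon == 0) next              # skip lines without a type prefix
            print substr(line, 1, colon - 1), NR, substr(line, colon + 2)
        }
    ' |
    # Sort by type first, then numerically by the original line number.
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
    # Undecorate: print a heading whenever the type changes, then the message.
    awk -F '\t' '
        $1 != previous { print $1 ":"; previous = $1 }
        { print "\t" $3 }
    '
}

# Example: within "fix" and "docs" the input order is preserved.
printf '%s\n' \
    '- fix: Override error on filterwarnings to pass notebook tests (#1841)' \
    '- docs: Add use citation from neos paper (#1812)' \
    '- fix: Add filterwarnings ignore for Pillow DeprecationWarning (#1834)' \
    '- docs: Add general citation from MadJAX paper (#1799)' |
    group_keep_order
```

Using the line number rather than the PR number as the secondary key avoids surprises from entries such as #817 or #1138, whose numbers are much lower than those of their neighbours in the log.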

kratsg commented 2 years ago

@matthewfeickert up to you what you want to do here