Open matthewfeickert opened 2 years ago
@kratsg can you also lengthen this into something that is more understandable and maintainable? There are additionally differences between `gsed` and `sed` in behavior (wrt newlines), so if you can use the attached file, log.txt, in a Docker container or somewhere where you can just use `sed`, that would be great.
I know it can be difficult to "simplify" regular expression syntax by splitting it out into shorter bits, but I think trying to make this as readable and maintainable as possible is important.
sort < log.txt | \
sed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | \
sed 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g'
is not the worst regex I've seen, but also not something I want to parse in my head at 02:00.
`gsed` on my mac is POSIX `sed` (should be identical behavior). That is actually already split up. Can't really go with shorter bits. The first one is a single regex and the second is the clean-up substitutions. There's no way to really make the first one easier to understand.
> `gsed` on my mac is POSIX `sed` (should be identical behavior).
Yeah, I remember running into this in the past. The differences are infuriating, as even giving `--posix` to `sed` won't replicate the behavior:
sort < log.txt | \
sed --posix ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | \
sed --posix 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g'
I know that you're using `gsed` here and not BSD `sed`, but as the differences are related to newlines, it seems that there are similar problems to things like this and this. Not sure why though.
To give an explicit example of what I mean about newlines:
$ docker run --rm -ti -v $PWD:/tmp:ro ubuntu:20.04
root@c481437777cc:/# sort < /tmp/log.txt | \
> sed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/;ta;P;D;ba' | \
> sed 's/\o11\+/\n\o11/g;s/: \(.\+\)$/:\n\o11\1/g' | head -n 20
- build:
Add support for Python 3.10 across all backends (#1809)
Remove wheel and attrs from build-system requires (#1788)
Require setuptools v42.0.0+ for stability (#1783)
Set lower bound of scipy v1.1.0 (#1661)
Set only lower bounds on backend dependencies (#1698)
Update lower bound on jax to v0.2.10 (#1666)
- chore:
Update black to first stable release v22.1.0 (#1754)
Use constraints.txt for lower bound testing (#1713)
[pre-commit.ci] pre-commit autoupdate (#1593)
[pre-commit.ci] pre-commit autoupdate (#1616)
[pre-commit.ci] pre-commit autoupdate (#1679)
[pre-commit.ci] pre-commit autoupdate (#1723)
[pre-commit.ci] pre-commit autoupdate (#1741)
[pre-commit.ci] pre-commit autoupdate (#1765)
[pre-commit.ci] pre-commit autoupdate (#1805)
[pre-commit.ci] pre-commit autoupdate (#1839)
root@c481437777cc:/#
The POSIX nature of things unfortunately doesn't come into play too much either, as this is being run on an Ubuntu runner. So we should write for that (though I would also be down to write this all as a Bash script that we then just have the runner run, making it easier for us to locally test on any maintainer's machine).
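As a rough idea of what such a locally testable script could look like, here is a minimal sketch. Note that it swaps the `sed` pipeline for `awk` so that GNU vs. BSD `sed` differences don't matter; the function name `group_changelog` is made up for illustration and nothing here is taken from the actual workflow. Unlike the `sort`-based pipeline, it lists groups in order of first appearance and keeps the input order within each group.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and approach are assumptions, not the workflow code).
# Reads "- type: message" lines from the given files or from stdin and prints
# each type once, followed by its messages on tab-indented lines.
group_changelog() {
  awk '
    {
      sep = index($0, ": ")            # position of the first "type: " separator
      if (sep == 0) next               # sketch only: skip lines without a prefix
      key = substr($0, 1, sep - 1)     # e.g. "- docs"
      msg = substr($0, sep + 2)        # everything after ": "
      if (!(key in entries)) {         # remember the order groups first appear in
        order[++n] = key
        entries[key] = ""
      }
      entries[key] = entries[key] "\t" msg "\n"
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s:\n%s", order[i], entries[order[i]]
    }
  ' "$@"
}

# Example with made-up entries (with the real data this would be: group_changelog log.txt)
printf '%s\n' '- docs: X (#3)' '- fix: Y (#2)' '- docs: Z (#1)' | group_changelog
```

The same function could then be called both from the runner and from a maintainer's machine.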
> That is actually already split up. Can't really go with shorter bits. The first one is a single regex and the second is the clean-up substitutions. There's no way to really make the first one easier to understand.
Hm. I'll work on writing lots of comments I guess then that break things down bit by bit.
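For reference, here is one way the two `sed` stages from the original one-liner could be spread over multiple lines with comments. This is only a sketch: it assumes GNU `sed` (as on the Ubuntu runner), since `\+` and `\o11` are GNU extensions, and the function names `merge_same_prefix` and `split_entries` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only; assumes GNU sed. Function names are hypothetical.

# Stage 1: merge consecutive lines that share the same "- type" prefix.
merge_same_prefix() {
  sed '
    # Loop start.
    :a
    # Append the next input line to the pattern space, separated by a newline.
    # On the last line GNU sed prints the pattern space and exits here.
    N
    # If both lines start with the same "prefix: ", keep the prefix once and
    # join the two messages, marking the boundaries with tabs (\o11).
    s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \o11\2\o11\o11\3/
    # A merge happened: loop again and try to absorb the following line too.
    ta
    # No merge: print the finished first line ...
    P
    # ... then delete it and restart with the remainder, without reading input.
    D
    # Appears unreachable because D restarts the cycle; kept to mirror the original.
    ba
  '
}

# Stage 2: put the prefix and every message on its own line.
split_entries() {
  sed '
    # Turn every run of tabs into a newline followed by a single tab.
    s/\o11\+/\n\o11/g
    # Move the text after the first "prefix: " onto its own tab-indented line.
    s/: \(.\+\)$/:\n\o11\1/g
  '
}

# Example with made-up entries (with the real data: sort < log.txt | ...)
printf '%s\n' '- ci: C (#3)' '- build: B (#2)' '- build: A (#1)' |
  sort | merge_same_prefix | split_entries
```

The regular expressions are unchanged from the one-liner; only the layout and the comments are new.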
Maybe this version is a bit cleaner?
$ sort < log.txt | sed ':a;N;s/^\([^:]\+\): \(.*\)\n\1: \(.*\)$/\1: \2 :: \3/;ta;P;D;ba' | sed 's/: \(.\+\)$/:\n * \1/g; s/ :: /\n * /g' | sed 's/^\( \+\*\)\([^(]*\)\(.*\)$/\1 \3\2/g'
- build:
* (#1809) Add support for Python 3.10 across all backends
* (#1788) Remove wheel and attrs from build-system requires
* (#1783) Require setuptools v42.0.0+ for stability
* (#1661) Set lower bound of scipy v1.1.0
* (#1698) Set only lower bounds on backend dependencies
* (#1666) Update lower bound on jax to v0.2.10
- chore:
* (#1593) [pre-commit.ci] pre-commit autoupdate
* (#1616) [pre-commit.ci] pre-commit autoupdate
* (#1679) [pre-commit.ci] pre-commit autoupdate
* (#1723) [pre-commit.ci] pre-commit autoupdate
* (#1741) [pre-commit.ci] pre-commit autoupdate
* (#1765) [pre-commit.ci] pre-commit autoupdate
So one small quibble with this is that the `sort` causes the order within the groups to get changed to alphabetical order of the commit message.
There is some benefit to reading through a list of fixes chronologically as a developer, so it might be nice to see if we could preserve that with some additional tweaking. :thinking:
Just use `sort -n` I think.
Ahh no, you need to do pre-editing to move the numerical entries to the front before sorting... this is keeping the chronological sort order.
@matthewfeickert up to you what you want to do here
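One option that might avoid the pre-editing, assuming the log already arrives in the order we want to keep within each group, is a stable sort on the prefix only. This is a sketch, and the function name `sort_groups_keep_order` is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort only on the text before the first colon and keep
# the input order for lines with the same prefix.
#   -s      stable sort (equal keys stay in input order)
#   -t:     use ":" as the field separator
#   -k1,1   compare the first field only
# LC_ALL=C makes the key comparison bytewise and independent of the locale.
sort_groups_keep_order() {
  LC_ALL=C sort -s -t: -k1,1 "$@"
}

# Example with made-up entries: "X (#3)" stays ahead of "A (#1)" within docs.
printf '%s\n' '- docs: X (#3)' '- build: Y (#2)' '- docs: A (#1)' |
  sort_groups_keep_order
```

The result could then be fed into the same `sed` stages in place of the plain `sort < log.txt`.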
Following up from PR #1846, @kratsg has suggested that we sort the contributions that show up in the annotated tags for releases. This seems like a pretty good idea to get a clear, high level view pretty fast.
He has the following snippet
— that should be applied to the output that you get following
https://github.com/scikit-hep/pyhf/blob/6452cc62985dfb0d18e997e05211cfcba6c9d210/.github/workflows/bump-version.yml#L219-L222
— which can take input like
Full input:
```
- docs: Add general use citation from Survey of Open Data Concepts paper (#1844)
- docs: Add general use citation from End-to-End Optimization paper (#1843)
- docs: Add general use citation from SimpleAnalysis ATLAS PUB note (#1842)
- fix: Override error on filterwarnings to pass notebook tests (#1841)
- chore: [pre-commit.ci] pre-commit autoupdate (#1839)
- revert: Remove Jinja2 restrictions given nbconvert v6.4.5 (#1837)
- ci: Add concurrency group to HEAD of dependencies workflow (#1836)
- ci: Add matplotlib nightly wheels to HEAD of dependencies testing (#1835)
- ci: Update GitHub Actions to next stable version (#1833)
- fix: Add filterwarnings ignore for Pillow DeprecationWarning (#1834)
- ci: Use actions/setup-python v3 (#1828)
- fix: bump black to 22.3.0 due to click 8.1 release (#1827)
- docs: Add JupyterLite REPL for interactive pyhf in docs (#1820)
- fix: Disallow Jinja2 v3.1.0 to avoid nbsphinx triggering attribute error (#1824)
- test: Assert exported StatError has no name attribute (#1821)
- feat: Add contextlib support to pyhf.schema API (#1818)
- fix: writexml handles missing parameter configs for normfactor (#1819)
- ci: Report coverage for oldest and newest Python tested (#1817)
- feat: Alternative Schema Locations (#1753)
- docs: Update JAX backend normal docstring to jax v0.3.2 returns (#1813)
- docs: Add use citation from neos paper (#1812)
- build: Add support for Python 3.10 across all backends (#1809)
- ci: Add CPython 3.10 to testing (#1808)
- refactor: Clarify exception message applies only to profile likelihood ratio (#1806)
- chore: [pre-commit.ci] pre-commit autoupdate (#1805)
- docs: Fix tiny typo in MC Stat Error documentation (#1803)
- docs: Add use citation from ATLAS UEH displaced jets CalRatio paper (#1802)
- docs: Correct Giordon's affiliation to SCIPP in CITATON.cff (#1801)
- docs: Add use citation from ATLAS UEH MS displaced jet paper (#1800)
- docs: Update Lukas's affiliation to Technical University of Munich (#1798)
- docs: Add general citation from MadJAX paper (#1799)
- feat: Use tbump over bump2version (#1790)
- docs: Add section for tutorial and docs to README (#1789)
- build: Remove wheel and attrs from build-system requires (#1788)
- build: Require setuptools v42.0.0+ for stability (#1783)
- docs: Add citation from 'HL-LHC Computing Review Stage 2' paper (#1779)
- docs: Fix link to TRExFitter documentation (#1777)
- test: Consolidate and update pytest options in pyproject.toml (#1773)
- test: Add html coverage reports from pytest (#1771)
- chore: [pre-commit.ci] pre-commit autoupdate (#1765)
- docs: Update scipy intersphinx url to drop 'reference' (#1767)
- ci: Add macos-latest to dependency release candidates testing (#1760)
- test: Avoid tensorflow macOS floating point deviation with pytest.approx (#1761)
- chore: Update black to first stable release v22.1.0 (#1754)
- ci: Launch tmate session if pytest fails on workflow dispatch run (#1748)
- fix: Accept ValueError for JAX backend `tolist` fallback (#1746)
- docs: Add use citation from revisiting mono-tau tails at the LHC paper (#1744)
- ci: Update gh-action-pypi-publish to use print_hash (#1743)
- ci: Limit concurrent workflow jobs to one per workflow per branch (#1632)
- docs: Note shapesys and staterror modifier set to 1 for modifier data of 0 (#1740)
- chore: [pre-commit.ci] pre-commit autoupdate (#1741)
- fix: Disallow nbsphinx v0.8.8 to avoid empty "raw" directive bug (#1742)
- docs: Add milestone for 1000 project commits to README (#1739)
- docs: Update citation references publication status (#1738)
- docs: Add GitHub Release Radar check to release checklist (#1733)
- docs: Truncate floating point docstring examples to 8 decimal places (#1726)
- ci: Publish to TestPyPI on tag or by workflow dispatch trigger (#1727)
- test: Make fail_backend markers add pytest.mark.xfail and remove fail_jax marker on percentile tests (#1730)
- feat: Allow skipping validation when constructing workspaces (#1710)
- docs: Fix download method of probability models archive in pull plot notebook (#1724)
- chore: [pre-commit.ci] pre-commit autoupdate (#1723)
- docs: Fix download method of probability models archive in impact plot notebook (#1721)
- chore: Use constraints.txt for lower bound testing (#1713)
- fix: Skip doctest of `pyhf.contrib.utils.download` (#1715)
- test: Use scikit-hep-testdata to provide probability models for regression tests (#1711)
- feat: Catch unexpected keyword arguments in workspace construction (#1709)
- feat: Raise exception if bin-wise modifier data length doesn't match sample data (#1708)
- ci: Quote GitHub Action python-version number as YAML strings (#1707)
- fix: Ensure `_ModelConfig.suggested_fixed` list contains only booleans for all modifiers (#1706)
- fix: Accept tar and zip headers in contrib.utils.download requests (#1704)
- refactor: Make contrib.utils.download robust to archive file types (#1697)
- docs: Ensure docstring examples are contiguous (#1703)
- build: Set only lower bounds on backend dependencies (#1698)
- test: Use xfail for tests that fail for upstream problems (#1702)
- refactor: Use tensorlib.percentile in calculators (#1694)
- refactor: Use jax.numpy for JAX backend tensorlib.tolist (#1138)
- feat: Add transpose function to tensorlib (#1696)
- docs: only lists are accepted when specifying objects to prune (#1692)
- feat: Add percentile function to tensorlib (#817)
- fix: Speed-up readxml by caching key lookup instead of using try/except (#1691)
- docs: Add ATLAS third-generation scalar leptoquarks search statistical model record (#1682)
- docs: Add ATLAS top group probability model records through June 2021 (#1681)
- fix: Use https protocol as unauthenticated git protocol is no longer supported (#1680)
- refactor: Pass `Accept` header to requests in `contrib.utils.download` (#1673)
- chore: [pre-commit.ci] pre-commit autoupdate (#1679)
- docs: Update 2021 published ATLAS probability models (#1671)
- feat: Remove pyhf.simplemodels.hepdata_like from API (#1670)
- build: Update lower bound on jax to v0.2.10 (#1666)
- build: Set lower bound of scipy v1.1.0 (#1661)
- ci: Turn off PyPI release tests on pull requests (#1664)
- refactor: Simplified parameters (#1639)
- feat: Configurable default backend (#1646)
- ci: Add release candidates to HEAD of dependencies workflow (#1660)
- feat: Allow zero rate Poisson (#1657)
- test: Remove 'src' from pytest test testpaths to allow for non-editable install in CI (#1467)
- feat: Add support for arrayful JSON (#1647)
- test: Use netlocs that are known to not exist or give known return (#1651)
- fix: custom modifier / new parameter support and test (#1644)
- docs: Add and apply codespell as a pre-commit hook (#1645)
- feat: Add POI-less specification support (#1638)
- fix: Fix bug in impact plot visualization (#1642)
- feat: Allow POI-less models via Workspace.model (#1636)
- feat: Add setup for custom modifiers (#1625)
- ci: Add absolufy-imports pre-commit hook (#1635)
- feat: Expose fitted parameter values of implicit fits in test statistic calls (#1554)
- feat: Add hypotest kwargs to pyhf.infer.intervals.upperlimit (#1613)
- ci: Report coverage to Codecov without token (#1628)
- ci: Update codecov-action to v2 API (#1623)
- ci: Allow reporting of coverage on PRs from forks (#1622)
- docs: Use sphinx-copybutton prompt regex to fully capture examples (#1617)
- chore: [pre-commit.ci] pre-commit autoupdate (#1616)
- docs: Use sphinxcontrib-bibtex style 'unsrt' to sort citations in reverse chronological order (#1615)
- docs: Add use citation from simplified likelihoods ATLAS PUB note (#1614)
- fix: Use MLEs of NPs to create sampling distributions in ToyCalculator (#1610)
- docs: Add use citation from collider signatures of coannihilating dark matter paper (#1604)
- docs: Add `uproot4` writing speedup to v0.6.3 release notes (#1601)
- docs: Add use citation from publishing statistical models white paper (#1600)
- ci: Use `jupyter-black` pre-commit hook over `nbqa-black` (#1598)
- docs: Update maintainer release checklist with v0.6.3 notes (#1597)
- ci: Add Python 3.9 to 'Current Release' workflow tests (#1596)
- docs: Correct v0.6.3 release notes to note `pyhf.pdf._ModelConfig.channels` is a list (#1592)
- chore: [pre-commit.ci] pre-commit autoupdate (#1593)
- ci: Skip doctest for 'Minimum supported dependencies' workflow (#1589)
- fix: Update notebooks to use `include_auxdata` kwarg for `pyhf.Workspace.data` (#1588)
```
and output
It is probably worth splitting this out across multiple lines in `.github/workflows/bump-version.yml` to make it more understandable and easier to maintain, but having it as a 1 liner is some good code golf! :)
His summary is: