python / core-workflow

Issue tracker for CPython's workflow
https://mail.python.org/mailman/listinfo/core-workflow
Apache License 2.0

Generate Misc/NEWS from individual files #6

Closed brettcannon closed 7 years ago

brettcannon commented 7 years ago

As pulled from PEP 512:

Traditionally the Misc/NEWS file [19] has been problematic for changes which spanned Python releases. Oftentimes there will be merge conflicts when committing a change between e.g., 3.5 and 3.6 only in the Misc/NEWS file. It's so common, in fact, that the example instructions in the devguide explicitly mention how to resolve conflicts in the Misc/NEWS file [21] . As part of our tool modernization, working with the Misc/NEWS file will be simplified.

The planned approach is to use an individual file per news entry, containing the text for the entry. In this scenario each feature release would have its own directory for news entries, and a separate file would be created in that directory, named either after the issue it closed or after a timestamp value (which prevents collisions). Merges across branches would pose no issue, as the news entry file would still be uniquely named and in the directory of the latest version that contained the fix. A script would collect all news entry files no matter what directory they reside in and create an appropriate news file (the release directory can be ignored, as the mere fact that the file exists is enough to show that the entry belongs to the release).

Classification could either be done by keyword in the news entry file itself or by using subdirectories representing each news entry classification in each release directory (or classification of news entries could be dropped entirely, since critical information is captured by the "What's New" documents, which are organized). The benefit of this approach is that it keeps the changes with the code that was actually changed, and it ties the message to the commit which introduced the change.

For a commit made through the CLI, a script could be provided to help generate the file. In a bot-driven scenario, the merge bot could have a way to specify a specific news entry and create the file as part of its flattened commit (while most likely also supporting using the first line of the commit message if no specific news entry was specified). If a web-based workflow is used, then a status check could verify that a news entry file is in the pull request, acting as a reminder when the file is missing. Code for this approach was written previously for the Mercurial workflow at http://bugs.python.org/issue18967.

There are also tools from the community, such as https://pypi.python.org/pypi/towncrier, https://github.com/twisted/newsbuilder, and http://docs.openstack.org/developer/reno/.
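The entry-creation step described above is simple enough to sketch. The function name, directory layout, and file extension below are illustrative assumptions, not the actual tooling:

```python
import time
from pathlib import Path


def new_entry(news_dir, text, issue=None):
    """Create one news-entry file in news_dir (a per-release directory).

    Name it after the issue it closes when one is known, otherwise after
    a timestamp value so parallel branches cannot collide on the name.
    """
    news_dir = Path(news_dir)
    news_dir.mkdir(parents=True, exist_ok=True)
    if issue is not None:
        name = f"bpo-{issue}.rst"
    else:
        name = f"{time.time():.0f}.rst"  # timestamp-based fallback name
    path = news_dir / name
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

For example, `new_entry("Misc/NEWS.d/3.7", "Fixed a thing.", issue=12345)` would produce `Misc/NEWS.d/3.7/bpo-12345.rst`, and merging that commit into another branch touches no shared file.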

Discussions at the Sep 2016 Python core-dev sprints led to this decision compared to the rejected approaches outlined in the Rejected Ideas section of this PEP. The separate files approach seems to have the right balance of flexibility and potential tooling out of the various options while solving the motivating problem.

brettcannon commented 7 years ago

Rejected Ideas

Deriving Misc/NEWS from the commit logs

As part of the discussion surrounding Handling Misc/NEWS , the suggestion has come up of deriving the file from the commit logs itself. In this scenario, the first line of a commit message would be taken to represent the news entry for the change. Some heuristic to tie in whether a change warranted a news entry would be used, e.g., whether an issue number is listed.

This idea has been rejected due to some core developers preferring to write a news entry separately from the commit message. The argument is that the first line of a commit message and a news entry have different requirements in terms of brevity, what should be said, etc.

Deriving Misc/NEWS from bugs.python.org

A rejected solution to the NEWS file problem was to specify the entry on bugs.python.org [5] . This would mean an issue that is marked as "resolved" could not be closed until a news entry is added in the "news" field in the issue tracker. The benefit of tying the news entry to the issue is that it ensures all changes worthy of a news entry have an accompanying issue. It also makes classifying a news entry automatic thanks to the Component field of the issue. The Versions field of the issue also ties the news entry to which Python releases were affected. A script would be written to query bugs.python.org for relevant news entries for a release and to produce the output needed to be checked into the code repository. This approach is agnostic to whether a commit was done by CLI or bot. A drawback is that there's a disconnect between the actual commit that made the change and the news entry by having them live in separate places (in this case, GitHub and bugs.python.org). This would mean making a commit would then require remembering to go back to bugs.python.org to add the news entry.

brettcannon commented 7 years ago

The only other potential solution besides individual files is a bot or script which collects news entries from messages in PR comments. The pro/con list is:

Individual files

Pro

Con

Bot

Pros

Cons

dhellmann commented 7 years ago

Managing individual files by hand also has the benefit of having the ability to communicate separately to reviewers via commit messages and consumers via the release notes. We've had good luck with that in OpenStack where the consumer of the software is often interested in very different information from the other contributors who may be reviewing patches.

dhellmann commented 7 years ago

Hmm, it looks like the sample repo in python/cpython doesn't have any tags indicating when specific versions were released. Reno currently relies on tags for version identification. I could update it to look at something else -- how does one determine the current point release on a given branch? Is it in a file somewhere?

orsenthil commented 7 years ago

@dhellmann , yeah, that's a problem with the current python/cpython repo. Tags were not replicated and pushed. We will have to re-push the repo with tags, perhaps using the tool and/or instructions from https://github.com/orsenthil/cpython-hg-to-git, which have been tried.

dhellmann commented 7 years ago

@orsenthil oh, good, if the plan is to eventually have the tags in place then that's no problem. I assumed there was some other mechanism in place already.

dstufft commented 7 years ago

This tool should probably make this pretty easy https://pypi.org/project/towncrier/
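For reference, wiring towncrier up to the layout discussed in this thread might look roughly like the following. The `[tool.towncrier]` table and its keys come from towncrier's documented configuration, but the paths and section names here are assumptions for illustration, not a settled CPython layout:

```toml
[tool.towncrier]
directory = "Misc/NEWS.d"   # where the per-change fragments would live (assumed path)
filename = "Misc/NEWS"      # the compiled news file
title_format = "Python {version} ({project_date})"

[[tool.towncrier.type]]
directory = "bugfix"
name = "Bug Fixes"
showcontent = true

[[tool.towncrier.type]]
directory = "feature"
name = "Features"
showcontent = true
```

Fragments would then be named like `Misc/NEWS.d/bugfix/12345.bugfix.rst` and folded into `Misc/NEWS` at release time by running `towncrier build`.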

vstinner commented 7 years ago

I started a thread on python-dev because Misc/NEWS became a blocker point with the new GitHub workflow: https://mail.python.org/pipermail/python-dev/2017-February/147417.html


FYI I wrote a tool which computes the released Python versions that include a given change, from a list of commits, to generate a report on Python vulnerabilities: https://github.com/haypo/python-security/blob/master/render_doc.py

The core of the feature is "git tag --contains sha1". Then I implemented a logic to select which versions should be displayed. Output: http://python-security.readthedocs.io/vulnerabilities.html
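The version-selection logic on top of `git tag --contains` can be sketched as a pure function over the tag list. This is my own illustrative reconstruction, not the actual code from render_doc.py, and it ignores alpha/beta/rc tags as the comment below describes:

```python
import re


def first_releases(tags):
    """Given the output of `git tag --contains <sha1>`, return the earliest
    final release in each X.Y series that contains the commit.

    Pre-release tags (v3.6.0b1, v3.7.0rc1, ...) are skipped, matching the
    choice to ignore beta and release candidate versions.
    """
    final = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?$")  # v3.6.1 yes, v3.6.0rc1 no
    best = {}
    for tag in tags:
        m = final.match(tag)
        if not m:
            continue  # not a final-release tag
        major, minor = int(m.group(1)), int(m.group(2))
        micro = int(m.group(3) or 0)
        key = (major, minor)
        if key not in best or micro < best[key]:
            best[key] = micro
    return [f"{maj}.{minor}.{best[(maj, minor)]}" for maj, minor in sorted(best)]
```

So `first_releases(["v3.6.0b1", "v3.6.2", "v3.6.1", "v3.7.0"])` reports `3.6.1` and `3.7.0` as the first final releases containing the fix.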

The tool also computes the number of days between the vulnerability disclosure date, commit date and release date. I chose to ignore beta and release candidate versions.

But I guess that there are other existing projects which would fit the Misc/NEWS requirements better than my tool! (reno, towncrier, something else?)

westurner commented 7 years ago

changelog filenames

Escaping Markup

NOTE: commit logs may contain (executable) markup

Tools

Projects:

westurner commented 7 years ago

Additional requirements/requests for Misc/NEWS'ification

brettcannon commented 7 years ago

Since there seem to be multiple options for pre-existing tools and writing our own, we should start by identifying the requirements of what we want to be supported in the final NEWS file:

Features we want

Nice-to-haves would be:

Am I missing anything we need/want the solution to cover?

What if we were writing a tool from scratch?

Now what would a greenfield solution look like (to help set a baseline of what we might want a tool to do)? To me, we would have a directory to contain all news-related details. In that top-level directory would be subdirectories for each subsection of the release (e.g. "Core and Built-ins", "Library", etc.). Each NEWS entry file would then go into the appropriate subdirectory. The filename would be the issue related to the change, e.g. bpo-12345.rst (in all honesty we should have an issue per change in the NEWS file as a requirement since if it's important enough to be listed in the NEWS files then we need a way to track any past and future discussions related to the change, and yes, we can come up with a standard for listing multiple issue numbers). If we wanted to support listing affected module(s) we could have some convention in the files to list that sort of detail (e.g. "modules: importlib").

Then upon release the RM would run a script which would read all the files, generate the appropriate text (which includes line-wrapping, etc.), and write out the file. Now we can either keep a single file with all entries (which gets expensive to load and view online, which also means we should add an appropriate .rst file extension), or a file per feature release (which makes searching the complete history a little harder as you then have to use something like grep to search multiple files quickly). I'm assuming we won't feel the need to drag NEWS entries forward -- e.g. entries made for 3.6.1 won't be pulled into the master branch for 3.7 -- since most changes in bugfix branches will also be fixed in the master branch and thus be redundant. But if people really care then a single PR to copy the update NEWS file(s) to master upon release could be done.
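The release-time compile step described above could look something like this. The directory layout (`<news_dir>/<section>/bpo-NNNN.rst`), wrap width, and function name are all assumptions drawn from the outline in this comment, not an agreed design:

```python
import textwrap
from pathlib import Path


def build_news(news_dir, version):
    """Compile every entry file under news_dir/<section>/ into one
    reST document, wrapping each entry as a bullet item."""
    title = f"What's New in Python {version}"
    out = [title, "=" * len(title), ""]
    for section in sorted(p for p in Path(news_dir).iterdir() if p.is_dir()):
        out += [section.name, "-" * len(section.name), ""]
        for entry in sorted(section.glob("*.rst")):
            # Collapse internal whitespace, then re-wrap consistently.
            text = " ".join(entry.read_text(encoding="utf-8").split())
            out.append(textwrap.fill(text, width=78, initial_indent="- ",
                                     subsequent_indent="  "))
            out.append("")
    return "\n".join(out)
```

The RM (or a CI job) would write the return value out as the NEWS file; because the output is fully regenerated from the entry files, the step is reproducible.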

Next steps

First step is to figure out what our requirements are of any solution. So if I'm missing anything please reply here (I think I covered what @warsaw mentioned on python-dev above). Once we have the requirements known we can look at the available tools to see how close they come to meeting our needs and decide if we like any of them or want to write our own.

warsaw commented 7 years ago

Thanks @brettcannon

I was thinking something similar, but slightly different to your implementation outline. I like naming files with their bug number, e.g. bpo-12345.rst but I was thinking that the top-level organizational directory would contain subdirectories per release, e.g. 3.7, 3.6, 3.5, etc. Symlinks or hardlinks would take care of issues that span multiple versions. If that doesn't work because of platform limitations, then a simple cross reference inside similarly named files in different directories would work just as well.

I was thinking maybe we don't need subcategory subdirectories within there. Everything else can be specified using metadata in the .rst file. Kind of like the way Pelican supports metadata directives for the date, category, tags, and slugs. E.g.

News/3.7/bpo-28598.rst

:section: core
:alsofixes: bpo-12345
:versionadded: 3.7.0a1
:backports: 3.6.1
:authors: Martijn Pieters

Support __rmod__ for subclasses of str being called before str.__mod__.

Generally, you wouldn't have an :alsofixes: directive, and you wouldn't have a :backports: directive for new features. Oh, and +1 for strongly recommending (and maybe requiring) anything in NEWS to have a bug. We could even add a CI gate so that any merge must have an entry in News/X.Y/

I don't know whether those directives are a reST feature or something added by Pelican. We could also use the RFC-822 style headers found in PEPs.
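Whichever header style is chosen, parsing the metadata block shown above is straightforward. A minimal sketch for the `:name: value` field style (the function name and the split-on-first-body-line rule are my assumptions):

```python
import re

# One ':name: value' metadata field per line at the top of the entry file.
FIELD = re.compile(r"^:([a-z]+):\s*(.*)$")


def parse_entry(text):
    """Split a news-entry file into its metadata fields and the
    entry body that follows them."""
    meta, body = {}, []
    in_header = True
    for line in text.splitlines():
        m = FIELD.match(line) if in_header else None
        if m:
            meta[m.group(1)] = m.group(2)
        elif line.strip() or not in_header:
            in_header = False  # first non-field, non-blank line starts the body
            body.append(line)
    return meta, "\n".join(body).strip()
```

Running it on the bpo-28598 example above would yield `{"section": "core", "alsofixes": "bpo-12345", ...}` plus the one-sentence body, which is enough for a compile script to sort entries into sections.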

warsaw commented 7 years ago

More about using major release numbers as the organizational structure:

dstufft commented 7 years ago

Do we want to keep the news fragments around forever? I assumed that once they've been integrated into an actual news file they would be deleted.

(+1 supports deletion; -1 supports keeping the files around forever.)

warsaw commented 7 years ago

I was thinking we'd keep them around forever. They probably don't take much space, and they compress well, so it would be very handy to be able to do the historical greps from the master branch.

brettcannon commented 7 years ago

The problem I see for keeping them around is simply the amount of files the repo will grow by which might affect checkout speed and maybe upset some file systems. Do we have any idea how many files 3.4 would have if we kept the files around for the life of the feature release? And if we generate a single file per feature release then you can grep that one file plus entry files to get your version search without keeping duplicate data around.

As for the feature version subdirectories, the reason I didn't suggest that is it seemed unnecessary in a cherry picking workflow. If a file is in the e.g. 3.6 branch then you know it's for 3.6 thanks to its mere existence. And so even if we did use feature branch directories it won't matter if a fix is in the 3.7 directory in the 3.6 branch as its existence means it applies to Python 3.6 itself, it just happens to have been committed to 3.7 first. But as I said, if the existence of the file means the change applies then sticking it in a feature branch directory only comes up if you keep the individual files around.

Lastly, adding cross-references between feature branch subdirectories complicates cherry picks as it adds a mandatory extra step. By leaving out cross-references it allows cherry picks that merge cleanly to not require any more work beyond opening the PR while cross-referencing adds a required step for every cherry pick PR.

terryjreedy commented 7 years ago

+1 to Barry's variation. I like self-identifying files. For one thing, if a contributor submits a file with the wrong section, I presume it would be easier to edit the section line than to move the file.

What I would really like is auto-generation of backport pull-requests. If this should not always be done, then use a symbol like '@' to trigger the auto-generation.

On dstufft's comment, I am not sure if thumbs-up is for 'keep forever' or 'delete when integrated'. 1000s of files under 500 bytes on file systems using 4000 bytes/file (Windows, last I knew) is a bit wasteful. I am for deleting individual files after they are sorted and concatenated into a 'raw' (unformatted) listing. This would accomplish compression without loss of information. (One would still grep the metadata.) It would allow more than one formatting.

News entries can have non-ascii chars (for names) but must be utf-8 encoded. I suspect that we will occasionally get mis-encoded submissions. Will there be an auto encoding check somewhere?

brettcannon commented 7 years ago

Moving a file isn't difficult; git will just delete the old one and add it again under the new name (and implicitly pick up the file was moved).

As for auto-generating cherry picks, see GH-8.

For voting on Donald's comment, I view voting +1 is for deletion. I have edited the comment to make it more clear (I also just looked at how @warsaw voted and voted the opposite 😉 ).

And for the encoding check, I'm sure we will implement a status check that will do things like verify formatting, check the text encoding is UTF-8, etc.
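Such a status check is easy to prototype. A minimal sketch (the function name and the name/bytes pairing are mine, not an existing check):

```python
def check_utf8(named_blobs):
    """Given (name, raw_bytes) pairs for news-entry files, return the
    names whose bytes are not valid UTF-8, so CI can flag them."""
    bad = []
    for name, raw in named_blobs:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad
```

A CI job would read each changed file in binary mode, run it through this check, and fail the status if the returned list is non-empty; non-ASCII names encoded as UTF-8 pass untouched.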

dhellmann commented 7 years ago

@brettcannon At least in the case of reno, the directory structure isn't needed for organizing the files because it pulls the content out of the git history, not from the currently checked out workspace.

If we're worried about the number of note files, then a bunch of individual notes could be collapsed into one big file just before each release. Backports of fixes could still use separate files to allow for clean cherry-picks.

westurner commented 7 years ago

In order to JOIN this metadata (description, issue numbers) with other metadata, something like YAML (YAML-LD?) with some type of URI would be most helpful.

JSON-LD:


{
  "@id": "http://.../changelog#entry/uri/<checksum?>",
  "description": "...",
  "issues": [
    {"n": n, "description": ""}
  ]
}
brettcannon commented 7 years ago

One other thing I forgot to mention about why I suggested subdirectories for classification is that it does away with having to look up what the potential classification options are. I know when I happen to need to add an entry early on in a branch I always have to scroll back to see what sections we have used previously to figure out which one fits best. If we had directories we never deleted then that guesswork is gone.

@dhellmann so are you saying reno looks at what files changed in the relevant commit to infer what to classify the change as (e.g. if only stuff in Lib/ changed then it would automatically be classified as "Library")?

larryhastings commented 7 years ago

The problem I see for keeping them around is simply the amount of files the repo will grow by which might affect checkout speed and maybe upset some file systems.

Easily solved. Consider that git itself is storing kajillions of files in its object store. It manages this by employing a fascinating feature--available in all modern operating systems!--called a "subdirectory".

In the case of git's object store, it snips off the first two characters of the object's hexified hash, and that becomes the subdirectory name. So there are potentially 256 subdirectories, and 1/256 on average of the files go in each subdir.
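That bucketing scheme is a one-liner to reproduce. A sketch (note that real git hashes a `"blob <size>\0"` header plus the content, not the raw bytes; this only illustrates the subdirectory fan-out):

```python
import hashlib


def sharded_path(store_root, data):
    """Mimic git's object-store layout: the first two hex characters of
    the hash name a subdirectory, spreading files over 256 buckets."""
    digest = hashlib.sha1(data).hexdigest()
    return f"{store_root}/{digest[:2]}/{digest[2:]}"
```

Applied to news entries, the same trick would keep any one `Misc/NEWS.d` subdirectory from accumulating thousands of files.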

In our case, I'd suggest that

dstufft commented 7 years ago

I'll say that I don't see a whole lot of value to keeping the files around once they've been "compiled" into the relevant NEWS file.

larryhastings commented 7 years ago

I'll say that I don't see a whole lot of value to keeping the files around once they've been "compiled" into the relevant NEWS file.

In that case, why have the discrete files in the first place! Just have people add their entries directly to the NEWS file.

... y'see, that's the problem we're trying to solve. When I cut a Python release, I always have a big painful merge I have to do by hand on the NEWS file. If we keep the discrete files, I can just regenerate NEWS and know it's going to be correct.

dstufft commented 7 years ago

In that case, why have the discrete files in the first place! Just have people add their entries directly to the NEWS file.

Because many people adding and changing lines to the NEWS files causes merge conflicts. One person periodically "compiling" that NEWS file during a release does not.

larryhastings commented 7 years ago

I don't want to go down that road.

I propose that we design the tool to use the discrete files, and Misc/NEWS is only ever generated from that tool. If, down the road, we decide that it's really okay, we can anoint Misc/NEWS as a second canonical location and delete the discrete files.

What problem does keeping the discrete files cause? Is it just that you don't like it for some aesthetic reason?

dstufft commented 7 years ago

@larryhastings Err, no, the canonical place is not either. Entries for the next release of Python get a discrete file, period. The tool consumes the previous NEWS file, reads the new news fragments, and outputs a new NEWS file combining the two by prepending onto the previous NEWS file. If we can't trust core committers not to touch the previously generated NEWS files, we can make a status check for it.

Having discrete files that stick around forever makes working with the canonical files harder to do. Either you don't namespace them by version and you end up with a huge list of files (inside or outside of sub directories, it doesn't matter) which makes actually mucking with them harder. Like for example, if I'm trying to look through what's changed thus far in 3.6.1rc1 even though it hasn't been released.

In addition to that, tracking what version something got released with becomes much harder. In the "always keep a discrete file" method you have to either add it in as part of the contents of the file itself or you need to namespace it by a directory per file. In either case it makes backporting changes much more error prone and adds busy work because beyond just cherry-picking now I need to ensure that I properly update or move the file for each branch I'm backporting it to. By keeping that information out of band, it means that singular file can easily move between branches transparently and will automatically DTRT during backports.

The above problem gets a lot worse when someone creates a PR that is going through the review process but hasn't been merged, and then a release is cut. Now suddenly every PR to that branch's release notes is invalidated and wrong, and every single PR to, say, master needs someone to go through and fix it so that the existing PRs all have the new, correct version.

I don't see any problems caused by "compiling" the Misc/NEWS file each release and deleting the old files, except that maybe a core committer will outright refuse to follow the newly defined procedures of the project and will go trying to manually create Misc/NEWS entries... in which case, it seems like they shouldn't be a committer at all.

ned-deily commented 7 years ago

I agree with @dstufft here, IIUC. If we had the "combining" tool, we could add a step to the release process so that the release manager would be responsible for doing a final combine and delete as part of tagging a release. I don't see any value in keeping the individual files around and I do see the necessity and value of continuing to have a single combined Misc/NEWS file as part of a release.

warsaw commented 7 years ago

I see this as a source-file problem. The individual bpo-12345.rst files are the "source" for Misc/NEWS. If you compile and throw away the source, then you are losing information, which you then have to reproduce as best you can by "decompiling" Misc/NEWS. And you may not even be able to do that if the bpo-12345.rst files contain some richer metadata, as I suggest (since it's highly unlikely you're going to retain that metadata in an easily parsable format in Misc/NEWS).

I'd like for the compilation process to be reproducible and idempotent.

If you delete after compilation, then when does this happen? At release time? Occasionally during the release process? Either way, you're going to have a massive commit that deletes a ton of those little files and adds/updates NEWS. What if someone does that accidentally, or there's a bug in the script? Once the source files are gone, you will have to play games with git to get back to a clean pre-compilation state. That's why reproducible and idempotent are useful.

I'm not at all concerned about tons of little files as @larryhastings says. I can't imagine it will make any dent in cloning speeds. Of course, you don't know until you measure it, but I imagine there's a ton of good compression all up and down the stack. Git is supposed to be good at managing text files, so let it!

I'm not worried about misclassifications. That can be enforced in the script, and we can have a CI step that dry-run builds NEWS to prove it's correct. That plus documentation should take care of things.

And note that I am suggesting a new News/ top-level directory that contains all these little .rst files, with NEWS.rst at the top of that (yes, moved from Misc/).

larryhastings commented 7 years ago

I don't see any problems caused by "compiling" the Misc/NEWS file each release and deleting the old files, except [problem it causes]

I see more problems, see below.

I wouldn't claim that any workflow here doesn't cause problems. I think anything we do will have some edge case. But I'm dead certain that only having one canonical storage location for Misc/NEWS entries--rather than two, and which one is being used is context-sensitive--is simpler and will lead to fewer problems.

If we can't trust core committers not to touch the previously generated NEWS files, we can make a status check for it. [...] except that maybe a core committer will outright refuse to follow the new defined procedures of the project and will go trying to manually create Misc/NEWS entries... in which case, seems like they shouldn't be a committer at all.

I guarantee that somebody you don't want to argue with (e.g. Guido) will be the special snowflake editing NEWS by hand. I think it's better to obviate the argument entirely.

Having discrete files that stick around forever makes working with the canonical files harder to do.

What do you mean by "canonical files" here?

If you mean Misc/NEWS itself, then may I remind you that in my proposal the "discrete files" are the canonical source for that information, and Misc/NEWS is merely an output file designed for convenient human consumption. Nobody will be "working with" Misc/NEWS per se; we release managers would be the only people changing it, and in the case of merge conflicts we'd feel totally emboldened to blow it away, recreate it from scratch, and tell Git "ignore merge conflicts, check in what I've got". (In fact I propose that that be the exact workflow every time.) So this simply isn't an issue.

We could even go so far as to not check in Misc/NEWS. Building it would be part of the release workflow, and if anybody wanted to read it they could rebuild it themselves. This doesn't seem like much of a burden, for a file people rarely examine during development.

tracking what version something got released with becomes much harder. In the "always keep a discrete file" method you have to either add it in as part of the contents of the file itself or you need to namespace it by a directory per file.

Can we ask Git itself? Would "git log" on the discrete file tell us the branch it was checked in under? (I just tried, and it seemed like the answer was "no", but I'm far from a Git expert so I wouldn't take my findings as authoritative.)

In any case, I don't understand your position here. You state this as if this is a critique of keeping the discrete files forever. But if this is really a problem, then we'll have this problem if we use the discrete files at all. Anybody cherry-picking / forward-porting / back-porting will have to deal with this.

Worse yet, it means that there are two different workflows, depending on whether the cherry-pick is done before or after a release is cut. "Oh, you realized you need to backport ? Well, I just released 3.8b1, so your discrete NEWS entry file got deleted, you'll have to ." I suspect this second workflow would include re-creating the discrete file in the backported branch, either by going back to before it was deleted and rescuing it, or by creating a new file and copying and pasting in from the generated NEWS file. I don't see that as a workflow improvement over "when you backport, make sure you move the Misc/NEWS entry too" in all cases. Again, TOOWTDI.

Right now we have a miserable workflow wrt Misc/NEWS entries, particularly when back-porting / forward-porting / cherry-picking. I think the discrete files approach will make this less-bad. Having the canonical location be in one of two places muddies the waters, making the workflow more complicated and context-sensitive.

On the other hand, keeping the discrete files forever should make it easier to correlate a Misc/NEWS entry back to its checkins. Right now you have to run "blame" on Misc/NEWS and glean the revision from that. And if it's been through a couple of forward-merges and edits you may need to run several versioned "blames" to figure it out. With discrete files, you find the particular file with a (recursive?) grep, then examine the file's revision log. And again, if sometimes it's stored in a discrete file, and sometimes it now lives in Misc/NEWS, that makes the workflow more complicated.

FWIW, in my "MergeNEWS" prototype, the generated filename was

Misc/NEWS.d/<version>/<section>.<timestamp>.<md5hash>.txt

so clearly I thought it best to store the version as part of the path. This worked well (in theory) with forward-merging; if you fixed a bug in 3.5, then forward-merged to 3.6, the Misc/NEWS.d entry would show up in the 3.6 Misc/NEWS file, albeit under 3.5 (which is kind of accurate anyway). This is less pleasant when back-porting, which I gather is the preferred approach with the new Git-based workflow, but this is a design choice we can easily revisit.
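Generating a name in that MergeNEWS pattern is mechanical. A sketch (the exact timestamp format and the helper's name are my assumptions; only the `<version>/<section>.<timestamp>.<md5hash>.txt` shape comes from the prototype):

```python
import hashlib
import time


def entry_filename(version, section, text, now=None):
    """Build a Misc/NEWS.d/<version>/<section>.<timestamp>.<md5hash>.txt
    path for one news entry, per the MergeNEWS prototype's layout."""
    now = time.time() if now is None else now
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return f"Misc/NEWS.d/{version}/{section}.{now:.0f}.{digest}.txt"
```

Because the path embeds both a timestamp and a content hash, two entries written on different branches can never collide, and the version component records where the fix first landed.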

Ned sez:

If we had the "combining" tool, we could add a step to the release process so that the release manager would be responsible for doing a final combine and delete as part of tagging a release.

My prototype generates Misc/NEWS from scratch every time. FWIW, making it parse Misc/NEWS so it can find the proper insertion points for all the new items would make it more complicated. Not an unmitigated disaster of complexity, but I prefer simple where I can get it.

I do see the necessity and value of continuing to have a single combined Misc/NEWS file as part of a release.

Nobody is suggesting otherwise. Running the build process to produce Misc/NEWS would be part of the release process. Whether the NEWS entries are stored in the discrete files or partially in discrete files and partially in the NEWS file doesn't affect that.

dstufft commented 7 years ago

If you delete after compilation, then when does this happen? At release time? Occasionally during the release process?

At release time.

Either way, you're going to have a massive commit that deletes a ton of those little files and adds/updates NEWS. What if someone does that accidentally, or there's a bug in the script? Once the source files are gone, you will have to play games with git to get back to a clean pre-compilation state. That's why reproducible and idempotent are useful.

Er, you mean if someone accidentally does it, commits it, pushes it to a branch, makes a PR for it, waits for the status checks to pass, and then merges it? Since we can't push directly to branches this seems extremely far fetched. But for the sake of argument let's say someone manages to do all of that accidentally. The solution is pretty easy, git revert which I really don't consider "playing games with git".

I'm not at all concerned about tons of little files as @larryhastings says. I can't imagine it will make any dent in cloning speeds. Of course, you don't know until you measure it, but I imagine there's a ton of good compression all up and down the stack. Git is supposed to be good at managing text files, so let it!

I'm not at all concerned about cloning speed. What I am concerned with is the UX of working with a bunch of tiny files. This is manageable (IMO) if we reduce the scope of the tiny files to just the latest . For example, if I notice a typo in the release notes, locating the tiny file is a lot more difficult than being able to edit that file in place in the browser. If we go farther and never commit the resulting output then we even lose the ability to easily search the news files using browser native searching.

brettcannon commented 7 years ago

To answer the "how many files are we talking about" question, I ran (master) > grep -c "^- " Misc/NEWS against the master branch and got 2,800 (and that covers all the way back to 3.5.0a1). And if you do find . -type f | wc -l you get 3,837; subtract out find .git -type f | wc -l and we end up with 3,776 files in a checkout of CPython.

larryhastings commented 7 years ago

One further idea: if we became concerned about zillions of historical files that nobody cares about (e.g. Misc/NEWS entries from 2.6), we could conglomerate those entries into single data files that actually contained multiple entries, and make sure that worked nicely with the tool. Thus the 2.6.0 directory could contain "Misc/NEWS.d/2.6.0/Core & Libraries.0.txt" which contained all items for that section in 2.6.0. Furthermore, we could hide historical versions in a subdirectory ("Misc/NEWS.d/archive/2.6.0/Core & Libraries.0.txt") so people wouldn't have to stare at them forever. We could then definitely automate that process (conglomerate and archive), which could be run at any time--as part of the release process, or later, or whenever.

larryhastings commented 7 years ago

To answer the "how many files are we talking about" question, I ran (master) > grep -c "^- " Misc/NEWS against the master branch and got 2,800

I just experimentally ran my "splitnews" against a "current" hg trunk. (Yeah, sorry.) It produced 2809 discrete files and 33 directories under Misc/NEWS.d.

dstufft commented 7 years ago

I guarantee that somebody you don't want to argue with (e.g. Guido) will be the special snowflake editing NEWS by hand. I think it's better to obviate the argument entirely.

I've never been afraid of arguing with Guido :), but given he's just about the only person who can decide by fiat that they're going to edit Misc/NEWS and that's that and everyone else just has to deal with it, I'd suggest instead of nebulous "but Guido might care!" we should just ask @gvanrossum.

What do you mean by "canonical files" here?

I do not mean Misc/NEWS, I mean all the little tiny files. Searching across file boundaries in the browser is less than great. Yes, grep can do either instance with roughly the same power, but grep isn't the lowest common denominator.

Can we ask Git itself? Would "git log" on the discrete file tell us the branch it was checked in under? (I just tried, and it seemed like the answer was "no", but I'm far from a Git expert so I wouldn't take my findings as authoritative.)

No. git has no idea what branch something was committed under. The only thing git can tell you about a commit is what branches/tags it currently exists in.

In any case, I don't understand your position here. You state this as if this is a critique of keeping the discrete files forever. But if this is really a problem, then we'll have this problem if we use the discrete files at all. Anybody cherry-picking / forward-porting / back-porting will have to deal with this.

You need some mechanism to indicate what version a particular "news snippet" is for.

If you keep each tiny file for all versions forever, then you need to structure either the file or the filename so as to keep that data. If that data is part of the filename or the file then it gets invalidated anytime a PR lives longer than the release it was originally created for.

If you have a singular file that has all of the "historical" news (i.e. not the release we're currently working on) and a directory that holds all of the "current" news snippets (i.e. the ones that are new in the release that will be released next), then you no longer have to track versions, because an entry is either historical (in which case the tooling doesn't care) or for the next release (in which case it's in the directory as a snippet).

Worse yet, it means that there are two different workflows, depending on whether the cherry-pick is done before or after a release is cut. "Oh, you realized you need to backport ? Well, I just released 3.8b1, so your discrete NEWS entry file got deleted, you'll have to ". I suspect this second workflow would include re-creating the discrete file in the backported branch, either by going back to before it was deleted and rescuing it, or by creating a new file and copying and pasting in from the generated NEWS file. I don't see that as a workflow improvement over "when you backport, make sure you move the Misc/NEWS entry too" in all cases. Again, TOOWTDI.

No, that's not how git works. If I do this:

$ touch Misc/News/my-cool-change.rst
$ git add Misc/News/my-cool-change.rst
$ git commit -m 'add my cool change'  # pretend this gave us the commit, 1a05a1a
$ compile-news  # This implicitly deletes Misc/News/* and compiles it into News.rst
$ git add News.rst
$ git commit -m 'Release 3.7'
$ git checkout 3.6
$ git cherry-pick 1a05a1a

Then absolutely nothing you did after the 3rd command is going to have any effect on the cherry-picked commit. I can't imagine a VCS working the way you seem to be describing. It's going to pull the entire commit into that branch.

Right now we have a miserable workflow wrt Misc/NEWS entries, particularly when back-porting / forward-porting / cherry-picking. I think the discrete files approach will make this less-bad. Having the canonical location be in one of two places muddies the waters, making the workflow more complicated and context-sensitive.

You basically never need to touch the "compiled" News.rst in this workflow. You can completely ignore it as a contributor and things will work just fine. The only singular time that you would need to directly edit it outside of running the release script is if you're editing a typo or something in a "historical" (i.e. already happened) news entry. In which case you are extremely unlikely to get a merge conflict unless two people edited the exact same line (in which case, no matter what you're getting a merge conflict).

My prototype generates Misc/NEWS from scratch every time. FWIW, making it parse Misc/NEWS so it can find the proper insertion points for all the new items would make it more complicated. Not an unmitigated disaster of complexity, but I prefer simple where I can get it.

I think this is possibly where we're not looking at the same thing with regards to the single-file proposal. In this proposal you would not reparse the existing file at all. It's just a big chunk of arbitrary text that you prepend the new data onto. The only "parsing" you need to do is, instead of doing a "dumb" prepend a la cat file1 file2, being slightly smart enough to prepend after the

+++++++++++
Python News
+++++++++++

block. Everything else is immaterial to the tool and simply doesn't matter; it is opaque to it. This makes it even easier to convert to this process because we don't have to try to force all of the previous releases' NEWS into the correct shape. It's just more opaque text.
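That "slightly smart prepend" could be as small as the sketch below. The function name is hypothetical, and it assumes the file opens with the three-line reST title block shown above (adornment, "Python News", adornment); everything after that is treated as opaque text, per the proposal:

```python
def prepend_news(existing: str, new_block: str) -> str:
    """Insert new_block just after the 'Python News' title block.

    Assumes the first three lines are the reST over/underlined title;
    the rest of the file is opaque and passed through untouched.
    """
    lines = existing.splitlines(keepends=True)
    head, tail = lines[:3], lines[3:]
    return "".join(head) + "\n" + new_block.rstrip("\n") + "\n" + "".join(tail)
```

A smarter version might search for the first blank line after the title instead of hard-coding three lines, but the point stands: no parsing of the historical entries is needed.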

ericvsmith commented 7 years ago

Isn't the tool to conglomerate many entries into a single data file called "tar", or maybe "zip"?

larryhastings commented 7 years ago

Isn't the tool to conglomerate many entries into a single data file called "tar", or maybe "zip"?

I'm not sure what you mean here. Are you seriously proposing that the tool that rebuilds Misc/NEWS read tarballs / zip files? How is that better than it reading a text file that's presumably already in the format we want it to be in?

brettcannon commented 7 years ago

Re-focusing the discussion

OK, I think we're getting side-tracked here with technical details for a tool that doesn't even exist yet (it's my bad for listing a potential green-field solution to guide requirements discussions). The more important question is whether any pre-existing tools do what we want already.

Leave Guido out of this

And there is no need to bug Guido about any of this; he will be the first to admit he doesn't commit enough to be a special snowflake for the workflow (hence why he put me in charge of making the workflow changes and explicitly told me not to worry about him specifically).

Next Steps

Who wants to champion a pre-existing solution?

So far we have @dhellmann for reno. Anyone else care to step forward for some other tool? And if none of them do what we want then we can learn from the tools that were considered and write our own.

larryhastings commented 7 years ago

No, that's not how git works.

I was thinking of a more complicated example involving merging--did a bad job explaining, sorry.

Do we literally never merge between branches anymore? Because, if we ever ever merge, and Misc/NEWS is a canonical place for news entries, we'll have unwanted Misc/NEWS merges and conflicts and lions and tigers and bears, oh my. Happy to cite examples if relevant. I think this will be a much simpler problem to solve with discrete files.

Brett says:

Who wants to champion a pre-existing solution?

I'm championing writing a custom solution. I already have a prototype:

https://bitbucket.org/larry/mergenews

As stated before: I think we'd be better off writing the tool ourselves than trying to adapt another team's tool to taste. The hard part isn't the technology, the hard part is figuring out what workflow we want. Writing the tool is gonna be the easy part.

dstufft commented 7 years ago

Do we literally never merge between branches anymore?

My understanding is the blessed workflow is to make all changes to master, and then to use git cherry-pick to pull those changes back into previous branches without doing forward merges any more. This solved the problem of people inadvertently forward merging someone else's changes and such. So yea, we never merge between branches anymore afaik.

I could be wrong though!

dstufft commented 7 years ago

Who wants to champion a pre-existing solution?

I'd want to double check with hawkie that she even wants her tool to be considered for this, but assuming she does I guess I can champion towncrier. I'm planning on using it for pip anyways.

warsaw commented 7 years ago

On Feb 27, 2017, at 05:24 PM, Donald Stufft wrote:

Do we literally never merge between branches anymore?

My understanding is the blessed workflow is to make all changes to master, and then to use git cherry-pick to pull those changes back into previous branches without doing forward merges any more. This solved the problem of people inadvertently forward merging someone else's changes and such. So yea, we never merge between branches anymore afaik.

I could be wrong though!

You're not!

dhellmann commented 7 years ago

@brettcannon reno uses the git history to detect the versions to which a note applies (including repeating the contents across multiple branches). If the note file doesn't change across branches, it's only stored once in the git repository.

To more directly answer @dstufft's point, the version information for reno notes is not encoded in the filename, location, or content. reno uses the git commit history to determine when a file was added. It uses the most recent content of the file along the same branch, but associates that content with the earliest version (essentially "git describe --contains $sha" on the change where the file was added). This allows the same note file to be used unchanged in several versions on different branches, which was precisely the backport problem it seems the NEWS file presents to the python-dev team.

The categories for notes ("feature", "bug fix", whatever) are inside the file, and a note file can have multiple notes in multiple categories. The way we use it in OpenStack, we usually put one note at a time in a file. However, there are cases where we do things like deprecate a configuration option as part of fixing a bug, so the note file for that change might include related entries in both of those sections. The text is easy to review because it is in one file close together, but it may be rendered to the release notes page separately.

As @larryhastings says, it's quite possible a custom tool is best for the desired workflow. If every single notable change is tied to an existing bpo, then basing something on that would make sense. We don't make that assumption in OpenStack, so reno input files are standalone and any references to tracking artifacts like bugs or blueprints need to be included in the text of the note. Loose coupling FTW.

I'm not sure the concerns about the number of files involved are really warranted. The numbers don't seem that big to me. That said, given the format it would be relatively easy to mechanically combine a bunch of reno files into one big one just before a release.

ericvsmith commented 7 years ago

Agree with Brett on refocusing, but since this was asked of me directly:

Isn't the tool to conglomerate many entries into a single data file called "tar", or maybe "zip"?

I'm not sure what you mean here. Are you seriously proposing that the tool that rebuilds Misc/NEWS read tarballs / zip files? How is that better than it reading a text file that's presumably already in the format we want it to be in?

This was in answer to: "One further idea: if we became concerned about zillions of historical files that nobody cares about (e.g. Misc/NEWS entries from 2.6), we could conglomerate those entries into single data files that actually contained multiple entries, and make sure that worked nicely with the tool."

Surely such a tool could read a tar or zip file. If there's a requirement for it to also be human readable, then of course something else is needed.

1st1 commented 7 years ago

I have another idea how we can handle NEWS: we can use .gitattributes to specify a custom merge strategy for Misc/NEWS. The strategy can be implemented in a script that is committed to the repo, say tools/merge_news.

What tools/merge_news can do is to parse the Misc/NEWS file, which has a well-defined format, into a structure of sections and individual news entries. Then, depending on the nature of the merge, we can add some heuristics that will decide how to merge news when cherry-picking from the master branch to 3.6 etc.

I don't have a prototype yet, but I can experiment with this maybe this weekend.
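For reference, the plumbing this describes is a documented git feature: a custom merge driver wired up via .gitattributes. A minimal sketch, with the driver name `mergenews` and the script path as placeholders:

```
# .gitattributes — tell git to use a custom merge driver for this file:
Misc/NEWS merge=mergenews

# One-time local setup (the driver itself can't be committed, only
# configured). Git invokes the script with temp files for the ancestor
# (%O), current (%A), and other (%B) versions; the script must leave
# the merged result in the %A file and exit 0 on success:
git config merge.mergenews.driver "tools/merge_news %O %A %B"
```

Note the per-clone `git config` step: git deliberately won't run arbitrary committed scripts during a merge, which bears on whether GitHub's server-side merges could ever use it.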

methane commented 7 years ago

Does GitHub support it? I googled and found this issue: https://github.com/isaacs/github/issues/487

rbtcollins commented 7 years ago

I'm also happy to champion reno and/or help test / improve it - disclosure: I was one of the designers, so I'm most definitely biased. As Doug says, combining multiple files into one is a thing we can do - and that was actually a designed-in feature, so we wouldn't be abusing it by doing that.

brettcannon commented 7 years ago

@rbtcollins @dhellmann do you two want to grab a couple of examples from the NEWS file and convert them to how they would look under Reno so we can get a sense of the workflow?

@dstufft did you want to do the same for towncrier?

At this point I assume these are the only external tools we are considering and otherwise we will go with a custom solution if neither meet our needs.

dhellmann commented 7 years ago

@brettcannon yes, I intend to do that. I have some travel this week, so if I don't get to it tomorrow it will be early next week before I can do it (unless @rbtcollins beats me to it, of course).

brettcannon commented 7 years ago

@dhellmann No problem. Let's set a deadline of Friday, March 17 then for examples. Work for you?