Generate changelog from an actual data format?

JoshCheek commented 10 years ago

Rather than trying to parse the Changelog, wouldn't it make more sense to store it in some real data format, and then just generate it?

e.g.

require 'json'

# LOAD THE DATA STRUCTURE
changelog = JSON.load <<CHANGELOG_JSON, nil, symbolize_names: true
{ "description": "All notable changes to this project will be documented in this file.",
  "entries": [
    { "version"  : "0.0.4",
      "released" : "2014-08-09",
      "added"    : ["Better explanation of the difference between the file (\\"CHANGELOG\\") and its function \\"the change log\\"."],
      "changed"  : ["Refer to a \\"change log\\" instead of a \\"CHANGELOG\\" throughout the site to differentiate between the file and the purpose of the file — the logging of changes."],
      "removed"  : ["Remove empty sections from CHANGELOG, they occupy too much space and create too much noise in the file. People will have to assume that the missing sections were intentionally left out because they contained no notable changes."]
    },
    { "version"  : "0.0.3",
      "released" : "2014-08-09",
      "added"    : ["\\"Why should I care?\\" section mentioning The Changelog podcast."]
    },
    { "version"  : "0.0.2",
      "released" : "2014-07-10",
      "added"    : ["Explanation of the recommended reverse chronological release ordering."]
    },
    { "version"  : "0.0.1",
      "released" : "2014-05-31",
      "added"    : [
        "This CHANGELOG file to hopefully serve as an evolving example of a standardized open source project CHANGELOG.",
        "CNAME file to enable GitHub Pages custom domain",
        "README now contains answers to common questions about CHANGELOGs",
        "Good examples and basic guidelines, including proper date formatting.",
        "Counter-examples: \\"What makes unicorns cry?\\""
      ]
    }
  ]
}
CHANGELOG_JSON

# NORMALIZE THE DATA STRUCTURE
changelog[:entries] ||= []
changelog[:entries].each do |entry|
  entry[:changed] ||= []
  entry[:added]   ||= []
  entry[:removed] ||= []
end

# LOAD THE TEMPLATE
template = <<ERB_TEMPLATE
# Change Log
<%= changelog[:description] %>

<% changelog[:entries].each do |entry| -%>
## <%= entry[:version] %> - <%= entry[:released] %>
<% if entry[:added].any? -%>
### Added
<% entry[:added].each do |added| -%>
- <%= added %>
<% end -%>

<% end -%>
<% if entry[:changed].any? -%>
### Changed
<% Array(entry[:changed]).each do |changed| -%>
- <%= changed %>
<% end -%>

<% end -%>
<% if entry[:removed].any? -%>
### Removed
<% Array(entry[:removed]).each do |removed| -%>
- <%= removed %>
<% end -%>

<% end -%>
<% end -%>
ERB_TEMPLATE

# RENDER THE TEMPLATE
require 'erb'
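# trim_mode: '-' enables the "-%>" line trimming used in the template
# (the positional form ERB.new(template, 0, '-') is deprecated in newer Rubies)
puts ERB.new(template, trim_mode: '-').result(binding)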

# >> # Change Log
# >> All notable changes to this project will be documented in this file.
# >> 
# >> ## 0.0.4 - 2014-08-09
# >> ### Added
# >> - Better explanation of the difference between the file ("CHANGELOG") and its function "the change log".
# >> 
# >> ### Changed
# >> - Refer to a "change log" instead of a "CHANGELOG" throughout the site to differentiate between the file and the purpose of the file — the logging of changes.
# >> 
# >> ### Removed
# >> - Remove empty sections from CHANGELOG, they occupy too much space and create too much noise in the file. People will have to assume that the missing sections were intentionally left out because they contained no notable changes.
# >> 
# >> ## 0.0.3 - 2014-08-09
# >> ### Added
# >> - "Why should I care?" section mentioning The Changelog podcast.
# >> 
# >> ## 0.0.2 - 2014-07-10
# >> ### Added
# >> - Explanation of the recommended reverse chronological release ordering.
# >> 
# >> ## 0.0.1 - 2014-05-31
# >> ### Added
# >> - This CHANGELOG file to hopefully serve as an evolving example of a standardized open source project CHANGELOG.
# >> - CNAME file to enable GitHub Pages custom domain
# >> - README now contains answers to common questions about CHANGELOGs
# >> - Good examples and basic guidelines, including proper date formatting.
# >> - Counter-examples: "What makes unicorns cry?"
# >>

mvz commented 10 years ago

This makes authoring the change log a lot harder.

JoshCheek commented 10 years ago

In what way?

mvz commented 10 years ago

By having to type running text as JSON literals.

JoshCheek commented 10 years ago

You could use YAML, it has nice support for string literals.

mvz commented 10 years ago

Yes, YAML is a lot nicer. A quick dump from your example gives:

---
:description: All notable changes to this project will be documented in this file.
:entries:
- :version: 0.0.4
  :released: '2014-08-09'
  :added:
  - Better explanation of the difference between the file ("CHANGELOG") and its function
    "the change log".
  :changed:
  - Refer to a "change log" instead of a "CHANGELOG" throughout the site to differentiate
    between the file and the purpose of the file — the logging of changes.
  :removed:
  - Remove empty sections from CHANGELOG, they occupy too much space and create too
    much noise in the file. People will have to assume that the missing sections were
    intentionally left out because they contained no notable changes.
- :version: 0.0.3
  :released: '2014-08-09'
  :added:
  - '"Why should I care?" section mentioning The Changelog podcast.'
- :version: 0.0.2
  :released: '2014-07-10'
  :added:
  - Explanation of the recommended reverse chronological release ordering.
- :version: 0.0.1
  :released: '2014-05-31'
  :added:
  - This CHANGELOG file to hopefully serve as an evolving example of a standardized
    open source project CHANGELOG.
  - CNAME file to enable GitHub Pages custom domain
  - README now contains answers to common questions about CHANGELOGs
  - Good examples and basic guidelines, including proper date formatting.
  - 'Counter-examples: "What makes unicorns cry?"'
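
For hand-authoring, the file could be simpler than the dump above. The following is only a sketch under a few assumptions that nobody in this thread proposed: plain string keys instead of symbol keys, quoted dates, and a hypothetical helper named `load_changelog`.

```ruby
require 'yaml'

# Hand-written YAML with plain string keys (an assumption; the dump above uses
# symbol keys). Dates are quoted so they stay strings: an unquoted 2014-08-09
# would be read as a Date, which YAML.safe_load rejects by default.
CHANGELOG_YAML = <<~CHANGELOG
  description: All notable changes to this project will be documented in this file.
  entries:
    - version: "0.0.3"
      released: "2014-08-09"
      added:
        - '"Why should I care?" section mentioning The Changelog podcast.'
    - version: "0.0.2"
      released: "2014-07-10"
      added:
        - Explanation of the recommended reverse chronological release ordering.
CHANGELOG

# Hypothetical helper: parse the YAML and normalize missing sections to empty
# arrays, mirroring the normalization step in the JSON example.
def load_changelog(yaml_text)
  changelog = YAML.safe_load(yaml_text)
  changelog['entries'] ||= []
  changelog['entries'].each do |entry|
    %w[added changed removed].each { |section| entry[section] ||= [] }
  end
  changelog
end

load_changelog(CHANGELOG_YAML)['entries'].each do |entry|
  puts "#{entry['version']} (#{entry['released']}): #{entry['added'].length} added"
end
```

With string keys the ERB template from the first comment would need `entry['added']` rather than `entry[:added]`.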

masukomi commented 9 years ago

So, just to be clear you want to take a document whose whole purpose in life is to be read by humans, encode it in a format designed to be read by computers, where syntax errors can be easily encoded.

This action then restricts the pool of people who can contribute to it to the geeks, a group that is known for its inability to communicate well with "normal" people, and then have those people attempt to write a document for "normal" people. Even if you are keeping a change log for a geek library the wording should still be high enough level that you don't have to have much (or any) understanding of the internals of the thing being documented to understand what the changes are.

All this, because you don't want to parse Markdown? The minimal additional conventions are clearly specified in the guide and trivial to parse. Yes, not as trivial to parse as text created specifically for a computer to parse it, but .... to go to JSON or YAML just defeats the whole point. Most people will have less than zero interest in manually maintaining a JSON / YAML change log file when the alternative is just using Markdown with a couple conventions.

What we've got here with this project is a format that's simple to write and parse by humans AND computers. Tooling can be easily built around it. We don't need to make it hard for humans to create just to make it easy for developers to parse.

JoshCheek commented 9 years ago

So, just to be clear you want to take a document whose whole purpose in life is to be read by humans, encode it in a format designed to be read by computers, where syntax errors can be easily encoded.

"Yes" in the sense that I'd like it to be encoded in a format that a program can read. I don't see the point in creating data that can only be utilized by humans, so I'm proposing it be written in a format for both. Simple example, you decide you want to change something, and now you have 5k entries to update. You'll, of course, do something like ruby -ne ..., but you'll still have to check it by hand, because it's not in a real format. So, "no" in the sense that its "whole purpose in life is to be read by humans". Also, "no" in the sense that "syntax errors can be easily encoded", as placing it in a real format prevents syntax errors (a file with a syntax error won't parse).

This action then restricts the pool of people who can contribute to it to the geeks, a group that is known for its inability to communicate well with "normal" people, and then have those people attempt to write a document for "normal" people. Even if you are keeping a change log for a geek library the wording should still be high enough level that you don't have to have much (or any) understanding of the internals of the thing being documented to understand what the changes are.

You should let go of your labels, people are just people. Whether they code or not doesn't justify such stereotypes. Also, I see no reason to believe that not being a "geek" means you can't figure out how YAML works (though, curiously, you seem to think Markdown is within their apparently limited capability). Also, this perspective is myopic in the sense that the point of using a specified format is that a computer can manipulate it, so if there was value in it, a tool would be created.

All this, because you don't want to parse Markdown? The minimal additional conventions are clearly specified in the guide and trivial to parse. Yes, not as trivial to parse as text created specifically for a computer to parse it, but .... to go to JSON or YAML just defeats the whole point. Most people will have less than zero interest in manually maintaining a JSON / YAML change log file when the alternative is just using Markdown with a couple conventions.

I think you're too quick to generalize. I have no opinion about parsing Markdown, if people write it consistently. But the fact that it's markdown means that people will write it like... well... Markdown. Which is to say utterly ad-hoc, at which point you'll either give up (this is what you'll actually do), or go in by hand and try to figure out how to convert what was written to something you can parse (spelling errors will be simple, but at some point, entries will diverge sufficiently to require complete translation).

mvz commented 9 years ago

I think the problem of unparseable change logs is better fixed by having some changelog-lint tool, rather than changing the format to something barely human-readable.
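
A minimal sketch of what such a check might look like, assuming the conventions from the example at the top of this thread: release headings of the form `## X.Y.Z - YYYY-MM-DD` and the section names Added, Changed and Removed. The names `lint_changelog` and `valid_date?` are hypothetical; this is not an existing tool.

```ruby
require 'date'

# Assumed rules for this sketch: release headings look like
# "## 1.2.3 - 2014-08-09" and section headings use one of the names below.
RELEASE_HEADING = /\A## (\d+\.\d+\.\d+) - (\d{4}-\d{2}-\d{2})\z/
SECTION_NAMES   = %w[Added Changed Removed].freeze

# True when the string is a real calendar date in YYYY-MM-DD form.
def valid_date?(string)
  Date.strptime(string, '%Y-%m-%d')
  true
rescue ArgumentError
  false
end

# Returns an array of "line N: message" strings; an empty array means that
# no problems were found.
def lint_changelog(text)
  problems = []
  text.each_line.with_index(1) do |raw, number|
    line = raw.chomp
    if line.start_with?('### ')
      name = line.sub('### ', '')
      problems << "line #{number}: unknown section #{name.inspect}" unless SECTION_NAMES.include?(name)
    elsif line.start_with?('## ')
      match = RELEASE_HEADING.match(line)
      if match.nil?
        problems << "line #{number}: release heading should be '## X.Y.Z - YYYY-MM-DD'"
      elsif !valid_date?(match[2])
        problems << "line #{number}: #{match[2]} is not a real calendar date"
      end
    end
  end
  problems
end

puts lint_changelog("## 0.0.3 - March\n### Fixes\n")
```

The rules are deliberately strict so that drift such as "March" or "week1" in a heading is reported instead of silently accepted.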

JoshCheek commented 9 years ago

I like that idea, too.

bigwhoop commented 9 years ago

Just wanted to mention that I wrote a little tool that creates Keep a Changelog files from Trello lists. It's written in PHP and not standalone, so it's probably not for everyone. https://github.com/bigwhoop/trellog

masukomi commented 9 years ago

You should let go of your labels, people are just people. Whether they code or not doesn't justify such stereotypes. Also, I see no reason to believe that not being a "geek" means you can't figure out how YAML works (though, curiously, you seem to think Markdown is within their apparently limited capability). Also, this perspective is myopic in the sense that the point of using a specified format is that a computer can manipulate it, so if there was value in it, a tool would be created.

People are not "just people". I can assure you that attempting to get any of the client managers to write raw JSON or YAML would be a disaster. They might get it right some of the time, or even most of the time, but they're never going to consistently indent with tabs vs spaces and they don't even have tools to tell the difference easily. Then they'll start editing the files in MS Word, because that's what they all use, and you can just imagine what that'll lead to.

Different people have different skills and different ways of thinking. Programmers / "geeks" have a very useful set of skills that the rest of the folks on the team are typically quite ignorant of.

though, curiously, you seem to think Markdown is within their apparently limited capability

Have you seen how many apps on iOS are using markdown as their text input format because WYSIWYG editing is a pain on iOS? I think it's well proven that non-geeks can handle markdown.

Simple example, you decide you want to change something, and now you have 5k entries to update.

Under what circumstances would you ever be updating 5 thousand entries? Hell, when would you even be updating 20? This is a changelog. If you're talking about editing a 5k file then, so what?

but you'll still have to check it by hand, because it's not in a real format.

um... Markdown is very checkable. A) There's a spec. B) There's a Markdown lint tool. C) We're not talking about MathJax here.

Many of your comments seem to be based on the "not a real format" argument. Markdown may be difficult to write a parser for, but it is very much "a real format".

Also, this perspective is myopic in the sense that the point of using a specified format is that a computer can manipulate it, so if there was value in it, a tool would be created.

That's half of the point of this project: to create a standardized format for change log files so that tools can be written to parse it. Everyone's doing it differently, so you can't easily write something to parse all the different implementations. If we standardize around this we can then write tools. As for parsing and manipulating Markdown, tools HAVE been created. I don't know what makes you think they haven't. As for @mvz's point about a changelog lint, that's absolutely doable while still staying with Markdown. All we need is a standard.

I have no opinion about parsing Markdown, if people write it consistently. But the fact that it's markdown means that people will write it like... well... Markdown. Which is to say utterly ad-hoc

This is a lot like HTML on the internet. People write complete and utter crap html, but you know what? They wouldn't if the browsers didn't keep happily, and silently compensating for their stupidity. If the browsers said "Hey, this is crap, and here are the problems we found with it." then we wouldn't be plagued with web sites that are only parseable when run through an engine to convert all the bs and mistakes into something legit first.

If, however, we stopped writing tools that take MD down the same route HTML has gone, we wouldn't have the problem of inconsistent markup. Then again, I don't really see anyone being terribly concerned about the horribly inconsistent HTML or the missing end tags which make it unparseable as XML, and that seems to have been a pretty successful markup format.

mvz commented 9 years ago

@masukomi :+1:

JoshCheek commented 9 years ago

Different people have different skills and different ways of thinking. Programmers / "geeks" have a very useful set of skills that the rest of the folks on the team are typically quite ignorant of.

If your business people are writing your code's changelog, then I am confused. And your designers and QA are absolutely capable of learning YAML and JSON.

Many of your comments seem to be based on the "not a real format" argument. Markdown may be difficult to write a parser for, but it is very much "a real format".

Markdown is a format for displaying content, it translates to paragraphs and headings. It is not a format for holding data (hence the title of the thread: "an actual data format"). If I asked you to show me a timeline of when features were released (ie in d3), how would you do this? what is the Markdown format for a date, or for when a feature was released? You're dependent on consistent use of date formats, consistently placing them in headers, after a version and a dash. But Markdown will not enforce that you always place your dates in an h2 after a version string and a dash, in YYYY-MM-DD format, and my experience has been that this inevitably drifts, and does so in irreparable ways such as "March" or "2012"; you'll see YYYY-DD-MM and YYYY-MM-DD, and "week1". Even if it didn't drift, we're parsing presentation output in order to extract data. There's a reason we prefer APIs over scraping.

Under what circumstances would you ever be updating 5 thousand entries? Hell, when would you even be updating 20? This is a changelog. If you're talking about editing a 5k file then, so what?

The first two changelogs I could think of are both in this ballpark:

$ curl https://raw.githubusercontent.com/ruby/ruby/trunk/ChangeLog | wc -l
   11094
$ curl -s https://raw.githubusercontent.com/rspec/rspec-core/master/Changelog.md | wc -l
    1682

mvz commented 9 years ago

But Markdown will not enforce that you always place your dates in an h2 after a version string and a dash, in YYYY-MM-DD format

Hence my proposal for a simple linter that would check for these things.

markhuot commented 9 years ago

I think there's some confusion between a changelog and a git log here. I would prefer to keep my changelog human readable, in Markdown, so that it is easier to read and update. If I need to link a user directly to a date or specific change I could use a named anchor in my Markdown.

If I need to link someone to a specific change or a line update I'll use Github for that and link to a commit or a diff.

masukomi commented 9 years ago

If I asked you to show me a timeline of when features were released (ie in d3), how would you do this? what is the Markdown format for a date, or for when a feature was released? You're dependent on consistent use of date formats, consistently placing them in headers

Again, that's what this project is all about. IF they were following the standards set forth in this standardization project then it'd be pretty damn simple because there would be a known, and consistent date format and it'd be in a known, and consistent place.

The point of this project is to set a standard so that things can be easily parsed by machines and humans.

To answer your question more directly... I'd iterate over the headers looking for ones that contained a semantic version number followed by a hyphen and then the date format specified by this specification. I'd parse the date. I'd associate the subheaders with that date, etc., etc.. It's really not hard.
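
The steps above might be sketched roughly as follows. The heading pattern (`## X.Y.Z - YYYY-MM-DD`), the one-bullet-per-line assumption, and the name `parse_changelog` are illustrative assumptions, not part of the project's specification.

```ruby
require 'date'

# Assumed release heading: "## <semantic version> - <YYYY-MM-DD>".
RELEASE = /\A## (\d+\.\d+\.\d+) - (\d{4}-\d{2}-\d{2})\s*\z/

# Returns an array of hashes:
#   { version: "0.0.4", released: Date, sections: { "Added" => ["..."] } }
# Assumes one bullet per line; wrapped bullets and other Markdown are ignored.
# Date.strptime raises ArgumentError for an impossible date such as 2014-02-30.
def parse_changelog(markdown)
  releases = []
  current = nil   # release currently being filled in, if any
  section = nil   # bullet list of the current "### ..." subheading, if any
  markdown.each_line do |raw|
    line = raw.chomp
    if line.start_with?('## ')
      match = RELEASE.match(line)
      current = match ? { version: match[1], released: Date.strptime(match[2], '%Y-%m-%d'), sections: {} } : nil
      releases << current if current
      section = nil
    elsif current && line.start_with?('### ')
      section = (current[:sections][line.sub('### ', '').strip] = [])
    elsif section && line.start_with?('- ')
      section << line.sub('- ', '')
    end
  end
  releases
end

# A release timeline is then a matter of reading the parsed dates.
sample = "## 0.0.2 - 2014-07-10\n### Added\n- Explanation of release ordering.\n"
parse_changelog(sample).each { |r| puts "#{r[:released]} #{r[:version]}" }
```

Headings that do not match the pattern are skipped rather than guessed at, so everything the parser returns has a real date attached.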

You really need to understand that this project is a parseable format specification. It's like you're trying to complain that XML or SGML could contain practically anything including custom tags that mean who-knows-what and thus parsing XHTML is difficult. It isn't. XHTML conforms to a spec that represents a limited subset of options. This project is exactly the same thing. If you want to make a document that is parseable (like XHTML) you need to create it using the rules of the specification that makes it parseable. This project is setting forth those rules. The wide open nature of what can be in a markdown document doesn't apply here. This project represents a limited subset of what can be in a markdown document so that we can parse it.

olivierlacan commented 9 years ago

I know this is how bad responses start, but with all due respect to @JoshCheek, this doesn't make any sense to me. There has been a lot of back and forth on the topic. I don't have much to add on the nay side.

If we want a data storage format, we have Git. We can parse and output the data git stores. To be honest I'd love for git tags to eventually be the central canonical repository of human-friendly change logs. That seems coherent with git commit messages in my book. Problem is, the git tag interface is still pretty poor. Maybe the people interested in this project should work together to improve it.

I'd love for git tag -n to be able to properly list all tags, consider the first line of the tag message as the title and the rest (after a new line) the body as git already does for git commit messages. Right now the only way to list all tags with somewhat full messages is to trick the -n flag with something like git tag -n99 to display the first 99 lines of the message.

Something built on git tag -n could easily extract most of what we need to programmatically generate a change log that would fit within the keepachangelog.com guidelines.
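
As a rough sketch of that idea, the snippet below reads tags with `git for-each-ref` instead of `git tag -n` (a deliberate swap, because its `--format` option gives one tab-separated record per tag) and uses only the first line of each tag message. The helper names `read_tags` and `render_changelog` are hypothetical, and the output is a flat list per tag, not the Added/Changed/Removed sections from the guidelines.

```ruby
require 'open3'

# Reads tags from the repository in repo_dir, newest first. Each record is
# "<tag>\t<date>\t<first line of the tag message>".
def read_tags(repo_dir = '.')
  fields = '%(refname:short)%09%(creatordate:short)%09%(contents:subject)'
  out, status = Open3.capture2('git', '-C', repo_dir, 'for-each-ref',
                               '--sort=-creatordate', "--format=#{fields}", 'refs/tags')
  raise 'git for-each-ref failed' unless status.success?

  out.each_line.map do |line|
    name, date, subject = line.chomp.split("\t", 3)
    { name: name, date: date, subject: subject.to_s }
  end
end

# Renders tag records as a Markdown change log, one heading per tag.
def render_changelog(tags)
  lines = ['# Change Log']
  tags.each do |tag|
    lines << '' << "## #{tag[:name]} - #{tag[:date]}"
    lines << "- #{tag[:subject]}" unless tag[:subject].empty?
  end
  lines.join("\n") + "\n"
end
```

Calling `puts render_changelog(read_tags)` inside a repository would print one section per tag. Splitting a tag message into Added, Changed and Removed would need a convention for the message body, which this sketch does not assume.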

lunemec commented 9 years ago

@JoshCheek I understand that having a "real" data format could have its advantages, and some projects already do that. For example, Python packages on PyPI (https://pypi.python.org/pypi) already have a changelog-like structure for parsing and installation.

These files are however (to my knowledge) not used as real CHANGELOGs but just as information about the versions of the package.

But I believe this CHANGELOG should serve programmers as well as users, so markdown :+1:

Abdillah commented 9 years ago

@JoshCheek: I think the data format is better used internally, in a graphical CHANGELOG.md builder, so that the software still outputs a readable, standardized CHANGELOG.md.

But I don't yet see how such a tool would help us write this little document.

olivierlacan commented 8 years ago

Closing this issue. I've explained my point of view on it. If you're looking for a data format for storing changes, you're forgetting about git (or other versioning systems), which is the canonical source of such data.

olivierlacan / keep-a-changelog

Generate changelog from an actual data format? #27