Open Phrogz opened 2 years ago
Since I want the information anyhow, I decided to try converting to a database myself. The work in progress can be seen here, for your consideration. I've just started populating a master database file, and have not yet worked on scripts to produce the MD or HTML currently available. It's definitely not a full replacement yet.
https://github.com/Phrogz/rubychanges/blob/dev/database-driven/_src/database.yaml
My rationale for some of the design:
* `title` of each entry should be concise and clear, so they can be understood out of context. I changed a few of your existing titles during manual conversion, e.g. appending the word "added" to titles where new methods were added.
* `kind` of each entry is one of: `addition` (new functionality or syntax that does not break old), `change` (old code cannot be used as it used to), `promotion` (making experimental features non-experimental) or `removal` (getting rid of old behavior). You might ignore this in your HTML, or might use it for icons categorizing the changes.
* `highlight` tag would be used to loft specific entries into a Highlights section. More granular ranking of "importance" may be desirable, e.g. to produce highlights-only, or highlights+generally useful, or highlights+useful+details few care about.
* `summary` is an explanation of the change beyond the title, but not as deep as the code sample or your `reason`.
* … `bug` and `feature` and `github-pull-request`, which would all go under "Discussion", but whose links and labels would be generated based on which type it is.
* `docs`: I'm keeping just the URLs and planning on scraping the web pages to get the `<title>` for display purposes. Dumb? Maybe.
* … `class` tag to each item to be able to search for just changes to specific classes. And maybe to help group changes. Some of these entries are sketchy, like using `class: Method` for changes to parameter calling.
* … (`docs`, `feature`, `bug`, `class`) can either have a single value or an array for when it suits.

Open questions:
* … `id: "foo-bar"` into each and every entry.
* … `id` when it's needed for a reference?

Thanks for raising the topic! A more formal DB was my original wish, and you summarized its pros & cons beautifully. From the perspective of the person who needs to, well, author it all, the show-stopper (the reason I didn't start with YAML, though I intended to!) is the convenience of authoring and the humanity of the result. It basically falls into:
That being said, both can be mitigated, and I actually have some ideas about how it all could work (but nobody asked before, and I never had enough time myself!). The virtues of having it in a more formal structure and being able to autorender some slices, reorder, etc. are obvious (and actually while producing "Evolution," I did some small automation: that's why I left "not important enough" features in the source file, just commented out with `<!-- -->`, so next year it would be easier to compare automatically what's already in the file and include only missing things).
So, what I'd do in the direction of allowing DB-like usage while preserving humanity and authoring convenience (and that was the plan all along, but I never had the resources to implement it):

* … `{Foo#bar}` in source files would be auto-replaced with links to Ruby docs)...

About the cross-references and ids: it is a hard question! Basically, currently I just do this (extracted from Kramdown header-to-id conversion) whenever I need to ensure ids are stable. Maybe in a more powerful/formal structure, the raw titles can be used, e.g. "Follow-up: {2.7: `Comparable#clamp` with `Range`}", with a parser smart enough to run it through the header-to-id transformation. The problem is, of course, that if in the future somebody edits the headers in old files, all links would die.
The parser can handle it by link-validating, actually, but link dying is something that I am already trying to avoid (if somebody linked to the middle of the changelog years ago, I want the link to be alive forever; that's why, for example, Ruby 2.4's changelog has an idiotic "Stdlib" section: I forgot to change the working title to the proper "Standard library changes" before publishing, noticed it only a few months later, and now I don't want to break people's links). It can also be somewhat mitigated by "renaming + assigning the old name as a secondary anchor," but :shrug:
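To make the "stable ids + secondary anchor" idea concrete, here is a minimal Ruby sketch. It is only a simplified approximation in the spirit of Kramdown's header-to-id conversion, not its exact algorithm, and `header_id`, `ANCHOR_ALIASES`, and `resolve_anchor` are hypothetical names invented for illustration. The alias entry shows what a rename of the "Stdlib" section could look like if it were ever done.

```ruby
# Simplified header-to-id conversion (an approximation, not Kramdown's code).
def header_id(title)
  title
    .sub(/\A[^a-zA-Z]+/, '')     # drop leading non-letters
    .gsub(/[^a-zA-Z0-9 -]/, '')  # keep only letters, digits, spaces, dashes
    .tr(' ', '-')                # spaces become dashes
    .downcase
end

# Hypothetical alias table: previously published anchor => current anchor.
ANCHOR_ALIASES = {
  'stdlib' => 'standard-library-changes'
}.freeze

# Resolves a possibly outdated anchor against the ids that exist today.
# Returns nil for a dead link, so a link validator can report it.
def resolve_anchor(anchor, current_ids)
  return anchor if current_ids.include?(anchor)

  target = ANCHOR_ALIASES[anchor]
  target if current_ids.include?(target)
end

ids = ['Standard library changes', 'Comparable#clamp with Range'].map { |t| header_id(t) }
puts ids.inspect
puts resolve_anchor('stdlib', ids).inspect
puts resolve_anchor('no-such-section', ids).inspect
```

The point of the alias table is that old links keep resolving while the visible header is free to change; a validator only needs to complain when `resolve_anchor` returns `nil`.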
PS: For some time, I investigated semi-formalized formats like ArchieML, but never found them compelling enough, so I came up with my own :shrug:
I'm glad you've thought along the same lines. I continued manual porting of 3.0 changes into the DB for testing, and created a quick-n-dirty first pass at a "distilled" overview of changes. Right now the different views are baked into the HTML by the script:
    $ ruby distill.rb -h
    Usage: ruby distill.rb [options]
        -h, --help                   Prints this help
            --releases               Show a list of documented releases
        -f, --from 2.7               Show only changes after this release (default: 2.7)
        -t, --to 3.1                 Show only changes up to and including this release (default: 3.1)
        -v, --verbose                Show debugging output during run
        -b, --breaking-only          Show only changes that modify the way the language works, potentially affecting existing scripts
        -l, --language-only          Show only changes to the language (not specific classes/methods)
        -i, --important              Show only the most important changes
        -r, --relevant               Show only major/medium changes (ignore esoteric changes)
        -o, --output changes.html    Set the output filename (default: ruby-changes.html)
...but my plan is to generate a single uber document with JavaScript filtering, and the ability to click on any change and see all the amazing information you've provided.
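For readers curious how flags like these are typically declared, here is a minimal sketch using Ruby's standard `OptionParser`. It assumes nothing about how `distill.rb` is actually written; `parse_distill_options` is a hypothetical helper that covers only a subset of the flags, with defaults copied from the help text.

```ruby
require 'optparse'

# Parses a subset of the documented flags into a Hash.
# Defaults mirror the ones shown in the help text.
def parse_distill_options(argv)
  options = { from: '2.7', to: '3.1', output: 'ruby-changes.html',
              breaking_only: false, important: false }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: ruby distill.rb [options]'
    opts.on('-f', '--from VERSION', 'Show only changes after this release') { |v| options[:from] = v }
    opts.on('-t', '--to VERSION', 'Show only changes up to and including this release') { |v| options[:to] = v }
    opts.on('-b', '--breaking-only', 'Show only breaking changes') { options[:breaking_only] = true }
    opts.on('-i', '--important', 'Show only the most important changes') { options[:important] = true }
    opts.on('-o', '--output FILE', 'Set the output filename') { |v| options[:output] = v }
  end

  parser.parse(argv) # non-destructive: leaves argv untouched
  options
end

p parse_distill_options(%w[--from 3.0 -b -o changes.html])
```

Returning a plain Hash keeps the filtering code independent of the command line, which would also make it easy to reuse the same filters from a JavaScript-driven page later.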
After the 3.0 conversion I stopped the manual YAML conversion and made a pass at scraping the Markdown procedurally. I want to be able to stand on top of your work in the future, and not have a static snapshot that took hours to create and is never updated again.
The scraping is almost done, but it has the problem that I'd like to add metadata to each entry (like the "importance" level I'm using in my DB to filter between high/medium/low) that I don't think should be presented in URLs. Any thoughts on how to include such metadata per change if you continue to use Markdown?
Forgot to add pictures of what it is producing so far, to help sell why I think this is so important. :) Imagine a simple filter UI at the top of the page, getting this information, hovering each item to see a tooltip with a summary going into a little more detail, or clicking on an item and seeing full details filling the screen.
Your results are looking awesome :heart:
> I'd like to add metadata to each entry (like the "importance" level I'm using in my DB to filter between high/medium/low) that I don't think should be presented in URLs. Any thoughts on how to include such metadata per change if you continue to use Markdown?
I think we can just add new list item types to the Markdown sources, and either ignore them when rendering the current site or, well, not ignore them but render them prettily; they are a welcome addition :)
Alright, I'll head down that path. Thanks :)
Making progress. I could not think of an elegant way to add the information, so thus far it's just an extra "field" in the Markdown that looks like this (see last line):
#### `Numeric#finite?` and `#infinite?`
* **Reason:** The methods were present in `Float` and `BigDecimal`, but not in other numeric classes, which made it harder to write code uniformly processing numbers which may be integer/float/infinite.
* **Discussion:** [Feature #12039](https://bugs.ruby-lang.org/issues/12039)
* **Documentation:** [Numeric#infinite?](https://ruby-doc.org/core-2.4.0/Numeric.html#method-i-infinite-3F), [Numeric#finite?](https://ruby-doc.org/core-2.4.0/Numeric.html#method-i-finite-3F)
* **Code:**
… removed here to stop fighting GitHub formatting …
* **Note:** Notice that `infinite?` returns `nil`/`-1`/`1` (always `nil` for integers), not `true`/`false` as most of other predicate methods. While unusual, it is convenient for checking both for infinity and its sign (+Infinity/-Infinity), and can be treated effectively as `true`/`false` in boolean context.
* **Metadata:** `{kind:addition, importance:medium, scope:Numeric}`
Feel free to suggest alternatives.
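For illustration, a scraper could pick that last line apart roughly like this. This is a sketch only: `METADATA_LINE` and `parse_metadata` are hypothetical names, and the regexp assumes the exact `{key:value, ...}` shape shown in the example.

```ruby
# Matches a list item such as:
#   * **Metadata:** `{kind:addition, importance:medium, scope:Numeric}`
METADATA_LINE = /^\*\s+\*\*Metadata:\*\*\s+`\{(?<body>[^}]*)\}`\s*$/

# Returns a Hash of symbol keys to string values,
# or nil when the line is not a metadata item.
def parse_metadata(line)
  match = METADATA_LINE.match(line)
  return nil unless match

  match[:body].split(',').to_h do |pair|
    key, value = pair.split(':', 2).map(&:strip)
    [key.to_sym, value]
  end
end

line = '* **Metadata:** `{kind:addition, importance:medium, scope:Numeric}`'
p parse_metadata(line)
p parse_metadata('* **Note:** not a metadata line')
```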
Progress update: I've annotated 2.4 and half of 2.5, and am now scraping that information and emitting a single HTML file with all changes and details—weighs in around 260k—with JS that allows live filtering of the change summaries (see top of screenshot below). Still TODO:
TBH, I'd prefer not to introduce an additional nested "pseudo-language" into the structure. What I strived for is a balance between formality and readability/writeability, including self-evidence of the format. I would've gone with one of these:

    #### Numeric#finite? and #infinite? (addition, medium)

or

    * **Kind:** Addition
    * **Importance:** Medium

...so that even the unpreprocessed source would be readable as Markdown & HTML. The first option is probably enough: the "tags" are short, obvious, and non-conflicting; the second one is a bit easier to auto-parse. In either option, the parser can swear if it meets some unrecognized value.
As for scope, I'd try to auto-guess that from the header and/or links to docs. It would be a constant small irritation to say "Title: Numeric#foo?; docs: Numeric#foo?, scope: Numeric (can't you guess it already!)". It is all somewhat informal, but I tried to keep (some) consistency. I believe a small set of heuristics + clear diagnostics ("ugh, the parser can't guess the scope, can you rephrase?") + maybe a fallback to an optional `* **Scope:** Numeric` item for complicated/less formal cases should be manageable.
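A rough Ruby sketch of the first option, with the scope guessed from the title and loud diagnostics for anything unrecognized. Everything here is an assumption for illustration: `parse_header` is a hypothetical helper, the allowed kinds are the four proposed earlier in the thread, the importance levels are the high/medium/low mentioned above, and the scope heuristic only handles titles that start with `ClassName#method` or `ClassName.method`.

```ruby
KINDS       = %w[addition change promotion removal].freeze
IMPORTANCES = %w[high medium low].freeze

# Splits "#### Title (kind, importance)" into its parts and guesses the scope
# from a leading "ClassName#method" or "ClassName.method" in the title.
# explicit_scope stands in for an optional "* **Scope:** ..." item.
def parse_header(line, explicit_scope: nil)
  match = /\A#+\s+(?<title>.*?)\s*\((?<tags>[^()]*)\)\s*\z/.match(line)
  raise ArgumentError, "no (kind, importance) tags in: #{line}" unless match

  kind, importance = match[:tags].split(',').map { |tag| tag.strip.downcase }
  raise ArgumentError, "unrecognized kind: #{kind.inspect}" unless KINDS.include?(kind)
  raise ArgumentError, "unrecognized importance: #{importance.inspect}" unless IMPORTANCES.include?(importance)

  title = match[:title].delete('`')
  scope = explicit_scope || title[/\A([A-Z]\w*(?:::[A-Z]\w*)*)[#.]/, 1]
  raise ArgumentError, "can't guess the scope of #{title.inspect}, add a Scope item" if scope.nil?

  { title: title, kind: kind, importance: importance, scope: scope }
end

p parse_header('#### Numeric#finite? and #infinite? (addition, medium)')
```

Raising early on unknown tags is what lets the source stay informal: a typo like `(additon, medium)` fails at build time instead of silently dropping the entry from a filtered view.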
Hi @zverok @Phrogz, while I have known this project for a long time, I only found this discussion today :) I've been running a very similar project for the last 5 years. My ruby-changelog website was meant to be a very compact list of the most important changes with some code examples.
I did start with JSON file(s) as a source of truth for rendering .md files. The schema of the main data source looks like this:
    {
      "ruby_versions": [
        {
          "version": "3.2",
          "version_info": "3.2.0 (Dec 2022) - 3.2.2 (March 2023)",
          "state": "Supported",
          "eol": "2026-03-31",
          "minors": [
            { "version": "3.2.2", "release_date": "2023-03-30", "end_date": "" },
            { "version": "3.2.1", "release_date": "2023-02-08", "end_date": "2023-03-30" }
          ],
          "implementations": [
            {
              "name": "MRI 3.2.2",
              "url": "https://www.ruby-lang.org/en/news/2023/03/30/ruby-3-2-2-released/"
            }
          ],
          "changes": [
            {
              "type": "internals",
              "tags": ["performance"],
              "experimental": false,
              "summary": "WASI based WebAssembly support",
              "links": {
                "news": "https://itnext.io/final-report-webassembly-wasi-support-in-ruby-4aface7d90c9"
              }
            },
            {
              "type": "internals",
              "tags": ["performance"],
              "experimental": false,
              "summary": "Production-ready YJIT"
            }
          ]
        }
      ]
    }
While my project doesn't really aspire to be as comprehensive as `rubyreferences`, I think the data schema which represents language changes is something that we could share in both projects.
I wonder what your progress is in defining this language-changes schema?
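As a small illustration of why a shared schema would be handy, here is a Ruby sketch that flattens a document of the shape shown above into one record per change. The `flatten_changes` helper is hypothetical, and the sample is a trimmed-down copy of the JSON above rather than real project data.

```ruby
require 'json'

# Flattens the nested document into one Hash per change,
# each tagged with the release it belongs to.
def flatten_changes(data)
  data.fetch('ruby_versions').flat_map do |release|
    release.fetch('changes', []).map do |change|
      change.merge('version' => release['version'])
    end
  end
end

# A trimmed-down document following the schema above.
sample = JSON.parse(<<~JSON)
  {
    "ruby_versions": [
      {
        "version": "3.2",
        "changes": [
          { "type": "internals", "tags": ["performance"], "experimental": false,
            "summary": "Production-ready YJIT" }
        ]
      }
    ]
  }
JSON

flatten_changes(sample).each do |change|
  puts "#{change['version']}: #{change['summary']} [#{change['tags'].join(', ')}]"
end
```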
I'd love to be able to consume this fantastic information you've created and provide an alternative visualization for it. For that, it would be far easier to consume a database that has discrete change entries with fields like: `version`, `category`, `title`, `class`, `method`, `summary`, `overview`, `reason`, `discussion`, `documentation`, `example_code`, `notes`, and so on.

The contents of many of these fields could/would be Markdown, and the full Markdown or HTML as present today could be generated from them.
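To sketch what "generated from them" might mean in practice, here is a minimal Ruby example that renders one discrete entry back into a header followed by labelled list items. `ChangeEntry` and `render_markdown` are hypothetical names, only a handful of the fields are modeled, and the note text is a placeholder.

```ruby
# Field names follow the list above; only a few are modeled here.
ChangeEntry = Struct.new(:version, :title, :reason, :discussion, :documentation, :notes,
                         keyword_init: true)

# Renders one entry as a Markdown header followed by labelled list items.
def render_markdown(entry)
  lines = ["#### #{entry.title}"]
  {
    'Reason' => entry.reason,
    'Discussion' => entry.discussion,
    'Documentation' => entry.documentation
  }.each do |label, value|
    lines << "* **#{label}:** #{value}" if value
  end
  # Several notes per entry are allowed; a single String is wrapped into an Array.
  Array(entry.notes).each { |note| lines << "* **Note:** #{note}" }
  lines.join("\n")
end

entry = ChangeEntry.new(
  version: '2.4',
  title: '`Numeric#finite?` and `#infinite?`',
  discussion: '[Feature #12039](https://bugs.ruby-lang.org/issues/12039)',
  notes: 'placeholder note text'
)
puts render_markdown(entry)
```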
Pros

* … `String` class across all versions, or only the changes from 3.0 to 3.1.
* … `Notes:` vs. `Note:`

Cons

* … `notes` value per entry, or one `example_code` section, but you wanted two separate notes labeled separately.
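
The kind of query mentioned under Pros (only `String` changes, or only changes between two releases) could look roughly like this once entries are discrete records. This is a sketch with made-up entries; `select_changes` is a hypothetical helper, and treating the range as "after `from`, up to and including `to`" is an assumption.

```ruby
require 'rubygems' # Gem::Version compares "3.10" > "3.9" correctly

# Selects entries for one class within a release range (from, to]:
# strictly after `from`, up to and including `to`.
def select_changes(entries, klass: nil, from: nil, to: nil)
  entries.select do |entry|
    version = Gem::Version.new(entry[:version])
    next false if klass && !Array(entry[:class]).include?(klass)
    next false if from && version <= Gem::Version.new(from)
    next false if to && version > Gem::Version.new(to)

    true
  end
end

# Made-up entries, for illustration only.
entries = [
  { version: '2.7', class: 'String', title: 'hypothetical entry A' },
  { version: '3.1', class: %w[String Symbol], title: 'hypothetical entry B' },
  { version: '3.1', class: 'Array', title: 'hypothetical entry C' }
]

puts select_changes(entries, klass: 'String').map { |e| e[:title] }.inspect
puts select_changes(entries, from: '3.0', to: '3.1').map { |e| e[:title] }.inspect
```

Because `class` may hold a single value or an array, the filter wraps it with `Array(...)` so both shapes are handled the same way.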