jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

jq manual remake #3183

Open 01mf02 opened 1 month ago

01mf02 commented 1 month ago

I would like to extend the jq manual with information about diverging behaviour between different jq implementations (jq, gojq, jaq, ...). For this, I will have to go through the whole manual once, which I think would be a nice opportunity to adapt its format.

Currently, the jq manual is written as a YAML file, which mostly contains Markdown blocks and examples. I thought about transforming it to a pure Markdown file, and then creating HTML from it via Pandoc. The advantages of this would be:

To show the feasibility of it, I recreated a part of the "Conditionals and Comparisons" section of the jq manual in Markdown. You can use

pandoc conds.md -s -o conds.html --section-divs --lua-filter filter.lua

to render this to the following HTML (sorry, GitHub doesn't let me upload HTML directly, so I exported the HTML to PDF and uploaded that).

This requires the file filter.lua, which I wrote today in about 2 hours (I don't know Lua ^^) and which currently looks like this:

function Header(el)
  if el.content[1].text == "Examples" then
    --print(dump(el))
    el.attr.classes:insert("examples")
  end
  return el
end

function Code(code)
  code.classes[1] = "jq"
  --print(dump(code))
  return code
end

function CodeBlock(block)
  --print(dump(block))
  --if block.classes[1] == "jq-test" then
  -- each line of the block becomes one table row: filter, input, output
  local rows = {}
  local categories = {"Filter", "Input", "Output"}
  local i = 1
  for line in block.text:gmatch("[^\n]+") do
    local code = pandoc.Code(line)
    -- the first line is the jq filter, all following lines are JSON
    local lang = "json"
    if i == 1 then lang = "jq" end
    code.classes[1] = lang
    table.insert(rows, {pandoc.Plain(categories[i] or ""), code})
    i = i + 1
  end
  local simple_table = pandoc.SimpleTable(
    "", -- caption
    {pandoc.AlignDefault, pandoc.AlignDefault},
    {0, 0}, -- let pandoc determine col widths,
    {}, -- headers
    rows
  )
  return pandoc.utils.from_simple_table(simple_table)
end

I am fairly confident that I would be able to convert the whole manual to this Markdown format. But I'm only going to do this if it's likely that this is going to be merged into jq. So the question is: Would you consider merging such a change?

01mf02 commented 1 month ago

I've converted a few more sections now and enabled "Run" links for jqplay.org, the result is here.

itchyny commented 1 month ago

How do you generate man.test, manonig.test from Markdown?

pkoppstein commented 1 month ago

@01mf02 - Rewriting the jq manual in the way you've described sounds like a very ambitious project!

I was wondering whether you've considered (either as an alternative or at least as a prelude) revising existing documentation on the jq wiki, and in particular the jq-Language-Description page.

I would also like to see a new page under the Tips section, which already has a "Regarding gojq" page.

It would be really helpful to have an up-to-date "Regarding jaq" page, and you could perhaps give a preview there of some of the content you envision for the "official documentation".

01mf02 commented 1 month ago

> How do you generate man.test, manonig.test from Markdown?

Creating the combination of man.test and manonig.test is quite simple; it involves printing all example code blocks to stdout in the Lua filter (a one-line instruction) and piping the result into man.test. It is a bit more annoying to separate out those tests that involve regular expressions, but it can be done, in the worst case by post-processing the exported tests.

01mf02 commented 1 month ago

> @01mf02 - Rewriting the jq manual in the way you've described sounds like a very ambitious project!

It might actually be a less ambitious project than I thought, for I was able to convert the whole manual to HTML/PDF. The result is here. As you can see, jq syntax highlighting already works in the PDF version! (The examples look a bit wonky, but please recall that I have performed zero styling of the output so far.)

I did this as follows:

I converted manual.yml to manual.json with a converter, then used jq to convert the result to Markdown:

jq -r -f convman.jq manual.json > manual.md

The jq script convman.jq is:

"---\ntitle: jq manual\n---\n\n",
.body,
(.sections[] | (
  "# \(.title)",
  .body // "",
  (.entries[]? | (
    "## \(.title)",
    .body,
    if .examples then "::: Examples\n", (.examples[] | (
      "~~~",
      (.program | gsub("\n"; " ")),
      .input,
      (.output | if . == [] then "\n" else .[] end),
      "~~~\n"
    )), ":::\n" else empty end
  ))
))

From this, I use Pandoc to convert Markdown to HTML:

pandoc manual.md -s -o manual.html --section-divs --lua-filter filter.lua > man.test

This creates an HTML file manual.html and prints on stdout the combination of man.test and manonig.test. This uses the following Lua filter (filter.lua):

-- inline code is always written in jq
function Code(code)
  code.classes[1] = "jq"
  return code
end

-- code blocks are assumed to be in jq if no other language is given
function CodeBlock(block)
  if next(block.classes) == nil then
    block.classes[1] = "jq"
    return block
  end
end

function Div(el)
  if el.classes:includes'Examples' then
    return pandoc.walk_block(el, {CodeBlock = function(block)
      -- print example to stdout
      print(block.text .. "\n")
      return exampleTable(block.text)
    end})
  end
end

function exampleTable(test)
  local _, _, filter, input, output = test:find("([^\n]+)\n([^\n]+)\n(.*)")
  local url = "https://jqplay.org/jq?q=" .. encodeUrl(filter) .. "&j=" .. encodeUrl(input)
  local simple_table = pandoc.SimpleTable(
    "", -- caption
    {pandoc.AlignRight, pandoc.AlignLeft},
    {0, 0}, -- let pandoc determine col widths,
    {}, -- headers
    {
      {pandoc.Plain("Filter"), pandoc.Code(filter, {class = "jq"  })},
      {pandoc.Plain( "Input"), pandoc.Code( input, {class = "json"})},
      {pandoc.Plain("Output"), pandoc.Code(output, {class = "json"})},
      {pandoc.Link("Run", url), {}}
    }
  )
  return pandoc.utils.from_simple_table(simple_table)
end

function encodeUrl(str)
  str = string.gsub(str, "\n", "\r\n")
  str = string.gsub(str, "([^%w%.%- ])", function(c) return string.format("%%%02X", string.byte(c)) end)
  str = string.gsub(str, " ", "+")
  return str
end
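To make the shape of the generated "Run" links concrete, here is a rough shell sketch of the same encoding rules as `encodeUrl` above (newline becomes CR LF, space becomes `+`, everything except ASCII letters, digits, `.` and `-` is percent-encoded). The function name `encode_url` and the sample filter and input are made up for illustration, and the sketch assumes bash and ASCII-only text.

```shell
#!/usr/bin/env bash
# Hypothetical bash counterpart of encodeUrl from filter.lua (ASCII input assumed).
encode_url() {
  local LC_ALL=C            # treat the string as bytes
  local s=$1 out= c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.-]) out+=$c ;;        # kept as-is, like [%w%.%-] in the Lua pattern
      ' ')           out+='+' ;;       # space becomes "+"
      $'\n')         out+='%0D%0A' ;;  # "\n" is first turned into "\r\n", then encoded
      *)             printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Sample values, chosen only for this illustration.
filter='.[0] | . + 1'
input='[1, 2]'
echo "https://jqplay.org/jq?q=$(encode_url "$filter")&j=$(encode_url "$input")"
# prints https://jqplay.org/jq?q=.%5B0%5D+%7C+.+%2B+1&j=%5B1%2C+2%5D
```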

We can also create a PDF file, by going through Typst:

pandoc manual.md -s -o manual.typ --section-divs --lua-filter filter.lua
typst c manual.typ

I used this to produce the PDF file at the beginning of this post.

> I was wondering whether you've considered (either as an alternative or at least as a prelude) revising existing documentation on the jq wiki, and in particular the jq-Language-Description page.

I have not considered this so far. In the context of the present issue, I would like to concentrate on making the format of the manual easier to modify. Revising documentation is a different issue.

> I would also like to see a new page under the Tips section, which already has a "Regarding gojq" page.
>
> It would be really helpful to have an up-to-date "Regarding jaq" page, and you could perhaps give a preview there of some of the content you envision for the "official documentation".

I think it would be best to integrate any information that would otherwise go on a "Regarding jaq" page directly into the manual. That makes it much easier for users to find the information and for us to keep it up to date. As an example of what I imagine: currently, the user manual says the following about input:

## `input`

Outputs one new input.

My idea is to enhance it as follows:

## `input`

Outputs one new input.

::: Compatibility
Available since jq 1.5.
When there is no more input,
jq returns an error, whereas
jaq returns no output (`empty`).
:::

The compatibility statement could be rendered in the documentation as a block. I would like to do this for all the filters documented in the manual. In the long run, this would also make it possible to have unified documentation for all jq versions on the same page, by stating explicitly in which version each feature became available.
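Because the compatibility notes would live in plain `::: Compatibility` blocks, they could also be pulled out of the Markdown source with ordinary text tools, for example to review all of them at once. The following is only a sketch under that assumption: the file name `compat-sample.md` and the helper `list_compat` are invented here, and the sample text is the `input` entry proposed above.

```shell
# Sample input: the proposed `input` entry, written to a throwaway file.
cat > compat-sample.md <<'EOF'
## `input`

Outputs one new input.

::: Compatibility
Available since jq 1.5.
When there is no more input,
jq returns an error, whereas
jaq returns no output (`empty`).
:::
EOF

# Hypothetical helper: print each entry heading followed by its Compatibility block.
list_compat() {
  awk '
    /^## /                { heading = $0 }                          # remember the current entry
    /^::: Compatibility$/ { inblock = 1; print heading; next }      # block starts
    /^:::$/               { if (inblock) print ""; inblock = 0; next }  # block ends
    inblock               { print }                                 # copy block lines
  ' "$1"
}

list_compat compat-sample.md
```

For the sample, this prints the heading, the four lines of the compatibility note, and a blank separator line.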

01mf02 commented 1 month ago

By the way, Pandoc can also create man pages, so we can use it to generate jq(1) from the Markdown source.
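A minimal sketch of what that could look like, using Pandoc's `man` writer. The helper name `build_manpage`, the output name `jq.1`, and the choice to pass the section via `-V section=1` are assumptions for illustration, not something settled in this thread.

```shell
# Hypothetical helper: render a Markdown manual as a man page.
# Usage: build_manpage manual.md jq.1
build_manpage() {
  # -s produces a standalone page (including the .TH header line);
  # -V section=1 fills in the man section used in that header.
  pandoc "$1" -s -t man -V section=1 -o "$2"
}
```

With the converted manual in place, `build_manpage manual.md jq.1 && man ./jq.1` would then show the result.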

01mf02 commented 1 month ago

@itchyny, I've found a way to generate both man.test and manonig.test: When exporting all tests (all.test) from the Markdown manual, I terminate every test with \0. That way, as a second step, we can split all tests one-by-one and write those that contain regex filters into a separate file:

mkdir -p tests
split --separator='\0' -l1 all.test tests/
REGEX="test|match|capture|scan|split|splits|sub|gsub"
grep -L -E $REGEX tests/* | xargs cat | sed 's/\x0/\n/g' > man.test
grep -l -E $REGEX tests/* | xargs cat | sed 's/\x0/\n/g' > manonig.test
rm -r tests
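To see the effect on a small scale, the same steps can be run on throwaway data. This sketch assumes GNU split, grep, and sed (as the commands above do); the two sample tests and the temporary directory are made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmp=$(mktemp -d)
cd "$tmp"

# Two NUL-terminated tests: a plain one and one that uses the regex filter `test`.
printf '.a\n{"a":1}\n1\n\0test("a")\n"abc"\ntrue\n\0' > all.test

mkdir -p tests
split --separator='\0' -l1 all.test tests/   # one test per file: tests/aa, tests/ab
REGEX="test|match|capture|scan|split|splits|sub|gsub"
grep -L -E "$REGEX" tests/* | xargs cat | sed 's/\x0/\n/g' > man.test      # no regex filter
grep -l -E "$REGEX" tests/* | xargs cat | sed 's/\x0/\n/g' > manonig.test  # regex filter
rm -r tests

cat manonig.test
```

Here the `.a` test ends up in man.test and the `test("a")` test in manonig.test. Note that the match is by substring, so a test that merely contains one of these names inside a longer word or a string would also be routed to manonig.test.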

01mf02 commented 1 month ago

I've now performed the conversion of the manual to Markdown, see #3186.