ietf-tools / datatracker

The day-to-day front-end to the IETF database for people who work on IETF standards.
https://datatracker.ietf.org
BSD 3-Clause "New" or "Revised" License

enable easy retrieval of `sourcecode` elements from published RFCs #5544

Open dkg opened 1 year ago

dkg commented 1 year ago

Description

Some XML-derived RFCs contain sourcecode elements, which basically embed bytestring files verbatim in the RFC.

Extracting those elements from the html or txt content is kind of a pain to do manually, and most people don't have XML tooling handy to be able to extract them from the XML directly either.

It might be nice if the datatracker could, for a published RFC (and maybe for Internet-Drafts?), automate the extraction of those elements and make it easy to fetch them, either singly or in a bundled archive.

As an example, draft-ietf-lamps-header-protection contains about 40 sample e-mail messages as test vectors, each of them explicitly named and identified in sourcecode elements in the .xml. It would be great to see a link in the datatracker that lets the user just download a tarball or zipfile that contains all of those elements in a folder.
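For illustration, the extraction step could look roughly like the Python sketch below. It is only a sketch under stated assumptions, not datatracker or xml2rfc code: the function names `extract_sourcecode` and `bundle_as_zip` and the `unnamed-N.txt` fallback are made up here, and it assumes the XML parses with the standard library parser (documents that rely on external entities may not).

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile


def extract_sourcecode(xml_text):
    """Return (name, content) pairs for every <sourcecode> element.

    Hypothetical helper: elements without a name attribute get a
    generated name based on their position in the document.
    """
    root = ET.fromstring(xml_text)
    items = []
    for index, element in enumerate(root.iter("sourcecode"), start=1):
        # Keep only the last path component so a name cannot escape the folder.
        name = posixpath.basename(element.get("name") or "")
        if not name:
            name = f"unnamed-{index}.txt"
        # itertext() also collects text nested inside any child elements.
        content = "".join(element.itertext())
        items.append((name, content))
    return items


def bundle_as_zip(items, folder="sourcecode"):
    """Pack the extracted snippets into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in items:
            archive.writestr(f"{folder}/{name}", content.encode("utf-8"))
    return buffer.getvalue()
```

Note that the content is returned exactly as it appears in the XML, including any leading or trailing line breaks inside the element.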

JayDaley commented 1 year ago

This is a great idea.

cabo commented 1 year ago

1.6.32: Add kramdown-rfc-extract-sourcecode

    Usage: kramdown-rfc-extract-sourcecode [options] document.xml
        -t, --to=FMT      Target format ["list", "files", "zip", "yaml"]
        -d, --dir=DIR     Target directory (default: sourcecode)

Get with `gem update kramdown-rfc`

cabo commented 1 year ago

So you could do a

    kramdown-rfc-extract-sourcecode -tzip draft-ietf-lamps-header-protection-14.xml

and ship the resulting sourcecode.zip

cabo commented 1 year ago

It is somewhat annoying that the content of sourcecode elements conventionally starts with an empty line. This is not allowed in an .eml, which makes the extracted snippets invalid in @dkg's example document.

Is the

        text = text.strip('\n')

(line 826 of xml2rfc/writers/text.py) documented anywhere (i.e., is this a feature of RFCXML that can be relied upon)?
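To make the question concrete, here is a small Python sketch of what that call does to extracted content, next to a narrower alternative. Only `text.strip('\n')` comes from xml2rfc; the helper names and `drop_first_newline` are hypothetical and are not taken from xml2rfc or kramdown-rfc.

```python
def strip_newlines(text):
    # Same call as the quoted xml2rfc line: removes every leading and
    # trailing "\n" character, and nothing else.
    return text.strip("\n")


def drop_first_newline(text):
    """Hypothetical alternative: remove at most one leading line break."""
    if text.startswith("\r\n"):
        return text[2:]
    if text.startswith("\n"):
        return text[1:]
    return text


raw = "\nFrom: alice@example.org\nSubject: test\n\nbody\n"
# strip() also eats the final newline of the message body...
assert strip_newlines(raw) == "From: alice@example.org\nSubject: test\n\nbody"
# ...while the narrower variant leaves the end of the content alone.
assert drop_first_newline(raw) == "From: alice@example.org\nSubject: test\n\nbody\n"
```

Either way the result starts with the header line, which is what an .eml needs. The two differ in how much other whitespace they take away, and an extractor would have to pick one behaviour.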

cabo commented 1 year ago

1.6.33: kramdown-rfc-extract-sourcecode now also handles RFC 8792 unfolding (disable with --no-unfold).
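For readers who have not met RFC 8792: unfolding roughly means undoing the line wrapping that was applied to fit the RFC line length. The Python sketch below shows the idea for the single backslash strategy only, based on my reading of RFC 8792. It is not the kramdown-rfc implementation (which is Ruby), the function name is made up, and the double backslash strategy is not handled.

```python
SINGLE_HEADER = "NOTE: '\\' line wrapping per RFC 8792"


def unfold_single_backslash(text):
    """Undo RFC 8792 single-backslash folding; return text unchanged
    when the folding header is absent (hypothetical helper)."""
    lines = text.split("\n")
    # Folded text starts with a header line followed by an empty line.
    if len(lines) < 2 or SINGLE_HEADER not in lines[0] or lines[1] != "":
        return text
    out = []
    pending = None  # partial logical line still waiting for its continuation
    for raw in lines[2:]:
        # Leading spaces on a continuation line were added by the folding.
        piece = raw.lstrip(" ") if pending is not None else raw
        folded = piece.endswith("\\")
        if folded:
            piece = piece[:-1]
        current = piece if pending is None else pending + piece
        if folded:
            pending = current
        else:
            out.append(current)
            pending = None
    if pending is not None:
        # Dangling backslash on the last line: keep it rather than guess.
        out.append(pending + "\\")
    return "\n".join(out)
```

For example, a snippet whose first line contains the header, followed by an empty line and then `Subject: a very long \` and `   header value`, comes back as the single line `Subject: a very long header value`.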