Open Alhadis opened 2 years ago
Is it helpful to start with Pandoc and then audit and hand-edit any missing information?
It would be if we were converting a simple text-based file format. Unfortunately, we're dealing with a particularly gross binary format, and converting binary document formats to another tends to be hit-and-miss at best. For documents of non-trivial size and complexity (nested headings, ToCs, inset displays, etc), it's probably easier to do a manual conversion instead.
Pandoc says that they can convert Word documents so why not give it a whirl and see how well it does? I didn't see any complex formatting in the word doc I looked at but I only looked at one.
Richard, are you interested in giving it a try?
@LegalizeAdulthood A manual conversion carries another benefit: things like code samples, equations, section cross-references (etc) can be added in situ as I'm going through the document. Things which aren't present in the original DOC file, but could be meaningfully added.
Just to be clear, I'm not expecting you or anybody else to go to this level of effort.
@Alhadis My suggestion would be to convert one of the DOC files that you feel is typical and issue a pull request for that one doc file. That will give something concrete for the discussion.
I've been using Sphinx for some projects and I like it. (Clang uses Sphinx as its documentation generator, for instance.)
I would suggest doc/simh_doc.doc and doc/simh_faq.doc.
A pull request isn't needed. Just zip the result and drop it here in this discussion.
A pull request allows us to see the results inline instead of having to download something, unpack it, view it outside of github and then come back to comment. A pull request allows us to comment on individual lines within the proposed replacement. A pull request is github's form of a "code review". If the test file is acceptable, then the pull request for that one doc can be accepted and you can go on to the next/remaining docs.
I'm not sure where/how you get to see the resulting file (in its formatted glory) in-line with the pull request. Likewise, viewing the details of the raw formatted content is relevant. What mechanism would implicitly allow that?
How would such files be edited in the future? How, during the editing process, would an editor be able to see what their edits look like before generating pull requests?
I'm not sure where/how you get to see the resulting file (in its formatted glory) in-line with the pull request. Likewise, viewing the details of the raw formatted content is relevant. What mechanism would implicitly allow that?
GitHub supports the rendering of reStructuredText documents the same way it renders Markdown files. And like Markdown, reStructuredText diffs can be reviewed intuitively by highlighting added, deleted and modified sections of text (example).
I would suggest doc/simh_doc.doc and doc/simh_faq.doc.
I'll see what I can do. 😉
The example you pointed at shows a pull request which shows the differences in a commit of a text file. That isn't a great feat, since all source code is contained in text files and github nicely displays differences in text files.
The desktop tool I've used for years to do the bulk of managing merge changes into the simh repo has no problem with changes to the simh doc files. An example from yesterday:
Meanwhile, your suggestion to submit a PR which transforms a doc/xxx.doc file into a doc/xxx.rst file, hardly would be expected to show meaningful differences.
Given the docs in .rst format, clearly "Joe a doc file author/editor" uses their favorite text editor to create and/or make changes to such a file.
Please answer the following: 1) How does "Joe a doc file author/editor" view the formatted results of the document they are creating or changing during the authoring process so that what they're writing ultimately looks good? 2) How does "Harry a simh user" read the .rst based documentation which comes with a packaged simh distribution?
The example you pointed at shows a pull request which shows the differences in a commit of a text file. That isn't a great feat, since all source code is contained in text files and github nicely displays differences in text files.
The "enhanced" diff view is optional. By default, it displays diffs like ordinary source code; the two modes can be toggled using the ‹› and 🗌 buttons on the right:
How does "Joe a doc file author/editor" view the formatted results of the document they are creating or changing during the authoring process so that what they're writing ultimately looks good?
Just by viewing the document on GitHub.com, as you would any other file (example). The behaviour of the modes described above applies here too, albeit with different defaults.
How does "Harry a simh user" read the .rst based documentation which comes with a packaged simh distribution?
This is less straightforward to answer.
I'm assuming that a SIMH release will ship with documentation rendered in PDF, man(7) and/or GNU Info formats. This approach is typical of most modern software projects. The source code is readable in plain-text format, so users with a source-code tarball will still be able to make sense of an .rst file in whatever plain-text editor they're accustomed to.
How does "Joe a doc file author/editor" view the formatted results of the document they are creating or changing during the authoring process so that what they're writing ultimately looks good?
Just by viewing the document on GitHub.com, as you would any other file (example). The behaviour of the modes described above applies here too, albeit with different defaults.
So, while I'm editing my document, I make a commit locally and push it to github every few minutes. This is VERY FAR DISTANT from WYSIWYG and seemingly a significant barrier to producing nicely formatted documents.
Any pull request on this subject that removes the .doc files needs to precisely describe how authoring of changes or new documents should be done.
Actually, .rst format can be converted, on your system, into a whole bunch of different formats including PDF and EPUB and others. As you said, using Github as a viewer is not a good option, but it isn't necessary or even a good recommendation. .md is more primitive but it too has some support for local WYSIWYG editing. Of the two, rst is the one I'd incline to.
This is VERY FAR DISTANT from WYSIWYG and seemingly a significant barrier to producing nicely formatted documents.
I'm not sure I understand the importance of WYSIWYG editing. You can generate "nicely formatted documents" from a plain-text (proper) document format, but you can't easily edit a WYSIWYG editor's format without a GUI (and one that supports the format in question).
I'm gauchely assuming that most SIMH users have some measure of technical proficiency, stipulating familiarity with a plain-text editor (Vim, Emacs, Nano, Sublime Text, VSCode, Atom, TextMate, et al). Restricting documentation to an inferior (WYSIWYG) format makes no sense from a sustainability standpoint, and I loathe having to fire up a window manager just to read SIMH manuals. I'm probably not alone in that department, either.
The choice between text editing and WYSIWYG is a personal preference question. There are many on each side, and many more on both sides. I'm quite comfortable with "markup" type document formats but nevertheless often use WYSIWYG tools if available. For example, I tracked one down for MD format and another for TeX for that reason. The key issue isn't that one, though. The key requirement is that contributors must be able to create AND CHECK their work locally. It seemed to be suggested that checking could be done by "send it to Github and look at the result there". That's not a valid answer in my view, but as I pointed out, it isn't correct anyway. What's needed is a document format that comes with good tools that allow creation, checking, and final form output generation on all the commonly used platforms. For example, while I wouldn't push TeX for our purposes, it meets those tests (markup style creation with any text editor, WYSIWYG with LyX, review and final output generation with TeX and its various output format generators) -- on Mac or Linux or Windows. I think the same holds for RST, with Sphinx, but haven't looked into that. Incidentally, this shows why DOC format is problematic; it's obsolete everywhere and the tools for Linux and Mac are often rather marginal.
This is VERY FAR DISTANT from WYSIWYG and seemingly a significant barrier to producing nicely formatted documents.
I'm not sure I understand the importance of WYSIWYG editing. You can generate "nicely formatted documents" from a plain-text (proper) document format, but you can't easily edit a WYSIWYG editor's format without a GUI (and one that supports the format in question).
I'm not suggesting that WYSIWYG is required, and this question is specifically NOT about how "users" would do something.
It is specifically about the create from scratch question and/or modify question. I can sit in this github issue/PR web interface and construct my comments in a discussion in "plain text" and as I'm doing this, I can click on the "Preview" tab to see how what I'm writing looks. Without something that simple, authoring and modifying documents is at least tedious.
When editing the https://github.com/simh/simh project README.md, I've got a Chrome extension called "Markdown Viewer", which will let me easily render the current document state merely by opening a file:// URL to the local file.
Does something like this exist for .rst?
Yes, Sphinx is a tool for processing RST into various output formats: https://www.sphinx-doc.org/en/master/
Does something like this exist for .rst?
Googling "reStructuredText GUI editor" yielded two handy-looking lists:
ReText looks promising, although I haven't tried it. There's also an online Sphinx editor which might be of interest to you.
It is specifically about the create from scratch question and/or modify question. I can sit in this github issue/PR web interface and construct my comments in a discussion in "plain text" and as I'm doing this, I can click on the "Preview" tab to see how what I'm writing looks. Without something that simple, authoring and modifying documents is at least tedious.
Convenience is a modest price to pay for robustness and flexibility.
I'm not looking for a GUI editor for this. I'm merely looking for an easy way to view changes being made to new or modified documents. Easy as the "Preview" tab in github issues, or a browser extension that lets you view the current state of the file as it evolves with a simple refresh of a file:// URL.
It is specifically about the create from scratch question and/or modify question. I can sit in this github issue/PR web interface and construct my comments in a discussion in "plain text" and as I'm doing this, I can click on the "Preview" tab to see how what I'm writing looks. Without something
Convenience is a modest price to pay for robustness and flexibility.
Your statement is a somewhat weak argument... 20 years of the use of the .doc format has been very robust and flexible for what it does.
We're talking past each other here, since the only thing that's happened so far is this discussion. How hard is it to produce an equivalent form of the 2 documents suggested a week ago? How much effort is involved to convert those documents? Cut and paste should trivially move all of the text, so the only problem is dealing with the layout.
How hard is it to produce an equivalent form of the 2 documents suggested a week ago? How much effort is involved to convert those documents?
I haven't even started, and I'm in the middle of tying up 3-4 other (more urgent) things first. Could you be patient, please?
Also, just to reiterate: it's not simply a matter of copy-and-paste.
Your statement is a somewhat weak argument... 20 years of the use of the .doc format has been very robust and flexible for what it does.
And I'm honestly bewildered by how long it's lasted. Moreover, nothing about Microsoft Word is "robust" or "flexible"… 😉 In fact, those adjectives shouldn't even share a sentence with the program's name.
Visual Studio (and possibly Visual Studio Code) and many other editors have markdown and RST support built-in or available via plugins if you want a WYSIWYG experience.
I haven't even started
Right, now I've started. @markpizz, you can start your stopwatch now. 😝
Alright, here you go. Please enjoy:
Other documents I felt like porting that weren't requested:
These rst files look good when viewed through a browser on github. On a quick viewing of simh_doc.rst, I see that the link to the LICENSE.txt file doesn't work. Also, the switches described in section 4.20.1 (as well as other places) don't look good:
I come back to my update and maintenance question about this stuff. Specifically, please describe in detail your process to create these rst files. What tools you used and how you looked at the results before committing them.
It would seem that how this is done on various platforms would usefully be described in a README file in the doc directory.
As @LegalizeAdulthood suggested, I tried messing with Visual Studio Code on windows and adding the rst plugin/extension. When looking at these files, I see rst syntax highlights, but haven't found a way to view the results at least as nicely as the github views look...
I see that the link to the LICENSE.txt file doesn't work
Which one? Do you mean the one at the top?
It's working perfectly fine for me. It should link you to this.
Also the switches described in section 4.20.1 (as well as other places) don't look good:
Only on GitHub (and when viewed in a Chromium-based browser; the switches look fine in Firefox). This isn't the markup's fault, but rather the styling applied to it by GitHub. When viewed in Docutils's ugly default styling, the option-lists look fine:
reStructuredText has dedicated syntax for option-lists, which is what I'm using:
-d Avoid detaching and reattaching devices during a restore
-f Override the date timestamp check for attached files during a restore
-q Suppress warning messages about save file version information
If you prefer, I can use the more verbose table markup (below). This will fix the display on GitHub, but at the expense of code readability/maintainability:
====== =====================================================================
``-d`` Avoid detaching and reattaching devices during a restore
``-f`` Override the date timestamp check for attached files during a restore
``-q`` Suppress warning messages about save file version information
====== =====================================================================
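For anyone weighing the maintainability cost of the table form: the table borders can be generated rather than hand-aligned. The following is only a sketch, not part of the conversion described in this thread; the function name `options_to_simple_table` and the `SWITCHES` list are made up for illustration, and it uses nothing beyond the Python standard library.

```python
def options_to_simple_table(options):
    """Render (switch, description) pairs as a reStructuredText simple table."""
    # Wrap each switch in double backticks so it renders as inline literal text.
    rows = [(f"``{switch}``", description) for switch, description in options]
    # Each column is as wide as its longest cell.
    left = max(len(cell) for cell, _ in rows)
    right = max(len(text) for _, text in rows)
    border = f"{'=' * left} {'=' * right}"
    body = [f"{cell.ljust(left)} {text}" for cell, text in rows]
    return "\n".join([border, *body, border])


# Hypothetical input: the three switches quoted above.
SWITCHES = [
    ("-d", "Avoid detaching and reattaching devices during a restore"),
    ("-f", "Override the date timestamp check for attached files during a restore"),
    ("-q", "Suppress warning messages about save file version information"),
]

if __name__ == "__main__":
    print(options_to_simple_table(SWITCHES))
```

Run against the three switches above, this should reproduce the table markup shown, so the borders never need to be counted out by hand when a description changes.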
Specifically, please describe in detail your process to create these rst files. What tools you used and how you looked at the results before committing them.
No tools, just an ordinary text-editor and knowledge of reStructuredText. Previews were generated locally using docutils:
$ docutils doc/simh_doc.rst doc-output/simh_doc.html
It's really been a matter of copy+paste, nothing more.
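For the "refresh a file:// URL" style of preview asked about earlier, the docutils step can be wrapped in a small polling loop. This is a rough sketch under assumptions, not the workflow actually used here: the function names are invented, it assumes the third-party docutils package is installed, and it assumes `docutils.core.publish_file` still accepts `writer_name` (newer releases may prefer a `writer` argument).

```python
import os
import time


def render_with_docutils(src, dst):
    """Convert one .rst file to HTML; requires the third-party docutils package."""
    # Imported lazily so the helpers below can be used (and tested) without docutils.
    from docutils.core import publish_file
    publish_file(source_path=src, destination_path=dst, writer_name="html")


def rebuild_if_changed(src, dst, last_mtime, render=render_with_docutils):
    """Re-render src when its modification time differs; return the mtime seen."""
    mtime = os.path.getmtime(src)
    if mtime != last_mtime:
        render(src, dst)
    return mtime


def watch(src, dst, interval=1.0):
    """Poll src forever, regenerating dst after every save (stop with Ctrl+C)."""
    last = None
    while True:
        last = rebuild_if_changed(src, dst, last)
        time.sleep(interval)
```

Calling `watch("doc/simh_doc.rst", "doc-output/simh_doc.html")` and keeping the HTML file open in a browser would then give a preview that updates on refresh, at the cost of one extra terminal.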
Can you try publishing the HTML-formatted output to the gh-pages branch of your repository for viewing?
See GitHub Pages for reference.
@LegalizeAdulthood Done. Generated with:
$ sphinx-build -C ./doc/ /tmp/gh-pages
$ cd $_
$ rm -rf static
$ mv _static static
$ perl -pi -E 's/_(?=static[\/"])//' *.html
$ grep -Irn . -e _static; # Sanity check
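The rename above is presumably needed because GitHub Pages' default Jekyll processing skips directories whose names start with an underscore. For contributors without perl (Windows, for instance), the same post-processing can be sketched in Python. This is an illustrative equivalent, not the script that was actually run: `publish_static_dir` is a made-up name, and unlike the perl one-liner (which has no /g flag) it rewrites every matching reference on a line, not just the first.

```python
import re
import shutil
from pathlib import Path

# Matches the underscore in "_static" only when followed by "/" or a double quote,
# mirroring the lookahead in the perl one-liner.
_STATIC_REF = re.compile(r'_(?=static[/"])')


def publish_static_dir(site_dir):
    """Rename _static to static and update references in top-level HTML files."""
    site = Path(site_dir)
    old, new = site / "_static", site / "static"
    if old.is_dir():
        if new.exists():
            shutil.rmtree(new)  # same effect as `rm -rf static`
        old.rename(new)
    changed = []
    for page in sorted(site.glob("*.html")):
        text = page.read_text(encoding="utf-8")
        updated = _STATIC_REF.sub("", text)
        if updated != text:
            page.write_text(updated, encoding="utf-8")
            changed.append(page.name)
    return changed
```

The function returns the names of the pages it modified, which serves the same purpose as the grep sanity check: an empty list on a second run means no `_static` references remain.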
If you prefer, I can use the more verbose table markup (below). This will fix the display on GitHub, but at the expense of code readability/maintainability:
We would hope that many many more people will be reading the resulting emitted results than the very small handful of folks who will be writing or maintaining these documents. Given that goal, the readability of the formatted results on the most common platforms is most desirable. What's there now may look fine in Firefox, but Chrome, Edge, Safari all look the same as I saw previously on Chrome.
It's an unfortunate fact that a lot of HTML is produced by people who only care about a single browser type. It used to be that many webpages worked only with Internet Explorer; now it seems that "works only on Chrome" is the default. I'm forced to use Chrome for some websites because of this, something I don't want to do if I can help it.
If a documentation tool produces such defective HTML, that's a bug in the tool and should be reported to its authors so they can fix it. But until they do, such a broken tool can't be the way we generate documentation.
Specifically, please describe in detail your process to create these rst files. What tools you used and how you looked at the results before committing them.
No tools, just an ordinary text-editor and knowledge of reStructuredText. Previews were generated locally using docutils:
$ docutils doc/simh_doc.rst doc-output/simh_doc.html
That looks simple enough, BUT I come back to:
It would seem that how this is done on various platforms would usefully be described in a README file in the doc directory.
Such a README should be pretty specific about how to get started or how to install docutils... Notice that the simh documents provide specific guidance about how to get started and oddball steps to manually build the respective simulators.
The historic simh tool chain dependencies have been very minimal:
Linux: gcc and make, plus the local system package manager to install dependent packages
MacOS: Xcode's Command Line Tools and either the HomeBrew or MacPorts package manager to install dependent packages
Windows: Visual Studio, optionally git for windows
Adding .rst files and producing various target document formats (html, pdf, etc.) increases the tool chain. How to do this needs to be described in reasonable detail.
It's an unfortunate fact that a lot of HTML is produced by people who only care about a single browser type. It used to be that many webpages worked only with Internet Explorer; now it seems that "works only on Chrome" is the default. I'm forced to use Chrome for some websites because of this, something I don't want to do if I can help it.
If a documentation tool produces such defective HTML, that's a bug in the tool and should be reported to its authors so they can fix it. But until they do, such a broken tool can't be the way we generate documentation.
The problem in this case is in github's dynamic transformation of .rst files when viewed directly in the repo.
It would really be nice if the .rst files on github could be referenced by most folks in browsers whenever they want rather than having a separate transformation step to produce html pages.
Given that goal, the readability of the formatted results on the most common platforms is most desirable.
Wilco. I'll amend the .rst files then.
How to do this needs to be described in reasonable detail.
No problem, I'd be glad to add a README.md file to the doc/ directory documenting how to regenerate the documentation files. That's something I'd prefer to leave until last, however, once everything else is done and in need of integrating into the rest of SIMH's build chain.
But until they do, such a broken tool can't be the way we generate documentation.
@pkoning2 This has nothing to do with the tools I'm using and everything to do with the CSS GitHub uses for styling <kbd>…</kbd> elements (which are used for marking up switch-names in their description tables). The fix is literally one line of CSS:
kbd {
    white-space: nowrap;
}
It's an unfortunate fact that a lot of HTML is produced by people who only care about a single browser type. It used to be that many webpages worked only with Internet Explorer; now it seems that "works only on Chrome"
That hasn't been true in almost a decade. It's also completely irrelevant to this discussion.
If GitHub formats our content but does so wrong, that's not a good thing. Admittedly we can just tell people "for the real content, install the kit and build the docs". For doc generation, a common approach is a build option (for example "make doc") that builds the doc output files assuming you have the necessary tools.
As for "not true and not relevant", I'll concede not relevant. But my observation is true today for certain websites; wsj.com is a big name example.
This has nothing to do with the tools I'm using and everything to do with the CSS GitHub uses for styling <kbd>…</kbd> elements (which are used for marking up switch-names in their description tables). The fix is literally one line of CSS:
kbd { white-space: nowrap; }
@Alhadis, Since you've found this detail, wouldn't it be prudent to submit this as a change github might make, or would that break something else they need?
wouldn't it be prudent to submit this as a change github might make
I could open a new thread at github-community/community, but it could be some time before a developer takes notice of the report and patches the site's codebase. Unfortunately (and ironically), there's no public repository where users can submit fixes as pull-requests. 🤷
or would that break something else they need?
The fix I suggested above would affect all <kbd> elements site-wide, so it'd actually be better to limit it only to option-lists of rendered .rst files. This step is far less trivial than updating a stylesheet though, and would involve hacking on github/markup as well as reporting a CSS issue to github-community/community.
For doc generation, a common approach is a build option (for example "make doc") that builds the doc output files assuming you have the necessary tools.
I can update the makefile with a docs target. 👍 As for its output… what documentation formats (other than HTML) should we be generating? Info? PDF? Man pages?
For doc generation, a common approach is a build option (for example "make doc") that builds the doc output files assuming you have the necessary tools.
I can update the makefile with a docs target. 👍 As for its output… what documentation formats (other than HTML) should we be generating? Info? PDF? Man pages?
Please hold off on this until the doc/README exists and the required toolchain is understood.
I wasn't planning on updating the makefile now; I only wanted some indication of what sort of formats we need to accommodate (and by extension, what sort of programs I should be testing my work with).
A possible answer for "what output formats" could be found by looking at other projects that use RST. Python does, I think. I spoke favorably about RST because, as I recall, it supports not just the usual suspects (HTML, PDF, manpage) but also ebook formats (EPUB and perhaps MOBI). I like having my Python manuals on my ebook reader; having SIMH docs there would be an added bonus.
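To make the format question concrete, here is a sketch of what a "docs" step could drive if Sphinx ends up being the generator. Nothing here has been agreed in this thread: the function names and the `doc-output` default are placeholders, and it assumes either a conf.py in the source directory or the `-C` flag used earlier. The builder names (`html`, `epub`, `man`, `texinfo`, `latex`) are standard Sphinx builders; PDF and Info each need a further step (a LaTeX run and makeinfo respectively) that this sketch does not perform.

```python
import subprocess

# Sphinx builder names for the formats mentioned in this thread.
# "latex" emits .tex sources and "texinfo" emits .texi sources; turning
# those into PDF and Info files needs LaTeX and makeinfo afterwards.
BUILDERS = {
    "html": "html",
    "epub": "epub",
    "man": "man",
    "info": "texinfo",
    "pdf": "latex",
}


def plan_builds(source_dir, output_root, formats):
    """Return one sphinx-build command (as an argv list) per requested format."""
    commands = []
    for fmt in formats:
        builder = BUILDERS[fmt]  # raises KeyError for a format not listed above
        commands.append(
            ["sphinx-build", "-b", builder, source_dir, f"{output_root}/{builder}"]
        )
    return commands


def build_docs(source_dir="doc", output_root="doc-output", formats=("html",)):
    """Run the planned commands; requires Sphinx to be installed."""
    for argv in plan_builds(source_dir, output_root, formats):
        subprocess.run(argv, check=True)
```

Keeping the planning separate from the execution means the list of commands can be printed into a README or a makefile rule without Sphinx being installed on the machine that writes it.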
I've just finished porting simh.doc, and I observed a few odd-looking naming discrepancies that appear to be typos. Since I'm unfamiliar with SIMH's codebase, I figure I should ask those more knowledgeable than I if these really are misspellings:
fparse_sym: a misspelling of parse_sym?
vm_fprint_stopped: a misspelling of sim_vm_fprint_stopped?
I, used once in (TMXR *) I and stream I, isn't mentioned anywhere else in the document.
(You might need to CTRL+F to find the spellings in question)
As it turns out, I'm in the middle of converting these to Markdown as part of building a website for the project.
Markdown isn't quite as powerful as rst, but it fits well into the GitHub pages environment and tools (jekyll) that I'm using.
It's also a simple text format that can easily be parsed to, for example, add/extract example scripts and help.
The website will also have PDF versions for off-line use.
Getting to the format requires some cleanup effort, which I'm still tweaking.
You can see a sample web rendering at http://opensimh.org/simdocs/i1620_doc and the corresponding PDF at http://opensimh.org/simdocs/i1620_doc.pdf
Can MD produce EPUB files? What about texinfo files? Manpages?
Where can we see the raw i1620_doc markdown file?
To a first order, yes. All of the following work and produce something recognizable:
pandoc simdocs/i1401_doc.md -o i1401_doc.epub
pandoc simdocs/i1401_doc.md --to man -o i1401_doc.man
pandoc simdocs/i1401_doc.md -o i1401_doc.texinfo
I think some metainfo needs to be added - like man section numbers, but that's a project for later.
If you want to get ahead of me on that, google is your friend...
Pandoc claims support for even more - here's the complete list. I haven't tried them all :-)
Where can we see the raw i1620_doc markdown file?
https://github.com/open-simh/.github/blob/main/docs/simdocs/i1620_doc.md
This is WIP, so there'll be some more cleanup.
You may want to use an editor/previewer, such as:
https://jbt.github.io/markdown-editor/
I run a slightly modified private (stable) copy on one of my servers. Access is available to contributors.
@tlhackque Please don't. I'm already more than halfway through a conversion myself, and markdown is a horrible format for technical documentation. Your version also lacks a lot of inline markup that I've added as part of the conversion process.
John -- conversion to what format?
@tlhackque Please don't. I'm already more than halfway through a conversion myself, and markdown is a horrible format for technical documentation. Your version also lacks a lot of inline markup that I've added as part of the conversion process..
@Alhadis I agree that Markdown isn't optimal, but it's workable. Your effort seemed to have died in July...I'm glad to hear that you're still working on it. Duplication isn't good.
So we need a plan.
Here's my thought process and status:
I'm well into setting up a website for the project. The documents need to end up there as well as in the simh code repository. But there is a lot of other content to be found/created/organized. Ultimately, it should be the one-stop information portal for the project.
My strategy has been to get everything up in a readable form as quickly as I can, and leave perfecting the document conversions (and other content) for another pass, if not another day. I think it's more important to be "good enough" soon than "much better", eventually.
Once things stabilize, the plan was to archive the .doc files and replace them in the simh repository so that they're maintained in a portable format.
I picked Markdown because: it's unavoidable on GitHub : contributors use it for most READMEs, for issues, pull requests, and reviews. Also, it converts reasonably easily to many other formats. PDF is important so that documents can be distributed with simulators, and used off-line. Other output formats seem nice-to-have. One other consideration is that it be very easy to import and extract examples of how to use the simulators - both for creators and users. Getting text out in a format that supports simulator help mechanisms is also a consideration. It's hard enough to get documentation written once; twice is much harder.
It's important that the source format be easy for contributors to use without special tools - Word was a very, very bad choice. Being able to pull it into the website pages' templates (Jekyll, which is markdown-based) when viewed on-line would be nice for a common look & feel. But not essential.
Along those lines, it's also important that the docs continue to be maintained with the simulators, by the simulator developers in the SimH repo. Otherwise, they'll be instantly out of date. OTOH, they should appear under the opensimh domain when viewed from the website. Maybe a commit hook will push them to the website, or they'll end up in an iframe, or ... that's for later.
I'm sympathetic to the benefits of .rst, but it would be another thing for contributors to learn. And in some sense, a hurdle. So we need to think about how we would persuade contributors to go there. And how to integrate it with the website. I believe there's at least an unofficial Jekyll plugin for RST. No idea how good it is, nor how good RST-to-MD translators are, nor, if maintaining docs in md is better for developers, how faithful md->rst converters are.
From a project point of view, I want it to be easy to contribute, even at the expense of less flexible documents. Word (proprietary, with the .doc format obsolete) clearly isn't where we should be. I'm not sure how to resolve this. Neither two formats nor endless conversions seem attractive.
My conversions are improving (about 15 are readable, the others placeholders). The web versions are OK; the PDFs need work - eventually. It's not a straight conversion: I've been doing some edits for consistency and readability along the way. But it's by no means my favorite activity nor my primary expertise.
Going forward, if we can come up with a unified plan, I'd be happy to leave the documents to you - what's your timeline?
Meantime, I guess I'll put my work on hold for now.
John -- conversion to what format?
He's converting to rst.
Going forward, if we can come up with a unified plan, I'd be happy to leave the documents to you - what's your timeline?
I'll see if I can have a finished, working pipeline by December. This includes a readme file in the docs directory explaining to contributors how to use reStructuredText; i.e., programs to install, what command they need to update the docs after changes, etc. Said document will also include links to RST documentation.
So we need to think about how we would persuade contributors to go there. And how to integrate it with the website. I believe there's at least an unofficial Jekyll plugin for RST.
I've 0 experience with Jekyll, but AFAIK, GitHub Pages should accept static HTML pages as input. A webhook can be added that keeps the site in-sync with upstream. But an intermediate conversion to markdown really shouldn't be necessary.
Your effort seemed to have died in July...I'm glad to hear that you're still working on it. Duplication isn't good.
Actually, I was halfway through converting a huge-ass document, and was reluctant to commit incomplete progress (which I ended up doing anyway as part of a backup, since my MacBook had to go in for repairs). Anyway, now that there's pressure on me to finish this, I'll prioritise this above my other projects…
This is something I've been itching to do for years: replace those ugly binary files with files written in a lightweight markup language like reStructuredText or AsciiDoc (both of which are rendered like Markdown on GitHub).
This wouldn't be an automated conversion, as tools like Pandoc have (at least in my experiments) discarded structurally relevant info that I assume is represented internally in the .doc files in some purely-presentational fashion.
Aside from the obvious benefits of diffing and maintainability, it also permits nifty features GitHub adds to its rendered markup, like section navigation menus and even diagrams.
I'm happy to do the heavy lifting if the answer to the above is an affirmative.