seriesseed / equity

Series Seed Preferred Stock
http://www.seriesseed.com

Styling in Word versions of documents #24

Open ghost opened 10 years ago

ghost commented 10 years ago

I noticed that the Microsoft Word versions of these documents have room for improvement with respect to styling. Is it possible to crowdsource the Word files themselves, or are we only doing substantive text? Thanks!

kemitchell commented 10 years ago

The trouble with tracking .docx is that Git will treat them as binary files. Neither Git nor GitHub can diff or auto-merge Word files.
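To see why Git gives up on .docx, here is a rough Python approximation of the check Git applies by default. As I understand it, Git treats a file as binary if a NUL byte appears in roughly its first 8,000 bytes. The constant and function names below are my own approximation, not Git's code. A .docx is a zip archive, and zip headers contain NUL bytes almost immediately.

```python
import io
import zipfile

# Approximation of Git's default heuristic: look for a NUL byte near the start.
FIRST_FEW_BYTES = 8000


def looks_binary(data: bytes) -> bool:
    """Return True if a NUL byte appears in the first FIRST_FEW_BYTES bytes."""
    return b"\0" in data[:FIRST_FEW_BYTES]


def tiny_zip() -> bytes:
    """Build a small in-memory zip archive, the container format .docx uses."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<w:document/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(looks_binary(tiny_zip()))                   # True: zip headers contain NUL bytes
    print(looks_binary(b"# Series Seed\n\nText.\n"))  # False: plain Markdown
```

Markdown files never trip this check, which is why line-by-line diffs and merges work for them.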

As for styling, there's wide variation in how lawyers use (and don't use) Word styles, cross-references, automatic numbering, etc. I've done a little work on programmatically outputting .docx contracts using the whole kitchen sink of Word features (https://github.com/CommonForm/commonform-docx), but it's on my to-do list to go back to using .docx like RTF, without styles or fields.

If you're looking for a way to do Markdown to docx today, have a look at Pandoc.

ghost commented 10 years ago

Thanks for the speedy reply, Kyle. .docx files are zipped collections of XML text files. In theory, the maintainer could "compile" the XML source by zipping the files and giving the archive a .docx extension for publishing. But if the project is interested only in the substance of the words, I get that... was just curious.
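A minimal Python sketch of that "compile" step, using only the standard library. It assumes the smallest package I know of that Word accepts: a content-types manifest, a package relationships file, and word/document.xml. Real Word output also carries styles.xml, numbering.xml, settings, and more. The function names are hypothetical, not part of any existing tool.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Package manifest: tells Word what kind of content each part holds.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

# Package relationships: points at the main document part.
RELS = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>"""

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_xml(paragraphs: list[str]) -> str:
    """Wrap each string in a plain paragraph with a single run; no styles applied."""
    body = "".join(f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )


def build_docx(paragraphs: list[str]) -> bytes:
    """'Compile' the XML parts into a .docx, which is just a zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", RELS)
        zf.writestr("word/document.xml", document_xml(paragraphs))
    return buf.getvalue()


if __name__ == "__main__":
    with open("hello.docx", "wb") as f:
        f.write(build_docx(["Series Seed Preferred Stock", "Terms & conditions"]))
```

The XML parts could live in Git as text, but as the next reply points out, the XML Word itself writes is much noisier than this.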

kemitchell commented 10 years ago

Unfortunately, diff'ing document.xml files isn't much of an improvement on binary diff, given the amount of cruft. Word does strange things with ranges and paragraphs in those files, up to and including representing identical-looking text in different ways.
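A small Python illustration of that problem, using hand-written fragments rather than anything from the Series Seed files. Both w:p paragraphs below render the same words, but the second is split into three runs, one of them carrying an extra run property. A diff of the raw XML reports a change; comparing the extracted text does not.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# One paragraph, one run.
ONE_RUN = f'<w:p xmlns:w="{W}"><w:r><w:t>Series Seed Preferred Stock</w:t></w:r></w:p>'

# The same words split across three runs, one carrying an extra run property:
# the kind of fragmentation that makes document.xml diffs noisy.
SPLIT_RUNS = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t xml:space="preserve">Series </w:t></w:r>'
    '<w:r><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t>Seed</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> Preferred Stock</w:t></w:r>'
    "</w:p>"
)


def paragraph_text(xml: str) -> str:
    """Concatenate the text of every w:t element, ignoring run boundaries."""
    root = ET.fromstring(xml)
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))


if __name__ == "__main__":
    print(ONE_RUN == SPLIT_RUNS)                                  # False: the XML differs
    print(paragraph_text(ONE_RUN) == paragraph_text(SPLIT_RUNS))  # True: the text does not
```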

If you're interested in working on a programmatic way to output pretty .docx from plain text or structured data contract descriptions, let's definitely chat. It's on my list for another project, where I'm trying to do everything in the browser.

jrmiller82 commented 10 years ago

You guys should really switch over to something like Markdown or LaTeX or Org mode in plain text, so the diffs stay super easy but you can still output very pretty documents.

kemitchell commented 10 years ago

The files are currently in Markdown. A number of packages, including Pandoc, can convert to LaTeX and other formats.

jrmiller82 commented 10 years ago

Oops. My bad. Sorry.

ghost commented 10 years ago

@kemitchell I skimmed through the Markdown syntax. It does not appear to support inline semantic markup (e.g., marking defined terms for styling). Am I understanding this right, or is there some way in Markdown to duplicate something like HTML's span element approach?

kemitchell commented 10 years ago

@joejarvis Markdown is hard to generalize. Despite some efforts in the direction of standardization (http://commonmark.org/), implementations vary widely. Almost all support inline HTML, which would get you <span>, but at the price of turning the whole containing paragraph into an HTML literal where Markdown-style underscores and asterisks no longer apply. I'm sure there are some "extended" Markdown flavors with inline element styles, but these will be idiosyncratic, and probably lock you to a particular implementation.

jrmiller82 commented 10 years ago

I prefer Org or LaTeX, personally; they give more precise formatting choices.

(Replying by email to Kyle Mitchell's comment of Oct 31, 2014, 11:24 AM.)

ghost commented 10 years ago

Thanks @kemitchell. Looking over the inline section of the commonmark spec, I can see that the only "official" option for inline semantic markup is reverting to HTML, which would defeat the purpose of using Markdown. Bummer. I'll take a look at Pandoc when I have time.

goodcounsel commented 9 years ago

Putting aside the technical issues of doing .docx in GitHub, which I know nothing about, it is clear that the styling of the final documents is terribly bloated and inconsistent. The documents should simply be copied and pasted into clean Word docs, and the styling simplified. There's no reason to have nearly 50 different styles in this document, which really just has a few levels of numbered headings and assorted others. And for all of these styles, the Article headings (Roman I, II, III, etc.) in the Certificate and the second level below them (A, B, C, etc.) are not even auto-numbered! Maybe no one thinks anyone is going to edit these documents, so it isn't necessary, but I (and I am sure others) do sometimes modify the base provisions.
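For anyone who wants to check a style count like this outside Word, here is a rough Python sketch. It assumes the standard .docx layout (word/styles.xml and word/document.xml) and counts only w:style definitions plus direct paragraph, run, and table style references in the main body. Word's Style Organizer may count differently, since headers, footers, footnotes, and style inheritance are ignored here. The function name is hypothetical.

```python
import io
import sys
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def style_report(docx_bytes: bytes) -> tuple[int, set[str]]:
    """Return (styles defined in styles.xml, style IDs referenced in document.xml)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        defined = 0
        if "word/styles.xml" in zf.namelist():
            styles = ET.fromstring(zf.read("word/styles.xml"))
            defined = sum(1 for _ in styles.iter(f"{W}style"))
        doc = ET.fromstring(zf.read("word/document.xml"))
    used = set()
    # Paragraph, run, and table style references carry the style ID in w:val.
    for tag in ("pStyle", "rStyle", "tblStyle"):
        for el in doc.iter(f"{W}{tag}"):
            used.add(el.get(f"{W}val"))
    return defined, used


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        defined, used = style_report(f.read())
    print(f"{defined} styles defined, {len(used)} referenced: {sorted(used)}")
```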

kemitchell commented 9 years ago

@goodcounsel, alas, neither GitHub nor the Git tool that underlies it is well suited to .docx comparison, in part because even .docx files that seem simple in Word are nightmarish data junk-drawers "under the hood." Auto-numbering, in particular, is black magic, as evidenced by the problems even very expensive Word document comparison tools often have with it. Markdown, on the other hand, is very well supported on GitHub, and makes reviewing and editing via web browser about as straightforward as it can be.

When you mention fifty Word styles, are you referring to the .docx files from seriesseed.com in Microsoft Word? I see only standard styles in the version 3.2 clean copies in most-recent Word on Windows 7.

@jboehmig, are the .docx files on seriesseed.com generated from the .md automatically? If not, I can make a to-do item to PR a build system using pandoc, which does sane Markdown-to-Word conversion, and Travis CI to do the conversion and build a GitHub "release" of each new tagged commit automatically.
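A rough sketch of the conversion step such a CI job might run (nothing like this exists in the repo yet): call pandoc once per Markdown file. The directory layout and function names are hypothetical. The --reference-doc flag is the name in current pandoc (older 1.x releases called it --reference-docx); it lets one clean template .docx supply all the Word styles, which is one way to keep the style count small.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional


def pandoc_command(md: Path, out_dir: Path, reference_doc: Optional[Path] = None) -> list[str]:
    """Build the pandoc argument list for converting one Markdown file to .docx."""
    out = out_dir / (md.stem + ".docx")
    cmd = ["pandoc", str(md), "--from", "markdown", "--to", "docx", "--output", str(out)]
    if reference_doc is not None:
        # Styles in the output are taken from this template document.
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


def build_all(src_dir: Path, out_dir: Path, reference_doc: Optional[Path] = None) -> None:
    """Convert every *.md file in src_dir, stopping on the first pandoc failure."""
    if shutil.which("pandoc") is None:
        sys.exit("pandoc not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for md in sorted(src_dir.glob("*.md")):
        subprocess.run(pandoc_command(md, out_dir, reference_doc), check=True)


if __name__ == "__main__":
    build_all(Path("."), Path("build"))
```

Attaching the resulting build/ files to a tagged GitHub release would be a separate CI step.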

goodcounsel commented 9 years ago

Yes, right from the website. Here are some screenshots showing "styles in use" from the Word Style Organizer. There is some crazy explosion of styles.

[Five screenshots, taken 2015-05-01, of the "styles in use" lists in Word's Style Organizer.]

kemitchell commented 9 years ago

@goodcounsel: I take it those styles are the calling card of Fenwick's house Word template or numbering plug-in. I will follow up with @jboehmig about an automated process for generating clean .docx.