mwilliamson / mammoth.js

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Access color and background/highlight color through transformDocument #292

Open walling opened 2 years ago

walling commented 2 years ago

I have a document with a table in which the background color of table cells, the text color, and the highlight color all carry meaning for the viewer. It is basically a big timetable where colors represent various categories: a category for the whole weekend (table cell), for a specific person during the weekend (highlighted text), or for a specific task to be done (text color). Unfortunately, I don't have influence over the business process of the people updating this document. I need to write a small automated tool to make this document accessible to a blind user using a screen reader, so everything needs to be marked up semantically or explained through text.

It would be nice to have a small example of how to access the color information through the transformDocument function, so that I can output it (semantically) in the generated HTML code.

Is this possible? I played around with the API a bit, but it seems that only a few style properties are accessible, colors not being one of them. I imagine that something like this should be possible to implement:

function transformParagraph(paragraph) {
    console.log(paragraph.children[0].color); // => '#ff0000'
    console.log(paragraph.children[0].highlightedColor); // => null (if not specified)
}

If this is out-of-scope directly, maybe it would help to add a method on the individual elements to access the DOM state somehow. I imagine something like this:

function transformParagraph(paragraph) {
    // `dom()` is a hypothetical helper method to access the XML DOM behind this paragraph. Not sure if this is 100% correct :-)
    console.log(paragraph.dom().firstOrEmpty("w:color").attributes["w:val"]); // => '#ff0000'
}

Any feedback is appreciated.

mwilliamson commented 2 years ago

As you say, I don't think Mammoth currently exposes colours, which I would expect to be on the runs (although I haven't checked). Some properties have been added purely for use in document transforms, such as the font, so I wouldn't be opposed to doing the same for colours.

Allowing direct access to the underlying XML is something that's come up before, but I've never gotten around to dealing with it. One of the issues is that the XML representation that Mammoth uses is unique to Mammoth, so I'd be reluctant to expose it (although all of the data structures exposed by document transforms are marked as unstable anyway).

Also, while I remember, if you're looking for colours on runs, you'd probably want to use mammoth.transforms.getDescendantsOfType(paragraph, "run") rather than assuming paragraph.children[0] is a run.
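
To illustrate why paragraph.children[0] is a fragile assumption, here is a minimal sketch. The getDescendantsOfType function below is a hand-written stand-in so the snippet runs without Mammoth installed; in a real transform you would call mammoth.transforms.getDescendantsOfType instead. The element shapes (plain objects with a type and optional children) and the runFonts helper are assumptions for illustration only, and font is used as the example property because colours are not exposed.

```javascript
// Local stand-in for mammoth.transforms.getDescendantsOfType (an assumption:
// elements are plain objects with a `type` string and optional `children`).
function getDescendantsOfType(element, type) {
    const matches = [];
    for (const child of element.children || []) {
        if (child.type === type) {
            matches.push(child);
        }
        // Recurse so runs nested inside other elements are found too.
        matches.push(...getDescendantsOfType(child, type));
    }
    return matches;
}

// Hypothetical helper: collect the font of every run in a paragraph.
function runFonts(paragraph) {
    return getDescendantsOfType(paragraph, "run").map(function(run) {
        return run.font || null;
    });
}

// Hand-built example paragraph whose first child is a hyperlink, not a run.
const paragraph = {
    type: "paragraph",
    children: [
        {
            type: "hyperlink",
            children: [
                {type: "run", font: "Courier New", children: [{type: "text", value: "link"}]}
            ]
        },
        {type: "run", font: null, children: [{type: "text", value: " plain"}]}
    ]
};

console.log(paragraph.children[0].type); // prints "hyperlink"
console.log(runFonts(paragraph));
```

Walking all descendants picks up the run nested inside the hyperlink, which a direct look at paragraph.children[0] would miss.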

DugarRishab commented 1 week ago

Hey, just wanted to know if there are plans for mammoth to include colors and background colors in the future.

mwilliamson commented 1 week ago

No plans at present.

DugarRishab commented 1 week ago

@mwilliamson then can I work on this? Can you guide me as to how I can add color support? From what I understand this is an issue in XML to JSON conversion. Is there any particular reason why this was not implemented earlier? I am asking so I can get a better understanding of the issue.

Any information you can provide me about this will be helpful. I want to implement this as this will be useful in my project.

mwilliamson commented 1 week ago

> then can I work on this?

I'm afraid I'm not currently accepting pull requests for Mammoth.

> Is there any particular reason why this was not implemented earlier?

No particular technical reason: mostly a lack of time, and I've prioritised other functionality.