mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Add support for page breaks #23

Closed: GitBruno closed this 7 years ago

GitBruno commented 7 years ago

What about this?

mwilliamson commented 7 years ago

Thanks for taking a look. I'm not sure the behaviour matches what we discussed in the issue, specifically keeping the default behaviour the same and allowing customisation through style maps. Also, each of the changes should be covered by tests.

GitBruno commented 7 years ago

OK, let me have another go. I am learning on the go :)

mwilliamson commented 7 years ago

As a warning, the way style maps are parsed is probably about to change -- the changes are currently in the parser branch.

mwilliamson commented 7 years ago

Just to let you know: both the JavaScript and Java implementations now have support for other break types, so ideally the Python code should work in the same way.

mwilliamson commented 7 years ago

> It seems that document_matchers.py and document_matcher_reader.py are working and implemented properly. But I'm unsure whether I have implemented conversion.py correctly, as a custom style map does not seem to work. There might be something I have missed?

document_matcher_reader.py was removed as part of the parser updates, so it shouldn't exist any more. You'll need to change mammoth/styles/parser/document_matcher_parser.py instead.

mwilliamson commented 7 years ago

I'd like to get this merged in this weekend if possible. I think the main thing left is support for style mappings for breaks. There are also a few stylistic and code structure things, which I can either leave comments for or fix myself, depending on your preference.

GitBruno commented 7 years ago

Either way is cool, but I can't do it this weekend. If you go ahead, I'm looking forward to checking out the structural and stylistic changes. I found that the Break naming was really hard to do nicely!

mwilliamson commented 7 years ago

I made the relevant changes in the break branch, and squashed them onto master. Thanks again for taking a look at this!
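
For readers following the thread, the requirement discussed above (keep the default behaviour unchanged, and let style maps customise how other break types are rendered) can be illustrated with a small standalone sketch. This is not mammoth's implementation: `parse_break_style_map`, `render_break` and `DEFAULT_BREAK_TAGS` are hypothetical names, and the assumption that only line breaks produce output by default is made for illustration. The rule syntax `br[type='page'] => hr` follows the break matcher form used in mammoth style maps.

```python
import re

# Hypothetical sketch: models the idea from the thread, not mammoth's internals.
# A rule looks like: br[type='page'] => hr
_BREAK_RULE = re.compile(
    r"^br\[type='(line|page|column)'\]\s*=>\s*([a-z][a-z0-9]*)\s*$"
)

# Assumed default: only line breaks produce output, so documents converted
# without a custom style map behave exactly as before.
DEFAULT_BREAK_TAGS = {"line": "br"}


def parse_break_style_map(style_map):
    """Parse break rules into a {break_type: html_tag} dictionary."""
    rules = {}
    for raw_line in style_map.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = _BREAK_RULE.match(line)
        if match is None:
            raise ValueError("unrecognised break rule: " + line)
        break_type, tag = match.groups()
        rules[break_type] = tag
    return rules


def render_break(break_type, rules=None):
    """Return the HTML for a break, letting custom rules override defaults."""
    merged = dict(DEFAULT_BREAK_TAGS)
    merged.update(rules or {})
    tag = merged.get(break_type)
    # Break types with no mapping are dropped from the output.
    return "<{0} />".format(tag) if tag else ""


rules = parse_break_style_map("br[type='page'] => hr")
print(render_break("line"))         # default mapping still applies
print(render_break("page"))         # unmapped by default, so nothing is emitted
print(render_break("page", rules))  # custom rule maps page breaks to <hr />
```

In the library itself, a style map string of this form would be passed through the `style_map` argument of `mammoth.convert_to_html`; the sketch above only mirrors the override-the-default logic.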