python-openxml / python-docx

Create and modify Word documents with Python
MIT License

RTL attribute disables font name! #430

Open pooryakhajouie opened 7 years ago

pooryakhajouie commented 7 years ago

I created a style to use for my paragraphs. When I set the RTL attribute to True, the text is not rendered in the font I've specified; when it is False, the code works properly and the font name is correct. The text is a mix of Persian and English. My style looks like this:

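import docx
from docx.enum.style import WD_STYLE_TYPE

word_document = docx.Document()

# Paragraph style with an Arabic-script font and right-to-left runs
style_rtl = word_document.styles.add_style('NormalRTL', WD_STYLE_TYPE.PARAGRAPH)
style_rtl.font.name = 'Noto Naskh Arabic'
style_rtl.font.rtl = True

# Mixed Persian/English sample text
paragraph = word_document.add_paragraph('.سلام new آخر')
paragraph.style = style_rtl

word_document.save('new.docx')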

I've tried several ways to solve this, but still have no answer. Does anyone know what the problem is?

scanny commented 7 years ago

Better have a look at the XML produced by python-docx and compare it to that produced by Word when it's doing what you want. opc-diag is handy for that job. The XML in question will be in the document.xml part. A super-short test document makes this a lot easier.
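As a rough illustration of that comparison without extra tools, here is a minimal sketch that uses only the Python standard library (zipfile and xml.etree.ElementTree, not opc-diag) to read word/document.xml out of a .docx and collect each run's <w:rPr>. The helper name run_properties is hypothetical, and it only looks at the main document part, not styles.xml, so properties inherited from a paragraph style such as NormalRTL will not appear in its output.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace, i.e. the "w:" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the familiar "w:" prefix when serializing
NS = {"w": W_NS}


def run_properties(docx_file):
    """Hypothetical helper: return (text, rPr XML) for every run in word/document.xml.

    docx_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    result = []
    # iter() also finds runs nested inside hyperlinks, fields, etc.
    for r in root.iter(f"{{{W_NS}}}r"):
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        rPr = r.find("w:rPr", NS)
        rPr_xml = ET.tostring(rPr, encoding="unicode") if rPr is not None else ""
        result.append((text, rPr_xml))
    return result
```

For example, running `for text, props in run_properties('new.docx'): print(repr(text), props)` against a python-docx output and against a Word-saved copy of the same text makes the differences in run properties easy to spot.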

I wouldn't be surprised if it had something to do with specifying a cursive font. Those elements usually have 'Cs' in their names, like <w:bCs/>.

There is some background information on fonts in the documentation here: http://python-docx.readthedocs.io/en/latest/dev/analysis/features/text/font.html

This is in the pre-development analysis section, so not everything you see there is necessarily implemented.

pooryakhajouie commented 7 years ago

I've changed the font used in my document to this one, but there is no difference in the result:

NotoNaskhArabic

I've updated the first post with the complete test code, and I've also uploaded the output document here. Would you please check what's wrong with it?

new.docx

yuvalhuck commented 8 months ago

The bug still exists:

from docx import Document
from docx.shared import Pt

doc = Document()

def set_font(run, font_name="David", size=14):
    run.font.name = font_name
    run.font.size = Pt(size)
    run.font.rtl = True

paragraph = doc.add_paragraph()
run = paragraph.add_run("טקסט בעברית.")  # Hebrew: "Text in Hebrew."
set_font(run, "David", 14)

doc.add_page_break()
doc.save('example.docx')

cheshirex commented 2 weeks ago

I've been playing around with this issue a bit on my own. Opening up the document.xml inside the zip, what I see is that when <w:rtl/> is present, the other font properties are ignored. For example, a document created by python-docx has these run properties: <w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/><w:rtl/></w:rPr>

From a manually-created document, I see that the RTL text actually has run properties: <w:rPr><w:rFonts w:ascii="Aharoni" w:hAnsi="Aharoni" w:cs="Aharoni" w:hint="cs"/><w:lang w:val="en-US"/></w:rPr>

If I manually edit the document.xml above and add w:cs and w:hint as in the manually created document, Word appears to keep the properties we're trying to set: <w:rPr><w:rFonts w:ascii="Aharoni" w:hAnsi="Aharoni" w:cs="Aharoni" w:hint="cs"/><w:rtl/></w:rPr>
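The manual edit described above could be automated. Below is a minimal sketch of a hypothetical helper, set_complex_script_font, written against plain xml.etree.ElementTree (not a python-docx API): given a <w:rPr> element, it finds or creates <w:rFonts> and sets w:cs to the font name and w:hint to "cs", copying the attribute values from the Word-created document above. It only reproduces the XML change from this comment; whether Word honours it in every case has not been verified here.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(name):
    """Clark-notation name for a w: element or attribute, e.g. w("cs")."""
    return f"{{{W_NS}}}{name}"


def set_complex_script_font(rPr, font_name):
    """Hypothetical helper: add w:cs and w:hint="cs" to <w:rFonts> in a <w:rPr>."""
    rFonts = rPr.find(w("rFonts"))
    if rFonts is None:
        rFonts = ET.Element(w("rFonts"))
        # In the schema, <w:rFonts> follows <w:rStyle> (if present) and precedes
        # every other run property, so insert it at that position.
        index = 1 if len(rPr) and rPr[0].tag == w("rStyle") else 0
        rPr.insert(index, rFonts)
    rFonts.set(w("cs"), font_name)  # font used for complex-script text
    rFonts.set(w("hint"), "cs")     # value copied from the Word-created document
    return rPr


# Usage: patch the run properties python-docx produced
rPr = ET.fromstring(
    f'<w:rPr xmlns:w="{W_NS}">'
    '<w:rFonts w:ascii="Aharoni" w:hAnsi="Aharoni"/><w:rtl/></w:rPr>'
)
set_complex_script_font(rPr, "Aharoni")
print(ET.tostring(rPr, encoding="unicode"))
```

In python-docx itself, a run's <w:rPr> is an lxml element reachable through internal, non-public attributes (for example `run._element.get_or_add_rPr()`), and the same attribute can be set there with `docx.oxml.ns.qn('w:cs')`; that relies on private API that may change between versions.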

I don't understand the specific attributes in the Word XML format well enough to really know what I've done here, but perhaps this can move us forward? I think this would also affect #973 and #510.

scanny commented 2 weeks ago

cs in this context stands for "complex script", I believe, so that might be something to search on.

See section 17.3.2.7 of ISO 29500-1 for a start: https://github.com/python-openxml/python-docx/blob/master/ref/ISO-IEC-29500-1.pdf

Section 17.3.2.26 has some discussion of w:hint toward the end of that section.