Open pooryakhajouie opened 7 years ago
Better have a look at the XML produced by python-docx and compare it to that produced by Word when it's doing what you want. opc-diag is handy for that job. The XML in question will be in the document.xml part. A super-short test document makes this a lot easier.
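If opc-diag isn't at hand, the same comparison can be roughed out with the standard library alone. The sketch below is only an illustration: the helper names `pretty_document_xml` and `diff_document_xml` are made up for this example, and it assumes both files are ordinary .docx packages containing a `word/document.xml` part.

```python
import difflib
import zipfile
from xml.dom import minidom


def pretty_document_xml(docx_file):
    """Return the word/document.xml part of a .docx as indented text.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        raw = zf.read("word/document.xml")
    return minidom.parseString(raw).toprettyxml(indent="  ")


def diff_document_xml(generated, reference):
    """Return a unified diff between the document.xml parts of two .docx files."""
    a = pretty_document_xml(generated).splitlines()
    b = pretty_document_xml(reference).splitlines()
    return "\n".join(
        difflib.unified_diff(a, b, "generated", "reference", lineterm="")
    )


# Hypothetical usage (both file names are placeholders):
# print(diff_document_xml("example.docx", "made_in_word.docx"))
```

Keeping both documents down to a single short paragraph keeps the diff small enough to read by eye.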
I wouldn't be surprised if it had something to do with specifying a cursive font. Those elements usually start with 'cs', like <w:csBold/>.
There is some background information on fonts in the documentation here: http://python-docx.readthedocs.io/en/latest/dev/analysis/features/text/font.html
This is in the pre-development analysis section, so not everything you see there is necessarily implemented.
I've changed the font used for my document to this font, but it made no difference in the result.
I've updated the first post with the complete test code. I've also uploaded the output document here. Would you please check what's wrong with it?
The bug still exists:
from docx import Document
from docx.shared import Pt

doc = Document()

def set_font(run, font_name="David", size=14):
    run.font.name = font_name
    run.font.size = Pt(size)
    run.font.rtl = True

paragraph = doc.add_paragraph()
run = paragraph.add_run("טקסט בעברית.")
set_font(run, "David", 14)

doc.add_page_break()
doc.save('example.docx')
I've been playing around with this issue a bit on my own. Opening up the document.xml inside the zip, what I see is that, when <w:rtl/> is present, the other properties of the font are ignored. For example, a document created would have run properties:
<w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/><w:rtl/></w:rPr>
From a manually-created document, I see that the RTL text actually has run properties:
<w:rPr><w:rFonts w:ascii="Aharoni" w:hAnsi="Aharoni" w:cs="Aharoni" w:hint="cs"/><w:lang w:val="en-US"/></w:rPr>
If I manually edit the document.xml above and add w:cs and w:hint as in the manual document, it looks like it actually keeps the properties we're trying to set:
<w:rPr><w:rFonts w:ascii="Aharoni" w:hAnsi="Aharoni" w:cs="Aharoni" w:hint="cs"/><w:rtl/></w:rPr>
I do not understand enough about the specific attributes in the Word XML format to really understand what I've done here, but perhaps this can move us forward? This would also affect #973 and #510, I think.
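The manual edit described above can be sketched as a small helper. This is a hedged illustration rather than anything python-docx provides: `add_complex_script_font` and `w` are hypothetical names, and the demo runs on a standalone `<w:rPr>` fragment copied from this thread, parsed with the standard library.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag):
    """Return the Clark-notation name for a w:-prefixed tag or attribute."""
    return f"{{{W_NS}}}{tag}"


def add_complex_script_font(rpr, font_name):
    """Set w:cs and w:hint="cs" on the w:rFonts child of a w:rPr element.

    Mirrors the manual document.xml edit from this thread. Raises
    ValueError if the run properties have no w:rFonts element yet.
    """
    rfonts = rpr.find(w("rFonts"))
    if rfonts is None:
        raise ValueError("w:rPr has no w:rFonts child; set a font name first")
    rfonts.set(w("cs"), font_name)
    rfonts.set(w("hint"), "cs")
    return rfonts


# Demo on the run properties python-docx produced in the example above
rpr = ET.fromstring(
    f'<w:rPr xmlns:w="{W_NS}">'
    '<w:rFonts w:ascii="Arial" w:hAnsi="Arial"/><w:rtl/>'
    '</w:rPr>'
)
rfonts = add_complex_script_font(rpr, "Arial")
print(rfonts.attrib)
```

Since python-docx elements are lxml-based and support the same `find` and `set` calls, the helper could presumably be applied as `add_complex_script_font(run._element.rPr, "David")` after `run.font.name` has been set. That is an assumption and has not been tested here.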
cs in this context stands for "complex script" I believe, so that might be something to search on.
See section 17.3.2.7 of ISO 29500-1 for a start https://github.com/python-openxml/python-docx/blob/master/ref/ISO-IEC-29500-1.pdf
Section 17.3.2.26 has some discussion of w:hint toward the end of that section.
I created a style to use for my paragraphs. When I set the RTL attribute to True, the text is not rendered in the font I specified; when it is set to False, the code works properly and the font name is correct. The text is a mix of Persian and English. My style looks like this:
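import docx
from docx.enum.style import WD_STYLE_TYPE

word_document = docx.Document()
# Paragraph style that sets both a font name and the RTL flag
style_rtl = word_document.styles.add_style('NormalRTL', WD_STYLE_TYPE.PARAGRAPH)
style_rtl.font.name = 'Noto Naskh Arabic'
style_rtl.font.rtl = True
# Mixed Persian/English paragraph using the style
paragraph = word_document.add_paragraph('.سلام new آخر')
paragraph.style = style_rtl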
I've tried several ways to solve this, but still have no answer. Does anyone know what the problem is?