scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License
2.37k stars, 513 forks

Discussion on font #765

Open NathanTech7713 opened 2 years ago

NathanTech7713 commented 2 years ago

hi there,

First, a quick sorry: I wanted to start a discussion rather than a new issue, but I don't know how to do that on GitHub, because I am a #newbie.

Anyways, I was working with a presentation and ran the code:

    from pptx import Presentation

    prs = Presentation("my.pptx")
    slide = prs.slides[0]
    shape = slide.shapes[0]
    frame = shape.text_frame
    pa = frame.paragraphs[0]
    ru = pa.runs[0]
    ru.font.size  # evaluates to None here

This returns None. I googled around and @scanny mentioned it's because the value comes from the parent, but I followed it up as far as text_frame and font.size was still None.

Again, more research reveals this is because it comes from the parent styles of the presentation, which is not supported? Is this stored in the zip file of the pptx file and simply not yet processed by the program? Or is this embedded in PPTX and thus unavailable? This being the case, is it worth having a filler object, like pptx.NULLFONT, which works as a filler or some such? I suppose None works just as well here.

Thoughts are appreciated.

scanny commented 2 years ago

I usually call the topic you're asking about effective font. The "effective" notion applies to many other characteristics as well, like fill and line style, etc. So you may be able to find other threads on this with a search phrase like python-pptx effective font.

This comes up fairly frequently. The thing is that an implementation would be complicated. The basic idea is that formatting characteristics like font are defined in a cascade similar in concept to cascading style sheets (CSS) for HTML. This allows the user full formatting flexibility while minimizing the amount of specification the user has to perform.

The result is that there are several places/scopes at which e.g. font can be specified, and the effective font is found by working upwards from a direct font specification (on a run in the font case). The cascade or hierarchy of scopes is not documented, and can change from characteristic to characteristic. There are probably approaching ten possible scopes, including inside a table in certain places and text within placeholders.

The final authority is a PPTX built-in font choice, but almost always that would be overridden by a presentation-wide setting somewhere. Neither of those currently has API support, mostly because setting those is easily accomplished by creating a custom starting template .pptx file.
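The "work upwards until something is specified" idea can be sketched in a few lines. This is only an illustration of the lookup pattern, not python-pptx API: `effective_value`, the scope objects, and the `builtin_default` of 18 are all hypothetical stand-ins, and the real order of scopes is undocumented and varies by characteristic.

```python
from types import SimpleNamespace


def effective_value(scopes, attr, builtin_default=None):
    """Return the first non-None `attr`, walking from the most specific scope outward.

    `scopes` is ordered from most specific (e.g. a run) to least specific.
    Hypothetical helper: python-pptx does not provide this lookup.
    """
    for scope in scopes:
        value = getattr(scope, attr, None)
        if value is not None:
            return value
    # Nothing in the cascade specified the value: fall back to the built-in choice.
    return builtin_default


# Stand-in objects, not real python-pptx shapes.
run = SimpleNamespace(size=None)
paragraph = SimpleNamespace(size=None)
layout_placeholder = SimpleNamespace(size=24)

print(effective_value([run, paragraph, layout_placeholder], "size", builtin_default=18))  # 24
```

The hard part in practice is not this loop but building the correct list of scopes for a given run, which is where the complexity described above comes in.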

Anyway, not sure if that answers your question. On the NULLFONT question, these values are optional, and None is typically used to indicate the absence of an optional value, so I like that existing choice fine the way it is.

NathanTech7713 commented 2 years ago

Heya.

On the None vs. null point, I admit that after I wrote it I did read back and think, actually... I think he's onto something there with what you already have.

Regarding the font thing, you've made it a lot clearer for me, so thanks for that. I've been considering the issue because I wanted to make a small program to speed up accessibility for large print users. At the moment the process is simple enough: highlight your text, go to the font menu and blow it up to font 18 or 26 or whatever the required reading font is. This is fine right up until you have 80 slides, or even 40. Let's be honest, by 20 slides you want a new job, right?

Based on what you say above, and perhaps as a hint to other folk as well, is it worth setting a baseline within my program, so that, for instance:

    if run.font.size is None:
        font = 12
    else:
        font = run.font.size

but that 12 is a number the user themselves can customise, so if their entire PPTX is already size 18, for instance, python-pptx would probably not pick up on that, but by them setting the font variable themselves, it effectively sidesteps that issue.

I just wanted to float this out there because (1) it might be useful to someone in the future and (2) I wondered if you might be able to notice a place I've gone wrong? E.g. if there are occasions where a heading could be, for instance, font 18, while the rest is 12, but that still doesn't show in the Python package.
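A minimal sketch of that baseline idea follows. The helper names (`iter_runs`, `size_or_baseline_pt`, `enlarge`) and the default `baseline_pt=12` are assumptions made for illustration; only `Presentation`, `Pt`, `has_text_frame`, `text_frame.paragraphs`, `runs`, and `font.size` come from python-pptx. It does not descend into tables or grouped shapes, and a single baseline will be wrong wherever a placeholder (a title, say) inherits a different size from its layout.

```python
def iter_runs(prs):
    """Yield every run in every top-level shape that has a text frame."""
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures, tables, groups, etc. are skipped in this sketch
            for paragraph in shape.text_frame.paragraphs:
                yield from paragraph.runs


def size_or_baseline_pt(run, baseline_pt):
    """Return the run's explicit size in points, or the user's baseline if inherited."""
    size = run.font.size  # a Length, or None when the size comes from a parent scope
    return baseline_pt if size is None else size.pt


def enlarge(src_path, dst_path, min_pt, baseline_pt=12):
    """Set every run assumed to be smaller than `min_pt` to `min_pt` and save a copy."""
    # Imported here so the helpers above stay usable without python-pptx installed.
    from pptx import Presentation
    from pptx.util import Pt

    prs = Presentation(src_path)
    for run in iter_runs(prs):
        if size_or_baseline_pt(run, baseline_pt) < min_pt:
            run.font.size = Pt(min_pt)
    prs.save(dst_path)
```

A call such as `enlarge("my.pptx", "my-large.pptx", min_pt=18)` would write a new file and leave the original untouched.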

scanny commented 2 years ago

I think we have to start with a conversation about the desired behaviors. The big problem I see is that PowerPoint is essentially a page-layout tool, which is very different from a flowed-text environment like a Word document or a web-page.

In a flowed-text environment, one can just make text bigger and the document will become longer, perhaps breaking pages differently, but otherwise working just fine. In a page-layout environment, you have a fixed amount of space and need to place text, shapes, and images within those specific, limited boundaries in a pleasing way, in particular, such that the text doesn't overrun the page extents. This "fitting" is generally handled by the slide author using intelligence and intuition a computer is not going to have.

So the first question would be, what if the text runs off the bottom of the slide? Is that okay because they're reading it in "edit" mode in the PowerPoint app? Or if printing or PDFing cuts off half the slide content, is that going to suit?

MartinPacker commented 2 years ago

On the "running off the bottom of the slide" point - I don't see how we can programmatically manage that - shrinking the font size to fit. Unless somehow we can generate some methods that understand the current font's metrics.

(I have a variant of this problem with scaling a fixed pitch font to fit a text box's width - for code snippets in md2pptx. But at least there the clue is in the name "fixed pitch".) :-)
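For the fixed-pitch case the arithmetic is simple enough to sketch, because every character advances by the same fraction of the font size. This is not taken from md2pptx; `fit_fixed_pitch_pt` is a hypothetical helper, and `advance_em=0.6` is an assumption that holds for Courier-like fonts but should be checked for whichever font is actually used. `box_width_pt` is meant to be the usable width, after subtracting the text frame's left and right margins.

```python
def fit_fixed_pitch_pt(lines, box_width_pt, advance_em=0.6, max_pt=18.0):
    """Return the largest size (in points, capped at `max_pt`) at which the
    longest line fits in `box_width_pt`, for a fixed-pitch font.

    Each character is assumed to be `advance_em` * font size wide, so a line of
    n characters needs n * advance_em * size points of width.
    """
    longest = max((len(line) for line in lines), default=0)
    if longest == 0:
        return max_pt  # nothing to fit
    return min(max_pt, box_width_pt / (longest * advance_em))


# An 80-character line in a 600 pt wide box: 600 / (80 * 0.6) = 12.5 pt.
print(fit_fixed_pitch_pt(["x" * 80], 600))
```

Proportional fonts have no such shortcut: the width of a line depends on which characters it contains, which is why real font metrics would be needed.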

NathanTech7713 commented 2 years ago

Ah, so if I am understanding this right (and bear in mind I'm approaching from a blind perspective here, so bear with me), it is actually possible for text to not appear? That's to say, if I write enough, or blow it up to a large enough font, the text is irretrievably hidden beyond the page. Surely you could scroll it?

Relatedly, therefore, in terms of my original idea of enlargement for visual impairments that require bigger fonts, the better approach might be to just yank the text out of the PowerPoint into a separate Word document? Though this could lose continuity if there are other things embedded, such as images or charts.

MartinPacker commented 2 years ago

That's right: Out of the text box, off the end of the slide, over the hills and far away. :-)

The text is, of course, still in the text area object but visually I wouldn't see it. Whether a screen reader would is doubtful, too.

But the text can be programmatically extracted and then read out to you. Or placed in another application or document.
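A rough sketch of that extraction, assuming only top-level shapes with a text frame matter (tables, grouped shapes, and notes are ignored here). `extract_text` is a hypothetical helper name; the attributes it reads (`slides`, `shapes`, `has_text_frame`, `text_frame.paragraphs`, `text`) are python-pptx's.

```python
def extract_text(prs):
    """Return one list of non-empty paragraph strings per slide."""
    slides = []
    for slide in prs.slides:
        lines = []
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures and charts carry no text frame
            for paragraph in shape.text_frame.paragraphs:
                if paragraph.text.strip():
                    lines.append(paragraph.text)
        slides.append(lines)
    return slides


# Usage, assuming python-pptx is installed and "my.pptx" exists:
#
#     from pptx import Presentation
#     for number, lines in enumerate(extract_text(Presentation("my.pptx")), start=1):
#         print(f"Slide {number}")
#         for line in lines:
#             print(line)
```

Shapes come back in the order they are stored on the slide, which is not guaranteed to match the visual reading order.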

(I think you have an even worse problem with pictures.)

scanny commented 2 years ago

Well, I wouldn't say irretrievably hidden beyond the page. In general, while in edit mode, when the content overruns slide boundaries, the slide can be scrolled back and forth and up and down to see all the text. The question would be about convenience. One behavior I find particularly inconvenient is that if you scroll just a little too far then the slide moves to the next one, and when you page back to this one you've lost your scrolling position.

Personally I'd be inclined to identify some beta testers and some example content, do different enlargement options by hand, and see how they respond to each. What the users think will be the final arbiter of success, so it reduces risk to get feedback from them as soon as possible.

MartinPacker commented 2 years ago

@scanny they mentioned they're blind - so that might be difficult.