scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License
2.45k stars 528 forks source link

changing name of slide #671

Open WolfgangFahl opened 3 years ago

WolfgangFahl commented 3 years ago

https://stackoverflow.com/questions/16855306/powerpoint-manually-set-slide-name shows how awkward it is to modify a slide's name in PowerPoint or VBA. It would be great if this were possible with the python-pptx library.

I tried modifying slide.name and saving the presentation, but that didn't seem to change the slide name. When I read the presentation in again, the slide name was still the old one. Am I missing something?

scanny commented 3 years ago

What displayed string in the PowerPoint UI are you figuring for the slide-name?

Guessing, I would imagine the name that appears when viewing slides in the slide sorter or outline mode. I suspect part of the problem is that PowerPoint does its best to display a meaningful name (in particular the slide title) without calling on users to actually name the slides, unless they want something different from the title.

The thing to do is compare the XML before and after renaming and see where PowerPoint lodges (and later retrieves for display) whatever "custom" name you give to a slide. I'm betting it's not in <p:cSld name="xyz"/>, which is the XML node one might reasonably expect to hold the "official" slide name.

WolfgangFahl commented 3 years ago

@scanny thanks for looking into this. I come from the Apache POI library, where the same discussion took place a few years ago, and the functionality can be implemented as outlined in https://stackoverflow.com/questions/44174371/how-to-retrieve-pptx-slide-name-with-apache-poi.

// set slide name via POI and validate it
    sl.getXmlObject().getCSld().setName("new name");

This made it into the Java library via https://svn.apache.org/viewvc?view=revision&revision=1831745

timcolson commented 3 years ago

TIL that slides can actually have unique identifiers, which would be useful to track a slide even if it moves in the deck! I'm curious, @WolfgangFahl are you describing this as a "bug" since slide.name did not persist, or is it more of a "feature request" b/c slide.name does not actually exist as a property (yet) in python-pptx?

scanny commented 3 years ago

Hey @timcolson, long time no see :) Slide.name is a read-write property, so that's not a problem as far as I know. The idea is that what appears in the PowerPoint UI as the "name" of the slide is not the value of that property. As I recall, by default it is the contents of the slide title, which makes sense because folks mostly define titles, and asking them to also define a name would likely be seen as an unnecessary pain for users.

Btw, there is a Slide.slide_id property that allows discovery of the unique slide-id for each slide. That property is not writeable since its value is arbitrary and guaranteed unique. You could theoretically change it, but any supplied value would have to be verified for uniqueness and isn't something we've seen a use case for.

timcolson commented 3 years ago

Thanks for the details, @scanny. I have a use case where tracking the location of particular slides, wherever they may be moved in a deck, will be necessary. My first idea was to programmatically add a GUID metadata fingerprint in the speaker notes, but the slide "name" sounded like a good option.

After hearing about the slide_id, that may work better!

No intent to mutate the ID, nor have users even see it. Just want to be able to scan a deck and tell the user, "The slide you're looking for is #N" - where N might be 37 yesterday, but after modifications and new slides, it's #42 now.
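
The lookup described here can be sketched in a few lines. This is only an illustration: `find_slide_number` is a hypothetical helper, and the stand-in objects below merely imitate slides that expose the `slide_id` attribute mentioned above.

```python
from types import SimpleNamespace


def find_slide_number(slides, slide_id):
    """Return the 1-based position of the slide with `slide_id`, or None.

    `slides` is any iterable of objects exposing a `slide_id` attribute
    (assumption: a python-pptx slide collection would qualify).
    """
    for number, slide in enumerate(slides, start=1):
        if slide.slide_id == slide_id:
            return number
    return None


# Stand-in deck so the sketch runs without a .pptx file.
deck = [
    SimpleNamespace(slide_id=256),
    SimpleNamespace(slide_id=259),
    SimpleNamespace(slide_id=257),
]
print(find_slide_number(deck, 257))
```

With python-pptx the call would presumably look like find_slide_number(prs.slides, wanted_id), where wanted_id was recorded from Slide.slide_id earlier.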

No "Develop" menu on MacOS PPT OOTB, so loading Ofc now on my Win machine. I'm still curious about the "name" field, so will give it a test to see what changes in the XML. :)

WolfgangFahl commented 3 years ago

The slide name actually exists in the XML spec for PowerPoint, even though Microsoft does not provide a UI for it. It is essential if you, e.g., try to keep multi-language versions of PowerPoint presentations around, as we do. The titles of the slides (and unfortunately even the pages) might differ, but they should have the same content, just in another language.

timcolson commented 3 years ago

TL;DR: Looks like the PPT slide name is indeed stored in the name attribute of the <p:cSld> element. 😄

Today I Learned

My Process

Steps to recreate and determine the field used for slide name.

Create and Compare 1-Slide Preso

  1. Create a single-slide PPTX file with one slide, title: "TIM TITLE1", filename: "step1.pptx"
  2. Copy step1.pptx to step2.pptx, change title to "TIM TITLE2"
  3. Cleanup Slide XML
    • Unzip step1.pptx and step2.pptx
    • xmllint --format stepN/ppt/slides/slide1.xml > sN.xml // adds newlines
  4. diff of s1.xml and s2.xml; shows only slide element changes.
    41c41
    <               <a:t>TIM TITLE1</a:t>
    ---
    >               <a:t>TIM TITLE2</a:t>
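
Steps 3 and 4 above can be approximated without xmllint or diff. The following is a minimal sketch using only the Python standard library; `pretty_slide_xml`, `diff_slides`, and `make_package` are hypothetical helper names, and the demo packages contain simplified stand-in XML rather than real slide markup.

```python
import difflib
import io
import zipfile
from xml.dom import minidom


def pretty_slide_xml(pptx_file, part="ppt/slides/slide1.xml"):
    """Return one slide part of a .pptx package as pretty-printed lines."""
    with zipfile.ZipFile(pptx_file) as package:
        raw = package.read(part)
    return minidom.parseString(raw).toprettyxml(indent="  ").splitlines()


def diff_slides(before, after, part="ppt/slides/slide1.xml"):
    """Return unified-diff lines for the same slide part in two packages."""
    return list(difflib.unified_diff(
        pretty_slide_xml(before, part),
        pretty_slide_xml(after, part),
        fromfile="before", tofile="after", lineterm="", n=0,
    ))


def make_package(slide_xml):
    """Build an in-memory zip holding a single slide part (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("ppt/slides/slide1.xml", slide_xml)
    buffer.seek(0)
    return buffer


step1 = make_package("<sld><t>TIM TITLE1</t></sld>")
step2 = make_package("<sld><t>TIM TITLE2</t></sld>")
for line in diff_slides(step1, step2):
    print(line)
```

For real files, the two arguments would be the paths step1.pptx and step2.pptx instead of in-memory buffers.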

Use GUI to add "slide name" metadata

  1. Copy step1.pptx to step-name.pptx
  2. Open step-name.pptx in Win10 PowerPoint w/ Developer enabled
  3. Select Slide 1 & temporarily add Label ActiveX control (to enable Properties button)
  4. Open "Properties"; see Label control; select Slide in hierarchy; see (Name) property
  5. Set (Name) to "Slide1-TC"
  6. Close Properties; delete Label control; save without adding macros.
  7. Cleanup Slide XML (see above)
  8. diff of s1.xml and s-named.xml shows : <p:cSld name="Slide1-TC">
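
The check in step 8 can also be done directly against the package, without unzipping by hand. A minimal standard-library sketch follows; `read_slide_name` is a hypothetical helper, and the demo slide XML is a stripped-down stand-in that keeps only the p:sld and p:cSld elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# PresentationML main namespace, which the `p:` prefix maps to.
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"


def read_slide_name(pptx_file, part="ppt/slides/slide1.xml"):
    """Return the `name` attribute of <p:cSld>, or None when it is absent."""
    with zipfile.ZipFile(pptx_file) as package:
        root = ET.fromstring(package.read(part))
    c_sld = root.find(f"{{{P_NS}}}cSld")
    return None if c_sld is None else c_sld.get("name")


# Demo: an in-memory package with one minimal, named slide.
SLIDE_XML = (
    f'<p:sld xmlns:p="{P_NS}">'
    '<p:cSld name="Slide1-TC"><p:spTree/></p:cSld>'
    '</p:sld>'
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("ppt/slides/slide1.xml", SLIDE_XML)
demo.seek(0)
print(read_slide_name(demo))
```

Passing the path of a real file, such as step-name.pptx, in place of the buffer should report whatever name PowerPoint stored, or None if the attribute is missing.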

Manually set a slide name

  1. Copy step1.pptx to step-name-manual.pptx
  2. Unzip; edit ppt/slides/slide1.xml
    • add name="S1-TC" attribute to p:cSld element
    • modify a:t title to "TIM TITLE Manual Named"
  3. Zip to pptx; Open in PowerPoint
  4. Add temporary Label ActiveX control; open Developer Properties, select slide, verify that slide name changed!
> diff s1.xml s-named-manual.xml
3c3
<   <p:cSld>
---
>   <p:cSld name="S1-TC">
41c41
<               <a:t>TIM TITLE1</a:t>
---
>               <a:t>TIM TITLE Manual Named</a:t>

scanny commented 3 years ago

@timcolson you can accomplish changing Slide.name easily from python-pptx:

slide.name = "Slide1-TC"

I think the problem @WolfgangFahl was pointing at was that doing so doesn't make that new name appear in PowerPoint outline mode or the other UI places where slides appear "by name".

But if you just want a programmatic identifier for a slide, albeit not guaranteed unique, then Slide.name is a good option.

timcolson commented 3 years ago

Thx, Mr. @scanny. Obv new to this, still getting my bearings. If I'm now understanding correctly, will setting slide.name="Slide1-TC" in python-pptx result in <p:cSld name="Slide1-TC">?

WolfgangFahl commented 3 years ago

Please implement this as the Apache POI library does. I think I am currently using a workaround, but I am not sure since the discussion has been going on for so long already. I am definitely still working with slide names a lot.

timcolson commented 3 years ago

2021-02-12 python-pptx slide name

I can confirm python-pptx definitely sets slide name, just as Steve said. (I never doubted! 👍 ) ppt.slides[1].name="Slide1-TC" results in <p:cSld name="Slide1-TC"> in the saved PPTX

After reading the thread again, I realized I misunderstood the phrase, "I'm betting it's not in <p:cSld name="xyz"/>".

I wrongly thought that meant this particular name data was not expected for the name attribute. I now believe this was intended as a troubleshooting suggestion for @WolfgangFahl to verify name data was actually written to the file. (Unzipping and viewing the attribute in the XML would confirm.)

FWIW

Great learning experience for me. Also learned VS Code has an interactive Python Jupyter notebook capability. Cool!

Two more observations: 1) Retrieving slide_id as an int was easy with python-pptx, just as Steve said, but I was unable to retrieve the same with Apache POI. It exists, but is private to the class. Perhaps the ID is considered an internal implementation detail by POI, so it is not exposed.

2) Differences in treatment of slides where name is not set:

    @Override
    public String getSlideName() {
        final CTCommonSlideData cSld = getXmlObject().getCSld();
        return cSld.isSetName() ? cSld.getName() : "Slide"+getSlideNumber();
    }

    @Override
    public int getSlideNumber() {
        int idx = getSlideShow().getSlides().indexOf(this);
        return (idx == -1) ? idx : idx+1;
    }

// Tim: no setSlideName() -- instead, must directly update XML object, like this:
// slide.getXmlObject().getCSld().setName("TC-JavaName");
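
The fallback in the POI getter above is easy to mirror on the Python side. This is a minimal sketch, assuming slide objects expose a `name` attribute that is empty when no name is stored (believed to match python-pptx's Slide.name, but treat that as an assumption). `display_names` is a hypothetical helper, not part of either library.

```python
from types import SimpleNamespace


def display_names(slides):
    """Return each slide's stored name, or "Slide<N>" (1-based) if unset."""
    return [
        slide.name or f"Slide{number}"
        for number, slide in enumerate(slides, start=1)
    ]


# Stand-in deck: only the second slide has a stored name.
deck = [
    SimpleNamespace(name=""),
    SimpleNamespace(name="Slide1-TC"),
    SimpleNamespace(name=""),
]
print(display_names(deck))
```

Unlike the POI code, this sketch numbers slides by iteration order, so it has no equivalent of the -1 case for a slide that is not in the deck.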

Summary

Setting the slide name does work in python-pptx, and is even nicer than the POI code.

I'm curious, is there still a need to make changes, Herr @WolfgangFahl?

WolfgangFahl commented 3 years ago

After reporting the issue here, I worked around the problem by using my old PowerPoint VBA macros for changing the slide names. I never retried using the library in write mode. Reading slide.name is and was no problem.

pip show python-pptx
Name: python-pptx
Version: 0.6.18
Summary: Generate and manipulate Open XML PowerPoint (.pptx) files
Home-page: http://github.com/scanny/python-pptx
Author: Steve Canny
Author-email: python-pptx@googlegroups.com
License: The MIT License (MIT)
Location: /Users/wf/Library/Python/3.8/lib/python/site-packages
Requires: lxml, Pillow, XlsxWriter
Required-by: 

This seems to be in sync with https://pypi.org/project/python-pptx/ as of 2021-02-13.

Next time I work on my code I might look into the issue again. If I remember right, it was not the only problem with the library: I believe some of my PowerPoint files were not re-saved correctly in other ways, so I didn't dare to manipulate the files with the library and used PowerPoint's VBA instead.

timcolson commented 3 years ago

I verified that slide.name read/write works as expected. I suggest @scanny close this issue.

Delengowski commented 3 years ago

Now I am curious. I've been aware that cSld(name=) attributes are not guaranteed to be unique, but then what does the winCOM API do when you index by name, which is completely allowed?

https://docs.microsoft.com/en-us/office/vba/api/powerpoint.slides.item

Interestingly, winCOM will throw an error if you do not keep the slide names unique when setting them.

In [1]: import win32com.client as win32

In [2]: pptx = win32.gencache.EnsureDispatch("PowerPoint.Application")

In [3]: pres = pptx.Presentations.Add(False)

In [4]: layout = pres.Designs(1).SlideMaster.CustomLayouts(1)

In [5]: pres.Slides.AddSlide(1, layout)
Out[5]: <win32com.gen_py.None.Slide>

In [6]: pres.Slides.AddSlide(2, layout)
Out[6]: <win32com.gen_py.None.Slide>

In [7]: pres.Slides(1).Name = "Test"

In [8]: pres.Slides(2).Name = "Test"
---------------------------------------------------------------------------
com_error                                 Traceback (most recent call last)
<ipython-input-8-410b453c78da> in <module>
----> 1 pres.Slides(2).Name = "Test"

C:\ProgramData\Anaconda3\lib\site-packages\win32com\client\__init__.py in __setattr__(self, attr, value)
    518                         d=self.__dict__["_dispobj_"]
    519                         if d is not None:
--> 520                                 d.__setattr__(attr, value)
    521                                 return
    522                 except AttributeError:

C:\ProgramData\Anaconda3\lib\site-packages\win32com\client\__init__.py in __setattr__(self, attr, value)
    480                 except KeyError:
    481                         raise AttributeError("'%s' object has no attribute '%s'" % (repr(self), attr))
--> 482                 self._oleobj_.Invoke(*(args + (value,) + defArgs))
    483         def _get_good_single_object_(self, obj, obUserName=None, resultCLSID=None):
    484                 return _get_good_single_object_(obj, obUserName, resultCLSID)

com_error: (-2147352567, 'Exception occurred.', (0, 'Microsoft PowerPoint', 'Slide.Name : Invalid request.  Another slide already has this name.', '', 0, -2147188160), None)

Now I wonder if I follow the directions in OP to set the slide names manually, in PowerPoint, what will happen?

Thinking about it now, the answer makes sense, since when you interact with the PowerPoint GUI you are making winCOM calls.

So, given the way the winCOM bindings behave, maybe a PR could be made to allow direct indexing by name, and throw errors when trying to set two slides to the same name?
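
To make the proposal concrete, here is a rough sketch of what such helpers might look like. Both function names are hypothetical and nothing here is python-pptx API; the code only assumes slide objects with a readable and writable `name` attribute, imitated by stand-in objects below.

```python
from types import SimpleNamespace


def slide_by_name(slides, name):
    """Return the first slide whose name equals `name`; raise KeyError if none."""
    for slide in slides:
        if slide.name == name:
            return slide
    raise KeyError(name)


def set_unique_name(slides, slide, name):
    """Assign `name` to `slide`, refusing names already used in the deck."""
    if slide.name == name:
        return  # nothing to change
    if any(other.name == name for other in slides):
        raise ValueError(f"Another slide already has the name {name!r}")
    slide.name = name


# Stand-in deck with two unnamed slides.
deck = [SimpleNamespace(name=""), SimpleNamespace(name="")]
set_unique_name(deck, deck[0], "Test")
try:
    set_unique_name(deck, deck[1], "Test")
except ValueError as error:
    print(error)
```

This mirrors the winCOM behaviour shown above only for names set through the helper. Names written directly to the attribute, or already duplicated in an existing file, would not be caught.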