scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

How to extract CategoryAxis class names from a chart in a PowerPoint? #948

Closed: Number18-tong closed this issue 7 months ago

Number18-tong commented 7 months ago

Thanks for your remarkable work! I am trying to get the CategoryAxis class names, series names, and series values from a chart using python-pptx 0.6.22. I can get the series names and values, but I cannot find a way to extract the CategoryAxis class names. How can I get the CategoryAxis class names from a chart in a PowerPoint file?

scanny commented 7 months ago

I'm not familiar with the term "CategoryAxis class name". Can you say more?

Number18-tong commented 7 months ago

[Screenshot: WeCom screenshot 17096040376036]

Sorry, I may not have expressed myself clearly. The red box in the screenshot above marks the "CategoryAxis class name" that I cannot extract from a chart.

scanny commented 7 months ago

I think you're looking for Axis.tick_labels: https://python-pptx.readthedocs.io/en/latest/api/chart.html#pptx.chart.axis._BaseAxis.tick_labels

There is some more in the documentation here as well: https://python-pptx.readthedocs.io/en/latest/user/charts.html#axes

Number18-tong commented 7 months ago

Thanks for your reply. I read the documentation and tried to extract the text of a chart's tick_labels, but I still could not get it. Can the text of tick_labels only be set or changed, and not read?

scanny commented 7 months ago

Post the code you tried and I'll take a look.

Number18-tong commented 7 months ago

> Post the code you tried and I'll take a look.

from pptx import Presentation
import json

def get_chart_data(chart):
    charttext = ""

    # Tried to read the x-axis label text from category_axis.tick_labels, but could not:
    # chartxaxislabels = ""
    # tick_labels = chart.category_axis.tick_labels
    # chartxaxis =

    for series in chart.series:
        charttext += "|\t" + series.name + "\t|"
        charttext += '\t|'.join([str(value) for value in series.values]) + "\t|\n"  # one column of data
    return {'chart title': chart.chart_title.text_frame.text if chart.has_title else '',
            'chart': charttext}

def extract_elements_info(ppt_file):
    presentation = Presentation(ppt_file)
    results = []
    for slide in presentation.slides:
        for shape in slide.shapes:
            # Only shapes that contain a chart have a usable .chart property
            if shape.has_chart:
                results.append(get_chart_data(shape.chart))
    return results

ppt_file_path = r"F:\temtestres\pdf\eval\ppttest/test.pptx"
res = extract_elements_info(ppt_file_path)
print(res)
with open(r'F:\temtestres\pdf\eval\ppttest/test.json', 'w', encoding="utf-8") as file:
    json.dump(res, file, indent=2, ensure_ascii=False)
    # json.dump(res, file, indent=2)

print("Done")

Number18-tong commented 7 months ago

The goal of the code above is to take a PowerPoint file containing a chart and output a Markdown table corresponding to that chart, but the code can only get the part marked by the red box in the following image. I tried to get the x-axis labels via chart.category_axis.tick_labels, but it contains no text information. Is there a way to get the text of the x-axis labels?

[Screenshot: WeCom screenshot 17097082487288]

scanny commented 7 months ago

OK, you're looking for Plot.categories. There can be more than one plot in a single chart, such as a line chart overlaid on a bar chart, and each plot can have different categories.

Try chart.plots[0].categories.
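
A minimal sketch of how that might fit the Markdown-table goal described above. The helper names chart_to_markdown and iter_charts are made up for illustration and are not part of python-pptx. The sketch assumes a single-plot chart, so it takes the categories from plots[0] only, and it assumes every series has one value per category.

```python
def chart_to_markdown(chart):
    """Render a chart's categories and series as a Markdown table.

    `chart` is expected to be a python-pptx chart object. Only
    `chart.plots`, `chart.series`, `series.name` and `series.values`
    are used. Hypothetical helper, not part of python-pptx.
    """
    # Category labels are read from the plot, not from axis.tick_labels.
    # Assumption: the first plot's categories apply to every series.
    categories = [str(category) for category in chart.plots[0].categories]

    header = "| Series | " + " | ".join(categories) + " |"
    divider = "|" + " --- |" * (len(categories) + 1)
    rows = [
        "| " + series.name + " | "
        + " | ".join(str(value) for value in series.values) + " |"
        for series in chart.series
    ]
    return "\n".join([header, divider, *rows])


def iter_charts(pptx_path):
    """Yield every chart found in the presentation at `pptx_path`."""
    # Imported here so chart_to_markdown itself has no hard dependency.
    from pptx import Presentation

    for slide in Presentation(pptx_path).slides:
        for shape in slide.shapes:
            # Shapes without a chart report has_chart == False.
            if shape.has_chart:
                yield shape.chart
```

Something like `for chart in iter_charts("test.pptx"): print(chart_to_markdown(chart))` would then print one table per chart. For a multi-plot chart, each plot in chart.plots would likely need its own table.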

Number18-tong commented 6 months ago

It works now. Thanks very much!