python-openxml / python-docx

Create and modify Word documents with Python
MIT License

how to create nested list? #122

Open SaulHormazabal opened 9 years ago

SaulHormazabal commented 9 years ago

Example:

link: ooxml numbering

scanny commented 9 years ago

You can achieve this using styles such as List Number, List Number 2, etc. If the built-in ones don't suit, you can customize your own.

The problem arises when you want to reset numbering, which is not yet supported in python-docx. Some folks have done it, but it requires a fair amount of manual work directly on the XML.

virajkanwade commented 9 years ago

See if this helps: https://github.com/virajkanwade/python-docx/commit/7cb692a14b8010a04bc0945fc6b4a149a045f776

numId values up to 10 are for predefined numbering formats.

So ilvl=0, numId=10 could be your first level and first numbering series. For a nested level, increment ilvl. For a new numbering sequence, increment numId.
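
As a rough illustration of the ilvl/numId pairing described above, the sketch below builds the `w:numPr` fragment that carries those two values on a paragraph. It uses only the standard library and does not touch python-docx; the helper names `qn` and `make_num_pr` are made up for this example, and attaching the fragment to a real paragraph is left out.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def qn(tag):
    """Expand a 'w:name' tag into the '{namespace}name' form ElementTree uses."""
    return "{%s}%s" % (W, tag.split(":", 1)[1])


def make_num_pr(ilvl, num_id):
    """Build <w:numPr> for a paragraph at nesting level `ilvl`
    that belongs to the numbering series `num_id` (hypothetical helper)."""
    num_pr = ET.Element(qn("w:numPr"))
    ET.SubElement(num_pr, qn("w:ilvl"), {qn("w:val"): str(ilvl)})
    ET.SubElement(num_pr, qn("w:numId"), {qn("w:val"): str(num_id)})
    return num_pr


# Top-level item, nested item, top-level item, then the first item of a new list.
series = [(0, 10), (1, 10), (0, 10), (0, 11)]
fragments = [ET.tostring(make_num_pr(i, n), encoding="unicode") for i, n in series]
```

Nesting only changes `w:ilvl`, while a separate list changes `w:numId`, which is the rule the comment describes.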

Substr commented 8 years ago

I notice this issue is still open so I was wondering if someone could clarify how this should be done? @scanny you mentioned this can be done with the built-in styles, but I'm not seeing how. There's nothing in the documentation which describes how to create a nested numbered list either.

If this feature relies on the change that @virajkanwade made above, can that be merged in?

scanny commented 8 years ago

Try doing it by hand first, using Word.

If the document looks like the one you want when you're done, just apply these styles to the list paragraphs in your document according to your needs.

virajkanwade commented 8 years ago

Here's a snippet from my HTML-to-docx converter. HTH.

self.numId = 10  # Always start with 10
p = self.document.add_paragraph()

def _processContent(p, node, font={}, numbering={'ilvl': -1}):
    .
    .
    .
    elif content.name in ('ul', 'ol'):
        # Entering a list: go one level deeper and pick the list style.
        new_numbering = numbering.copy()
        new_numbering['style'] = 'ListBullet'
        if content.name == 'ol':
            new_numbering['style'] = 'ListNumber'
        new_numbering['ilvl'] = numbering['ilvl'] + 1
        r = _processContent(p, content, font, new_numbering)
        # This list is done, so the next one gets a fresh numbering series.
        self.numId += 1
    elif content.name == 'li':
        p1 = self.document.add_paragraph(style=numbering['style'])
        if numbering['style'] == 'ListNumber':
            # set_numbering() comes from the fork linked above.
            p1.set_numbering(numbering['ilvl'], self.numId)
        r = _processContent(p1, content, font, numbering)

Substr commented 8 years ago

@scanny I'm not sure what list menu items inside Word map to those "List Number" styles. At least the version of Word I have has a separate selection for "multi level lists" which allow me to just select all 5 of the paragraphs, apply that style, and then indent as needed and the numbering follows.

I've tried many combinations with python-docx to build a doc using ListNumber, ListNumber2 etc. styles and while the indenting is correct, the numbering restarts rather than using 1.1, 1.2 etc.

Substr commented 8 years ago

@virajkanwade Thanks for the snippet, but am I correct in saying the patch you linked earlier isn't in the base version so I'd have to manually modify it to get access to the numbering editing ability?

virajkanwade commented 8 years ago

@Courthold That @scanny will have to confirm. Haven't had much time to work or even look at python-docx. That reminds me, I need to check if I am using my fork or the original one in my code. Thanks :)

Substr commented 8 years ago

@virajkanwade I gave your branch a go with set_numbering(), but unfortunately I'm still only able to get an output like this:

[screenshot: 2016-01-18 at 15:06:24]

This is the same output I see using the suggestion from @scanny to use ListNumber, ListNumber2 etc.

If I then highlight that list and choose this highlighted item from the list menu in Word:

[screenshot: 2016-01-18 at 15:07:37]

... then the list appears with correct numbering, as below:

[screenshot: 2016-01-18 at 15:09:15]

If I compare the before and after document.xml then they are the same in regards to the ilvl and numId attributes, however the numbering.xml of the manually-edited working version contains a whole extra block which starts like this:

<w:abstractNum w:abstractNumId="9">
    <w:nsid w:val="08240F99"/>
    <w:multiLevelType w:val="multilevel"/>
    <w:tmpl w:val="0409001F"/>
    <w:lvl w:ilvl="0">
      <w:start w:val="1"/>
      <w:numFmt w:val="decimal"/>
      <w:lvlText w:val="%1."/>
      <w:lvlJc w:val="left"/>
      <w:pPr>
        <w:ind w:hanging="360" w:left="360"/>
      </w:pPr>
    </w:lvl>
   ...

This doesn't seem to be added by python-docx when defining the lists, so it seems like this kind of multilevel list that I'm after isn't supported at the moment, unless I'm doing something quite wrong?

Thanks
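
For readers who want to see the shape of such a block in full, here is a hedged sketch that generates a multilevel `w:abstractNum` with labels of the form 1., 1.1., 1.1.1. using only the standard library. The helper names (`qn`, `make_abstract_num`) and the indentation values for deeper levels are assumptions for illustration; the `w:nsid` and `w:tmpl` elements that Word wrote are left out here, and inserting the result into numbering.xml is not shown.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def qn(tag):
    """Expand a 'w:name' tag into the '{namespace}name' form ElementTree uses."""
    return "{%s}%s" % (W, tag.split(":", 1)[1])


def make_abstract_num(abstract_num_id, depth=3, step=360):
    """Build a <w:abstractNum> whose levels are labelled 1., 1.1., 1.1.1., ..."""
    root = ET.Element(qn("w:abstractNum"),
                      {qn("w:abstractNumId"): str(abstract_num_id)})
    ET.SubElement(root, qn("w:multiLevelType"), {qn("w:val"): "multilevel"})
    for ilvl in range(depth):
        lvl = ET.SubElement(root, qn("w:lvl"), {qn("w:ilvl"): str(ilvl)})
        ET.SubElement(lvl, qn("w:start"), {qn("w:val"): "1"})
        ET.SubElement(lvl, qn("w:numFmt"), {qn("w:val"): "decimal"})
        # %1 is the level-0 counter, %2 the level-1 counter, and so on.
        text = "".join("%%%d." % n for n in range(1, ilvl + 2))
        ET.SubElement(lvl, qn("w:lvlText"), {qn("w:val"): text})
        ET.SubElement(lvl, qn("w:lvlJc"), {qn("w:val"): "left"})
        p_pr = ET.SubElement(lvl, qn("w:pPr"))
        # Assumed indentation: each level sits one `step` further to the right.
        ET.SubElement(p_pr, qn("w:ind"),
                      {qn("w:left"): str(step * (ilvl + 1)),
                       qn("w:hanging"): str(step)})
    return root


abstract_num = make_abstract_num(9)
xml_text = ET.tostring(abstract_num, encoding="unicode")
```

The part that produces 1.1-style labels is `w:lvlText`: a level whose text is `%1.%2.` prints the parent counter followed by its own.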

scanny commented 8 years ago

Yes, now you've jogged my memory. Doing these is a real pain in Word. Basically you have to define a new numbering definition or something, I don't remember all the details.

This SO answer has some of the details from when it was fresher in my mind. http://stackoverflow.com/questions/23446268/python-docx-how-to-restart-list-lettering

But the basic gist is you need to add a new list "instance" that refers to a list definition that specifies the layout details. Crazy, I know :)

You definitely want to stick to using styles for these rather than the way it's done using the toolbar icons. I'd see if you can make it work that way using the Word UI, like use list number 2 style and restart numbering at the right spots, then observe the differences in the XML.
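
The list "instance" mentioned above is, as far as the file format goes, a `w:num` element that points at a `w:abstractNum` definition. The sketch below builds one with the standard library, optionally with a start override for level 0 so that a new instance begins again at a chosen number. The function names and the ids used are made up for illustration, and adding the element to numbering.xml is left out.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def qn(tag):
    """Expand a 'w:name' tag into the '{namespace}name' form ElementTree uses."""
    return "{%s}%s" % (W, tag.split(":", 1)[1])


def make_num(num_id, abstract_num_id, restart_at=None):
    """Build a <w:num> list instance that refers to an existing definition.

    If `restart_at` is given, level 0 of this instance starts at that value.
    """
    num = ET.Element(qn("w:num"), {qn("w:numId"): str(num_id)})
    ET.SubElement(num, qn("w:abstractNumId"), {qn("w:val"): str(abstract_num_id)})
    if restart_at is not None:
        override = ET.SubElement(num, qn("w:lvlOverride"), {qn("w:ilvl"): "0"})
        ET.SubElement(override, qn("w:startOverride"), {qn("w:val"): str(restart_at)})
    return num


# Two instances of the same definition: paragraphs that use numId 12
# count separately from paragraphs that use numId 11.
first_list = make_num(11, 9)
second_list = make_num(12, 9, restart_at=1)
```

Paragraphs then select an instance through the `w:numId` value in their `w:numPr`, so two lists can share one layout definition and still number independently.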

Substr commented 8 years ago

@scanny Thanks for following up. Just to clarify, my problem is that it is restarting the numbering in the lower level lists, I don't want it to :) If I use ListNumber, ListNumber2 etc. then each list item just restarts the count at 1 (first block below). If I keep the same formatting and indent myself, then it keeps the top level numbering order, but never uses decimals (i.e. I would like that 2nd list to go 2, 2.1, 2.1.1 ). This doesn't seem possible just with the styles that I can see.

[screenshot: 2016-01-22 at 10:52:35]

junctionapps commented 8 years ago

@Substr Did you come up with a way to handle the multi-outline list in the end? I have a similar issue: I'm not getting the 1.1, 1.2, 1.3 etc. to adhere to the multi-level list definition. However, if in the generated document I change the style from List Number 2 to Normal (or anything else) and back to List Number 2, it formats as defined in the multilevel list (as desired). I'm digging into it now, but thought I'd ask since your request is not too old.

Substr commented 8 years ago

@junctionapps Unfortunately not. In the end we just had to ditch using correct list formatting entirely, and just manually keep track of the doc structure, left indenting paragraphs based on their 'level' and add the number to the heading of each section.
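
A minimal sketch of that kind of manual bookkeeping, assuming each item is described only by its 1-based level: it produces labels such as 2, 2.1, 2.1.1 as plain text. The function name `outline_labels` is hypothetical, and how the label and indent get written into the paragraph is left to the caller.

```python
def outline_labels(levels):
    """Return an outline label ('1', '1.1', '2.1.1', ...) for each item level.

    `levels` is a sequence of 1-based nesting levels in document order.
    """
    counters = []  # counters[i] is the current number at level i + 1
    labels = []
    for level in levels:
        if level < 1:
            raise ValueError("levels are 1-based")
        # Going back up the hierarchy discards the deeper counters,
        # so the next nested item starts again at 1.
        del counters[level:]
        # A level that was skipped shows up as 0 in the label.
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        labels.append(".".join(str(c) for c in counters))
    return labels


labels = outline_labels([1, 2, 2, 1, 2, 3])
# labels == ['1', '1.1', '1.2', '2', '2.1', '2.1.1']
```

Each label can be prepended to the paragraph text, with the left indent chosen from the level.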

junctionapps commented 8 years ago

This appears to be quite possible if the multi-list definition doesn't use things that are already styled as a list. I defined a new multi-outline list using Word after creating a couple of styles based on "Normal". I called them NormalListParagraph1, NormalListParagraph2 ... Then in the multi-outline list I kept ListNumber as the first or highest element, then used NormalListParagraph1 and 2 as the others, adjusting the formatting as desired.

I tried this as when running some diffs on the document.xml files I noticed the

<w:pPr>
    <w:pStyle w:val="ListNumber2"/>
    <w:numPr>
        <w:ilvl w:val="0"/>
        <w:numId w:val="44"/>
    </w:numPr>
</w:pPr>

could be amended to not include the w:numPr node and things worked as desired. So, figuring the ListNumber2 styles came along with those nodes, I thought a separate style based on Normal might do the trick. It did.
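
To make the amendment concrete, here is a small standard-library sketch that removes the `w:numPr` child from a `w:pPr` fragment like the one quoted above, leaving the paragraph style in place. The sample string and the `strip_num_pr` helper are illustrative assumptions; applying this to a real document's paragraphs is not shown.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Same structure as the fragment above, with the namespace declared
# so that it parses on its own.
SAMPLE = (
    '<w:pPr xmlns:w="%s">'
    '<w:pStyle w:val="ListNumber2"/>'
    '<w:numPr><w:ilvl w:val="0"/><w:numId w:val="44"/></w:numPr>'
    '</w:pPr>' % W
)


def strip_num_pr(p_pr):
    """Remove every direct <w:numPr> child of a <w:pPr>; return how many were removed."""
    removed = 0
    for num_pr in p_pr.findall("w:numPr", NS):
        p_pr.remove(num_pr)
        removed += 1
    return removed


p_pr = ET.fromstring(SAMPLE)
removed = strip_num_pr(p_pr)
```

After the call, the paragraph properties keep only `w:pStyle`, so the numbering comes from the style rather than from the paragraph itself.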

solarjoe commented 7 years ago

Is there a solution to this? I tried various things I found across the internet, but nothing worked.

# these functions seem to be no longer available
p.set_numbering(0, 10)

p.set_numId(10)
p.set_ilvl(0)

p.numId(10)
p._set_ilvl(0)

solarjoe commented 7 years ago

Found it. Here is an example:

from docx import Document

document = Document()

# Styles are looked up by name ('List Bullet 2'); lookup by style id ('ListBullet2') is deprecated.
p = document.add_paragraph('first item in unordered list', style='List Bullet')
p = document.add_paragraph('first item in unordered list', style='List Bullet')
p = document.add_paragraph('first item in unordered list', style='List Bullet 2')
p = document.add_paragraph('first item in unordered list', style='List Bullet 2')
p = document.add_paragraph('first item in unordered list', style='List Bullet 3')
p = document.add_paragraph('first item in ordered list', style='List Number')
p = document.add_paragraph('first item in ordered list', style='List Number 2')
p = document.add_paragraph('first item in ordered list', style='List Number 3')
p = document.add_paragraph('first item in ordered list', style='List Number 3')
p = document.add_paragraph('first item in ordered list', style='List Number')
p = document.add_paragraph('first item in ordered list', style='List Number')

mattychen commented 6 years ago

Thanks @solarjoe I found that very useful.

I created a template that every new document draws from, so each one inherits the same styles.

solarjoe commented 6 years ago

For future reference some links to the doc and the default styles:

http://python-docx.readthedocs.io/en/latest/

http://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#built-in-styles

http://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#paragraph-styles-in-default-template

fengyuyan commented 6 years ago

Can anyone give some sample code to achieve the following?

  1. item1
     1.1 item1.1
  2. item2
     2.1 item2.1

croots commented 6 years ago

This seems like a basic question, but I can't for the life of me figure out how to cleanly indent elements of a list. Using 'List Bullet' and 'List Bullet 2' as per @solarjoe's solution, I get something similar to the attached. It appears to add extraneous line breaks, which makes things look very messy.

[screenshot: 2018-06-06 at 1:16:03 pm]

solarjoe commented 6 years ago

@croots, the styles are defined by your Word template. They might look different on your computer than on others. Can you re-define them?

scanny commented 6 years ago

I'm inclined to think @solarjoe is quite right. Try adding paragraphs to the document and setting their style to List Bullet and List Bullet 2 and see what they look like. If they look the same as paragraphs added with those styles by python-docx, you should look to updating the style definitions in the template document as the root of the problem.

chaithanyaramkumar commented 6 years ago

How can I read the bullets or numbering in an existing document? For example, if the input is 1.apple 2.boy, the output should be ['1.apple','2.boy']. Please answer.

rdt0086 commented 6 years ago

I worked through the issue of getting a list created today. I was able to use the built-in features (i.e. "List Bullet", "List Bullet 2", etc.). However, after hours of googling and attempting to edit the styles, I was not able to get the spacing, indentation, or the symbols I wanted to use as bullets. So I built a workaround class. I am not a programmer, so this can definitely be improved, and forgive me for any mistakes, but it seems to work.

import docx

class Paragraph_List(object):
    def __init__(self,*args): #args are: doc,item1,ordered,style,fmt (style and fmt optional)
        self.args = args
        self.doc =self.args[0]
        self.item1 = self.args[1]
        self.ordered = self.args[2]
        self.place = {}
        # Marker sequences for each numbering scheme, one list per nesting level.
        upper_roman = ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X']
        upper_alpha = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L']
        digits = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
        lower_alpha = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
        lower_roman = ['i', 'ii', 'iii', 'iv', 'v', 'vi', 'vii', 'viii', 'ix', 'x']
        List_dict = {
            'Roman': [upper_roman, upper_alpha, digits, lower_alpha, lower_roman],
            'ABC': [upper_alpha, digits, lower_alpha, lower_roman],
            '123': [digits, lower_alpha, lower_roman],
            'Bullet': [['●', '○', '•', '◦']],
        }
        if self.ordered == True:
            if len(self.args) < 4:
                self.fmt = List_dict['Roman']
            elif self.args[3] == 'Custom':
                self.fmt = self.args[4]
            else:
                self.fmt = List_dict[self.args[3]]
            self.p = self.doc.add_paragraph(self.fmt[0][0]+'. ' + self.item1 + '\n')
        else:
            self.fmt = List_dict['Bullet']
            self.p = self.doc.add_paragraph(self.fmt[0][0]+ ' ' + self.item1 + '\n')
        # self.doc.add_paragraph(item1, style=self.level)

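    def add_item(self, item, level):
        self.level = level
        # The constructor already wrote the first level-1 marker, so level 1 starts at index 0.
        self.place.setdefault(1, 0)
        # Moving back up the hierarchy restarts the counters of all deeper levels.
        for deeper in [lvl for lvl in self.place if lvl > self.level]:
            del self.place[deeper]
        # Indent four spaces per nesting level below the first.
        sp = "    " * (self.level - 1)
        if self.level in self.place:
            self.place[self.level] += 1
        else:
            self.place[self.level] = 0
        if self.ordered == True:
            self.p.add_run(sp + self.fmt[self.level - 1][self.place[self.level]] + '. ' + item + '\n')
        else:
            self.p.add_run(sp + self.fmt[0][self.level - 1] + ' ' + item + '\n')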

document = docx.Document()
mylist = Paragraph_List(document, 'item 1', True)
mylist.add_item('Level 2 Item 1',2)
mylist.add_item('Level 2 Item 2',2)
mylist.add_item('Level 2 Item 3',2)
mylist.add_item('Level 3 Item 1',3)
mylist.add_item('Level 3 Item 2',3)
mylist.add_item('Level 3 Item 3',3)
mylist.add_item('Level 3 Item 4',3)
mylist.add_item('Level 3 Item 5',3)
mylist.add_item('Level 1 Item 2',1)

mylist2 = Paragraph_List(document, 'Bullet Level 1 Item 1', False)
mylist2.add_item('Level 1 Item2',1)
mylist2.add_item('Level 2 Item1',2)
mylist2.add_item('Level 1 Item3',1)
mylist2.add_item('Level 2 Item1',2)
mylist2.add_item('Level 2 Item2',2)
mylist2.add_item('Level 2 Item3',2)
mylist2.add_item('Level 3 Item1',3)
mylist2.add_item('Level 4 Item1',4)
mylist2.add_item('Level 4 Item2',4)
mylist2.add_item('Level 2 Item4',2)

document.save('new.docx')

kf9031 commented 5 years ago

@rdt0086 Do you have any idea how I can insert this item list at a specific insertion point?

rdt0086 commented 5 years ago

@kf9031 I am not sure where you are starting from, whether from data or from an existing document. From data, I read the data with Python and identify the insertion point (character location) by knowing what information it should follow, and then insert using that knowledge. From a file, I would read the file as data and then do the same. You can always rewrite the file with the new information/layout.

The below function is where I am getting data from a pdf document by knowing the Text (they are labels) before the information I want to extract. You can use the same method to insert. This is the best example I have with me now, but perhaps later if you are still having issues I can put a more appropriate example together.

import os

import PyPDF2


def get_info_from_app(candidate, location):
    # Set up the page object and get its text.
    # PdfFileReader, getPage and extractText are the older PyPDF2 names.
    item_return = []
    pdf_path = os.path.join(location, candidate, 'App_' + candidate + '.pdf')
    with open(pdf_path, 'rb') as pdf_object:
        pdfReader = PyPDF2.PdfFileReader(pdf_object)
        page_obj = pdfReader.getPage(0)
        phone_obj = page_obj.extractText()

    # Each item is [label text before the value, label text after the value].
    items = [['Mobile Phone Number', 'Please send information'],['Primary Phone','Mobile Phone Number']]

    for item in items:
        start_location = phone_obj.find(item[0]) + len(item[0])
        end_location = phone_obj.find(item[1])
        info_item = phone_obj[start_location:end_location].strip('\n')
        item_return.append(info_item)
    return item_return

kf9031 commented 5 years ago

@rdt0086 Thanks for your quick reply. I want to insert the item list in a Word document (template) at an insertion point indicated by a label. I was hoping that there is a simple solution like this: https://stackoverflow.com/questions/24965042/python-docx-insertion-point/34814241#34814241