ruby-docx / docx

a ruby library/gem for interacting with .docx files
MIT License

Order of paragraphs and tables #118

Open Prigin opened 2 years ago

Prigin commented 2 years ago

Problem

I need to get all paragraphs and tables in the order in which they appear in the docx file. Is there any way to do this?

Solution

Maybe a single shared index for paragraph objects and table objects would be enough.

satoryu commented 2 years ago

Do you mean that in the latest version of this gem Document#paragraphs returns paragraphs in the wrong order? If you have a docx file that reproduces this behavior, could you share it? It would help us investigate what is happening.

Thanks

Prigin commented 2 years ago

Not exactly. :) Sorry for not being clear. Let's say I have a docx that I want to convert to txt:

image

I need to know the position of each element (paragraphs and tables). How can I get the elements in the same order they have in the DOCX? Or maybe there is already a method that returns an element's order number in the document. I can't find one :(

aunghtain commented 2 years ago

I was able to do this as follows. I'm using private variables/methods, but if more APIs are opened up in the future, we won't have to.

    doc = Docx::Document.open(file)
    # @doc is the gem's private Nokogiri document; the body's children come back in document order
    doc.instance_variable_get("@doc").xpath('//w:document//w:body').children.each do |c|
      if c.name == 'p' # paragraph
        p = doc.send(:parse_paragraph_from, c)
      elsif c.name == 'tbl' # table (compare with ==, not assign with =)
        t = doc.send(:parse_table_from, c)
      else # other types?
      end
    end

aunghtain commented 2 years ago

If you just want the text, you don't need to parse them as paragraphs/tables. You can just read each node's text with "c.content".
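
A minimal sketch of that text-only approach. The helper name `ordered_body_text` and the `Node` struct are hypothetical, not part of the gem; `Node` is only a stand-in so the snippet runs without a docx file. The helper assumes each node responds to `name` and `content`, which is what the loop above already relies on.

```ruby
# Stand-in for an XML node (hypothetical, for illustration only):
# the helper below only needs objects that respond to #name and #content.
Node = Struct.new(:name, :content)

# Hypothetical helper: collects [kind, text] pairs for paragraphs and tables,
# preserving the order in which the nodes are given.
def ordered_body_text(children)
  children.each_with_object([]) do |node, out|
    case node.name
    when 'p'   then out << [:paragraph, node.content]
    when 'tbl' then out << [:table, node.content]
    end # any other node type (e.g. section properties) is skipped
  end
end

# Sample input mimicking the children of a document body.
sample = [
  Node.new('p', 'Intro'),
  Node.new('tbl', 'A1B1'),
  Node.new('sectPr', ''),
  Node.new('p', 'Outro')
]

ordered_body_text(sample).each { |kind, text| puts "#{kind}: #{text}" }
```

With a real file, the `children` argument would presumably be the same node set used in the loop above, i.e. `doc.instance_variable_get("@doc").xpath('//w:document//w:body').children`. Note that a table's `content` is the concatenated text of all its cells, so separators between cells would have to be added by walking the rows yourself.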