jdum / odfdo

python library for OpenDocument format (ODF)
Apache License 2.0

Accessing List within Table #5

Closed s3nn closed 5 years ago

s3nn commented 5 years ago

Hey,

First of all great project!

I just have a question regarding accessing lists (bullet points) within tables. Basically, I have many documents that are comprised of tables that have identical structure (rows / columns etc) and I'm trying to extract all data from all tables from multiple documents in a meaningful way. The problem is some cells contain bullet points (lists). Is there any way to get all text in a table cell with one method?

From some testing, it appears if I use get_value / get_values it doesn't return the text that is part of the List. However, I can use get_cells --> get_lists to extract the text, but I would need to check for the presence of any lists for each cell. Lastly, I could also use get_styled_elements for each cell, but this might get tricky.

What would you recommend? Thank you in advance and keep up the excellent work.

jdum commented 5 years ago

Hi,

On Tue, 26 Feb 2019 at 02:31, s3nn notifications@github.com wrote:

> From some testing, it appears if I use get_value / get_values it doesn't return the text that is part of the List.

Yes, .value will try to cast the content to a basic Python type (typically a string or a number).

> However, I can use get_cells --> get_lists to extract the text, but I would need to check for the presence of

Here it depends on the nature of your documents:

  • Apparently you have some ODF elements nested inside the cell, so the right method is to analyse it step by step, writing a few lines of code that use get_cells --> get_lists and so on. Note that get_lists should return an empty result if the cell contains no list.
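To illustrate why the cell value and the list text are separate, here is a minimal sketch that uses only Python's standard library rather than odfdo. The XML fragment is a hand-written, simplified ODF table cell (an assumption, not taken from the documents in question), and `cell_text` is a hypothetical helper name. It follows the step-by-step idea: check for lists inside the cell, then gather paragraph text wherever it is nested.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespaces.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hand-written, simplified cell (assumption): one paragraph plus a bullet list.
CELL_XML = """
<table:table-cell
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:value-type="string">
  <text:p>Intro</text:p>
  <text:list>
    <text:list-item><text:p>first bullet</text:p></text:list-item>
    <text:list-item><text:p>second bullet</text:p></text:list-item>
  </text:list>
</table:table-cell>
"""


def cell_text(cell):
    """Hypothetical helper: join the text of every paragraph in the cell,
    including paragraphs nested inside list items."""
    paragraphs = cell.iterfind(".//text:p", NS)
    return "\n".join("".join(p.itertext()).strip() for p in paragraphs)


cell = ET.fromstring(CELL_XML)

# Step 1: paragraphs that are direct children of the cell (no list text).
direct = [p.text for p in cell.findall("text:p", NS)]

# Step 2: check whether the cell contains any list at all.
has_list = cell.find(".//text:list", NS) is not None

# Step 3: collect all paragraph text, nested or not, in document order.
everything = cell_text(cell)
```

In this sketch the direct paragraphs cover only the plain text, while the recursive search also reaches the bullet items, which is the same distinction as between reading a simple value and walking the nested elements.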

> any lists for each cell. Lastly, I could also use get_styled_elements for each cell, but this might get tricky.

> What would you recommend? Thank you in advance and keep up the excellent work.

regards, jd

-- Jérôme Dumonteil

s3nn commented 5 years ago

Hey jd,

Thanks for your recommendation, it looks like cell.text_content is what I'm looking for!
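For the batch extraction described in the original question, that property could be applied to every cell of every row. This is a minimal sketch, not tested against real documents: `rows_to_text` is a hypothetical helper that relies only on each cell exposing `text_content`, and `StubCell` is a stand-in so the snippet runs without odfdo installed. With real documents the rows would come from the library, for example from get_cells as mentioned earlier in the thread.

```python
class StubCell:
    """Stand-in for a table cell (assumption); only mimics text_content."""

    def __init__(self, text_content):
        self.text_content = text_content


def rows_to_text(rows):
    """Hypothetical helper: turn rows of cell-like objects into rows of strings.

    Missing text (None) is normalised to an empty string, and surrounding
    whitespace is stripped.
    """
    return [[(cell.text_content or "").strip() for cell in row] for row in rows]


# Example rows; the second row has a cell whose text came from a bullet list.
rows = [
    [StubCell("Name"), StubCell("Notes")],
    [StubCell("Alice"), StubCell("first bullet\nsecond bullet\n")],
    [StubCell("Bob"), StubCell(None)],
]
extracted = rows_to_text(rows)
```

Because the tables share an identical structure, the resulting lists of strings can be compared or merged across documents by row and column position.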

Sincerely, s3nn