hrbrmstr / docxtractr

:scissors: Extract Tables from Microsoft Word Documents with R

Numbers are lost when reading cells with numbered lists #33

Open gorkang opened 2 years ago

gorkang commented 2 years ago

First of all, thanks for the amazing package!

I am trying to read the contents of a docx table and am having issues with numbered lists. If I type the numbers by hand, everything works, but if I use Word's automatic numbered list, the numbers are lost when the table is extracted.

As you can see in the reproducible example below, row 2 (Items) is extracted fine ("1. First item\n2. Second item"), but in row 3 (Items2), the numbers are lost ("First item\nSecond item").

[Screenshot: 2022-08-03 13-11-21]

DOC = docxtractr::read_docx("https://github.com/gorkang/BUG_docxtractr/blob/master/test.docx?raw=true")
TABLE = docxtractr::docx_extract_tbl(DOC, preserve = TRUE, header = FALSE)

# Not using a numbered list. All is fine
TABLE$V2[2]
#> [1] "1. First item\n2. Second item"

# In row 3 we use a numbered list. Numbers are lost
TABLE$V2[3]
#> [1] "First item\nSecond item"

Thanks!

Created on 2022-08-03 by the reprex package (v2.0.1)
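
A possible explanation, offered as a sketch and not a confirmed diagnosis of docxtractr's internals: in WordprocessingML, the labels of an automatic numbered list are not stored as text. Each list paragraph only carries a `w:numPr` marker in its paragraph properties, and Word generates "1.", "2." and so on when it renders the document, so any extraction that reads only the `w:t` text runs would see "First item" without its number. The example below uses the xml2 package on a hand-written, hypothetical cell fragment (not the XML of test.docx) and rebuilds the labels with a simple counter. It assumes a single decimal list at one level and ignores numbering.xml (start values, formats, nesting).

```r
library(xml2)

# Hypothetical fragment imitating one table cell whose two paragraphs belong
# to an automatic numbered list. This is NOT taken from test.docx.
cell_xml <- '
<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>First item</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>Second item</w:t></w:r>
  </w:p>
</w:tc>'

cell  <- read_xml(cell_xml)
ns    <- xml_ns(cell)                      # maps the "w" prefix to its URI
paras <- xml_find_all(cell, ".//w:p", ns)  # one node per paragraph

# Visible text of each paragraph: only the w:t runs, with no list labels.
txt <- vapply(paras, function(p) {
  paste(xml_text(xml_find_all(p, ".//w:t", ns)), collapse = "")
}, character(1))

# A paragraph is treated as a list item if it carries a w:numPr marker.
is_numbered <- vapply(paras, function(p) {
  length(xml_find_all(p, "./w:pPr/w:numPr", ns)) > 0
}, logical(1))

# Simplifying assumption: one list, decimal labels, counting from 1.
counter <- cumsum(is_numbered)
rebuilt <- ifelse(is_numbered, paste0(counter, ". ", txt), txt)

result <- paste(rebuilt, collapse = "\n")
result
```

A faithful fix would need to read the list definitions from numbering.xml instead of assuming "1.", "2.", so the counter here is only meant to show where the missing information lives.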