hrbrmstr / docxtractr

:scissors: Extract Tables from Microsoft Word Documents with R

Allow for OOXML where commentStartRange and commentEndRange nodes are not siblings #30

Open · WilDoane opened this pull request 3 years ago


I've run across valid DOCX XML in which the comment range markers (w:commentRangeStart and w:commentRangeEnd in WordprocessingML) are not siblings:

trimmed-down

This PR attempts to extract comments and their anchor text accurately from such documents.
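One way to handle non-sibling markers is to stop relying on sibling relationships and work in document order instead. Every w:t text node that comes after a comment's start marker and before its matching end marker is part of that comment's anchor, whichever paragraph it lives in. The sketch below shows this with the xml2 package and XPath's preceding:: and following:: axes. It is only an illustration under stated assumptions, not the code in this PR. The inline XML is a hypothetical, minimal document.xml fragment, and anchor_text() is a hypothetical helper name.

```r
library(xml2)

# Hypothetical, minimal document.xml body (not the test document from this PR):
# comment 0 starts in one paragraph and ends in the next, so its
# w:commentRangeStart and w:commentRangeEnd markers are not siblings.
doc_xml <- '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r><w:t>Before </w:t></w:r>
      <w:commentRangeStart w:id="0"/>
      <w:r><w:t>first part</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t> second part</w:t></w:r>
      <w:commentRangeEnd w:id="0"/>
      <w:r><w:t> after</w:t></w:r>
    </w:p>
  </w:body>
</w:document>'

# Hypothetical helper: return the anchor text of one comment id.
# A w:t node belongs to the anchor when the matching start marker precedes it
# and the matching end marker follows it in document order; ancestry and
# sibling position do not matter.
anchor_text <- function(doc, id) {
  ns <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")
  xpath <- sprintf(
    paste0(
      "//w:t[preceding::w:commentRangeStart[@w:id='%s']",
      " and following::w:commentRangeEnd[@w:id='%s']]"
    ),
    id, id
  )
  nodes <- xml_find_all(doc, xpath, ns)
  paste(xml_text(nodes), collapse = "")
}

doc <- read_xml(doc_xml)
anchor_text(doc, "0")
#> [1] "first part second part"
```

The XPath approach is compact but checks both axes for every w:t node, so a single streaming pass over the nodes in document order would scale better on large documents.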

All original tests pass, and devtools::check(args = c('--as-cran'), build_args = c('--resave-data')) passes.

I've also added tests that verify accurate extraction of anchor text, author, and initials, along with a new test document that uses this alternative XML structure.

I did not update the NEWS or DESCRIPTION files, since I didn't know whether you already had other updates in progress.