scrapy / parsel

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
BSD 3-Clause "New" or "Revised" License

Keep element's trailing text after removing it #207

Closed. Scarfmonster closed 2 years ago

Scarfmonster commented 3 years ago

By design, lxml drops the text that follows an element (its tail) when the element is removed. This change removes the element but keeps the trailing text, by appending it to the previous sibling or, when there is no previous sibling, to the parent.

Fixes #206
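
For readers unfamiliar with the lxml behavior described above, here is a minimal sketch that reproduces it with lxml directly and shows one way to preserve the tail. It is not the code from this pull request; the helper name `remove_keep_tail` and the sample markup are assumptions made for illustration.

```python
from lxml import etree


def remove_keep_tail(element):
    """Remove `element` from its parent but keep the text that follows it.

    Hypothetical helper, not part of parsel or lxml.
    """
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot remove an element that has no parent")

    tail = element.tail
    if tail:
        previous = element.getprevious()
        if previous is not None:
            # Attach the text after the previous sibling.
            previous.tail = (previous.tail or "") + tail
        else:
            # No previous sibling: the text belongs at the start of the parent.
            parent.text = (parent.text or "") + tail

    parent.remove(element)


# Plain lxml removal: the tail text " world" is lost together with <b>.
plain = etree.fromstring("<p>Hello <b>bold</b> world</p>")
plain.remove(plain[0])
plain_result = etree.tostring(plain, encoding="unicode")

# Removal with the helper: the tail text is moved onto the parent.
kept = etree.fromstring("<p>Hello <b>bold</b> world</p>")
remove_keep_tail(kept[0])
kept_result = etree.tostring(kept, encoding="unicode")
```

With the plain `remove()` call the serialized result loses " world", while the helper keeps it. Note that the surrounding whitespace is kept as-is, so the two spaces around the removed element end up adjacent.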

Gallaecio commented 3 years ago

Could you add tests to cover the change?

Gallaecio commented 3 years ago

https://github.com/scrapy/parsel/issues/215 shows a cleaner way to handle this.