A majority of spiders in this project are attempting to parse HTML strings and
convert them into something more human-readable. The conversion step has been a
persistent pain point, it's time consuming, we're consistently getting this
wrong, and there exists many implementations of the same thing throughout our
codebase 😖 .
This new common code should allow us to refactor. Once a contributor drills down
to a single div, e.g. <p>January 21, 2021\xa0</p>, this cleaning function should
do the job 9 times out of 10.
Finally, we implement an example of the first common string cleaning function by
refactoring alle_improvements.
A majority of spiders in this project are attempting to parse HTML strings and convert them into something more human-readable. The conversion step has been a persistent pain point, it's time consuming, we're consistently getting this wrong, and there exists many implementations of the same thing throughout our codebase 😖 .
This new common code should allow us to refactor. Once a contributor drills down to a single div, e.g.
<p>January 21, 2021\xa0</p>
, this cleaning function should do the job 9 times out of 10.Finally, we implement an example of the first common string cleaning function by refactoring alle_improvements.