Closed bjornri closed 1 year ago
Base: 94.43% // Head: 94.64% // Increases project coverage by +0.20%
:tada:
Coverage data is based on head (
a0da33f
) compared to base (008cff7
). Patch coverage: 100.00% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
It looks like if we are to reach the codecov/patch target, the parameter on GetAllNodes() must be INode (not nullable) and callers have to check for null before calling it. I added a commit with this change.
Thanks!
HtmlSanitizer.GetAllNodes()
uses the program stack to iterate on elements in the DOM, and when we try to parse a deeply nested email (a few thousand elements deep) we get a stack overflow exception. Because our app runs in IIS it has a very small stack size. However, even if we increase our stack size, we will still crash - we just need a bigger HTML document. And when we crash with stack overflow, the entire app goes down.See https://github.com/mganss/HtmlSanitizer/issues/417 as this is the same issue as we are having.
Even though HtmlSanitizer throws StackOverflowException, AngleSharp is still able to parse the HTML.
I have rewritten
GetAllNodes()
to use a stack on the heap to avoid stack overflows, so that if AngleSharp can parse then HtmlSanitizer can sanitize it. I don't think HtmlSanitizer rely on the ordering of the returned enumeration, but I have kept the depth-first ordering we got when using the recursive approach.Note that there is the same issue with
StyleExtensions.GetStylesheets()
in AngleSharp that still cause HtmlSanitizer to crash when enumerating style sheets. I plan to send a PR to that repository as well.I think existing unit tests has enough coverage for this method so I did not create any additional tests.