mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License

SanitizeDocument not disposing of AngleSharp HtmlDocument leading to memory leak #396

Closed: fiseekade closed this issue 1 year ago

fiseekade commented 1 year ago

After running SanitizeDocument(Stream html, string baseUrl = "", IMarkupFormatter? outputFormatter = null) over many files for several hours, my application's memory usage keeps increasing until it eventually runs out of memory. I've tracked the leak down to the SanitizeDocument call, which calls AngleSharp's ParseDocument. That returns an IHtmlDocument that does not seem to get cleaned up.

To work around the leak, I had to switch to the SanitizeDom function and dispose of the IHtmlDocument in my own code. Can you fix the SanitizeDocument function so that it disposes of the IHtmlDocument? Thanks.
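
A minimal sketch of that workaround, for readers hitting the same leak on an older release. The thread does not show the original code, so the class and method names here (SanitizeWorkaround, SanitizeAndDispose) are hypothetical, and the choice of the string overload of SanitizeDom is an assumption. That overload treats its input as body content, and the serialized output may not match what SanitizeDocument returns.

```csharp
using AngleSharp;            // ToHtml() extension method
using AngleSharp.Html.Dom;   // IHtmlDocument
using Ganss.Xss;             // HtmlSanitizer

// Hypothetical helper, not part of HtmlSanitizer.
public static class SanitizeWorkaround
{
    public static string SanitizeAndDispose(HtmlSanitizer sanitizer, string html, string baseUrl = "")
    {
        // SanitizeDom returns the parsed and sanitized document instead of a string,
        // so the caller owns it. The using declaration disposes it when the method exits.
        using IHtmlDocument document = sanitizer.SanitizeDom(html, baseUrl);

        // Serialize before the document is disposed.
        // Assumption: the whole document is wanted; use document.Body for body content only.
        return document.ToHtml();
    }
}
```

Called as SanitizeWorkaround.SanitizeAndDispose(new HtmlSanitizer(), html), this keeps the document's lifetime inside a single method, so nothing depends on the garbage collector to release whatever the document holds.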

mganss commented 1 year ago

See #131.

In your analysis, can you dig down further and see which unmanaged resources are not released?

The managed IHtmlDocument object returned by ParseDocument is only held in a local variable inside the SanitizeDocument method and is eligible for garbage collection when the method returns. Neither IHtmlDocument nor the concrete implementation class HtmlDocument implements IDisposable, so it shouldn't be necessary to dispose of them.

fiseekade commented 1 year ago

Thanks for your response.

At this point, I'm not sure which unmanaged resources are involved, as this is buried inside the AngleSharp library. I'd have to take a closer look to find out.

But in AngleSharp, HtmlDocument inherits from Document, which has a Dispose method. Here is the link: https://github.com/AngleSharp/AngleSharp/blob/devel/src/AngleSharp/Dom/Internal/Document.cs

mganss commented 1 year ago

You're right. IHtmlDocument does implement IDisposable through IDocument. I've added a using statement.

fiseekade commented 1 year ago

Thanks for the quick response and fix. When will a new release be available?

mganss commented 1 year ago

I've just released 8.0.601.