Closed pragatibaheti closed 2 years ago
One second sounds like there is an issue somewhere. The unit tests have a test case that runs all the other tests (177 currently) in parallel on 16 threads and repeats this 1000 times (a brute force approach to discover multithreading issues). This test takes a little more than 2s on my desktop machine.
Can you attach a sample for each of the cases you have mentioned (document vs. fragment)?
Sure I am attaching a code snippet having a fragment. This takes close to 0.8sec. samplehtml.txt
Also, sanitizing a small fragment <div onclick="alert('xss')">hello</div><img src="javascript:void(0)"/>
takes 0.5 sec. Here is another sample https://dotnetfiddle.net/EVBg58
Can you please confirm.
These are mainly startup costs. Here's another fiddle demonstrating this (approx. 0.1ms per run): https://dotnetfiddle.net/yIut58
Try creating the HtmlSanitizer
object as a singleton within your application, then call Sanitize()
multiple times on this object, possibly from multiple threads as well.
Ok got it. But if my use-case demands calling sanitize once, then this time can be expected right? Also does this start time gets significantly change on the basis of configs we pass?
Is there any difference in time we can expect if we call sanitize/sanitizedom/sanitizedocument?
Ok got it. But if my use-case demands calling sanitize once, then this time can be expected right?
Yes. I believe the cost is mostly .NET infrastructure cost (JIT, assembly loading etc). For example, running the following just once takes 0.2s on my machine, too:
var json = @"{""key"": ""val""}";
var p = JsonConvert.DeserializeObject<Product>(json);
The average time when running this 10k times in a loop is just 0.05ms, though.
Also does this start time gets significantly change on the basis of configs we pass?
I don't think so.
Is there any difference in time we can expect if we call sanitize/sanitizedom/sanitizedocument?
I don't think so, either.
Thanks for the clarification
I am looking into the details of the library for sanitization and worked on sample inputs. I profiled some time summary that if we pass a complete HTML document close to 4000bytes it takes an avg of 1sec to sanitize (This has some CSS and HTML code containing div tags).
Using the same instance if we sanitize a HTML string (not complete HTML document) it takes 1sec for lesser size (~1000bytes also containing simple tags like a,u,b,i). Any reason for this behavior or what can lead to faster results as time plays a critical role in my use-case?
Would highly appreciate some suggestions.