openresty / replace-filter-nginx-module

Streaming regular expression replacement in response bodies

replace fully rendered dom instead of from html source? #20

Closed. hanselke closed this issue 7 years ago

hanselke commented 7 years ago

Hi,

Not sure if this can be done. My goal is to run replace_filter on the fully rendered DOM instead of on the raw .html response.

I have control over the application server, and the point is to inject features into the application without having to fully understand how its server-side and client-side JS work together to produce the DOM the end user sees.

Is this something that can be done by somehow having the nginx server process the html fully first?

agentzh commented 7 years ago

@hanselke That would be very expensive (in terms of both memory usage and CPU usage), especially for large HTML responses. This module can work on HUGE response body streams with constant memory usage and O(n) time complexity. I don't think those guarantees are possible at all with your full DOM rendering approach, so it is not suitable for online processing in the first place. Building the DOM can easily cost on the order of hundreds of milliseconds of CPU time for a typical HTML document (not even a big one!). And it would be even slower if you decide to run the JavaScript in the page, since you'd also need to resolve external .js resources by initiating more requests.
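To make the constant-memory point concrete, here is a minimal Python sketch of streaming replacement. It is not this module's implementation: it handles only a fixed string rather than a regex, and the names `replace_stream`, `needle`, and `replacement` are made up for illustration. It shows only the idea of holding back a small tail of each chunk, so that a match split across chunk boundaries is still found without buffering the whole body.

```python
def replace_stream(chunks, needle, replacement):
    """Yield the input with every `needle` replaced, chunk by chunk.

    Hypothetical helper for illustration. At most len(needle) - 1
    characters are held back between chunks, so extra memory does not
    grow with the size of the response body.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    keep = len(needle) - 1   # longest possible partial match at a chunk end
    tail = ""                # text carried over from the previous chunk
    for chunk in chunks:
        buf = tail + chunk
        out = []
        pos = 0
        while True:
            hit = buf.find(needle, pos)
            if hit < 0:
                break
            out.append(buf[pos:hit])
            out.append(replacement)
            pos = hit + len(needle)
        rest = buf[pos:]
        # Emit everything except the last `keep` characters, which might
        # be the start of a match completed by the next chunk.
        cut = max(len(rest) - keep, 0)
        out.append(rest[:cut])
        tail = rest[cut:]
        yield "".join(out)
    yield tail  # flush whatever is left at end of stream


if __name__ == "__main__":
    body = ["hello wo", "rld, wor", "ld!"]
    print("".join(replace_stream(body, "world", "nginx")))
```

A rendered-DOM approach has no equivalent of `tail`: the whole document (plus any scripts it loads) has to be in memory before anything can be emitted.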

agentzh commented 7 years ago

@hanselke Instead of constructing a full DOM upfront, one can actually build a very efficient state machine atop an efficient HTML tokenizer based on this module. This way, one can do SAX-style parsing without losing much efficiency (both for CPU and memory). Still, running JS inside the document is not suitable for this thing at this phase.
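As a rough illustration of the tokenizer-plus-state-machine idea, the sketch below uses Python's standard `html.parser.HTMLParser` as a stand-in tokenizer; it is not built on this module, and the class name, `rewrite_stream`, and the `/inject.js` snippet are hypothetical. It re-emits tokens as they arrive and inserts a snippet just before `</body>`, without ever building a DOM. One simplification to be aware of: end tags are re-serialized in lowercase rather than copied byte for byte.

```python
from html.parser import HTMLParser


class InjectBeforeBodyEnd(HTMLParser):
    """SAX-style pass-through that inserts `snippet` before </body>."""

    def __init__(self, snippet):
        # convert_charrefs=False keeps entities such as &amp; untouched
        # and lets text be emitted as soon as it is seen.
        super().__init__(convert_charrefs=False)
        self.snippet = snippet
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "body":
            self.out.append(self.snippet)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def drain(self):
        """Return and clear the output produced so far."""
        chunk = "".join(self.out)
        self.out.clear()
        return chunk


def rewrite_stream(chunks, snippet):
    """Feed chunks to the parser and yield rewritten output per chunk."""
    parser = InjectBeforeBodyEnd(snippet)
    for chunk in chunks:
        parser.feed(chunk)  # incomplete tokens stay buffered in the parser
        yield parser.drain()
    parser.close()
    yield parser.drain()


if __name__ == "__main__":
    body = ["<html><bo", "dy><p>hi &amp; bye</p></bo", "dy></html>"]
    print("".join(rewrite_stream(body, '<script src="/inject.js"></script>')))
```

Only an incomplete token is held back between chunks, so memory stays bounded by the largest single token rather than by the document size. Note that this works on the HTML as served; anything the page's own JavaScript adds later is never seen by the tokenizer.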

hanselke commented 7 years ago

I tried looking for HTML tokenizers for nginx but can't seem to find any. From what I understand, a tokenizer breaks HTML down into tokens... but how would that help with getting at the JS-injected HTML?

Thanks for the quick reply!