Closed benoit74 closed 2 months ago
@rgaudin this is far from being done, but I would like to have a first feedback on this approach
For now only the simple "drop attribute" case is implemented, but it gives a rough idea of what it will look like if I continue on this approach.
I'm a bit surprise I had to implement the call_func
helper function myself, I'm pretty sure it is possible without this custom code and Python stdlib code, but I failed to find how to write it properly. Idea is that I want to pass whatever argument are defined in the decorated function but not other ones.
Refactoring work is now completed, all HTML rewrite operations are isolated in standalone methods.
Fix #305
Idea: we should be able to define individual rewriting rules with functions and decorators, just like we do when defining endpoints in FastAPI. We have one decorator per kind of modification: drop attribute, rewrite attribute, rewrite data (and maybe others) and we implement one decorated function per kind of modifications expected.
Changes:
drop_attribute
(to decide to drop an HTML tag attribute),rewrite_attribute
(to rewrite a given HTML tag attribute - name + value),rewrite_tag
(to rewrite the whole HTML tag, potentially dropping it completely or changing tag name),rewrite_data
(to rewrite HTML tag data)See https://github.com/openzim/warc2zim/pull/353 for a show case of how this new code structure make it easier to: