j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Regex Strip #245

Open zyuhel opened 3 years ago

zyuhel commented 3 years ago

Hello everyone, is it possible to write a config that strips or replaces values based on a regex rather than an XPath?

j0k3r commented 3 years ago

Not possible yet, and it would be quite complex because running regexes on HTML is hard.
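A small illustration of why regexes and HTML mix poorly. This is a Python sketch for demonstration only (Graby itself is PHP), and the markup is made up. With nested elements, a lazy pattern stops at the first closing tag and leaves a stray `</div>`, while a greedy pattern runs to the last closing tag on the page and swallows the article text too.

```python
import re

html = '<div class="ad"><div>Buy now</div></div><p>Article text</p><div>Comments</div>'

# Lazy match: .*? stops at the FIRST </div>, which closes the inner div,
# so the outer ad div's closing tag is left behind.
lazy = re.sub(r'<div class="ad">.*?</div>', '', html, flags=re.S)

# Greedy match: .* runs to the LAST </div> in the document,
# removing the article paragraph and the comments along with the ad.
greedy = re.sub(r'<div class="ad">.*</div>', '', html, flags=re.S)

print(lazy)    # </div><p>Article text</p><div>Comments</div>
print(repr(greedy))  # ''
```

Neither pattern removes exactly the ad block, which is the kind of fragility an XPath-based `strip` avoids.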

zyuhel commented 3 years ago

In my opinion, the regexes could be executed in two steps. The first step would run on the full HTML. That is not the best way to do it, but it is not that bad either, and if a user writes a bad config, that is the user's problem. The second step would run on the extracted content. Something like:

regex_strip_before:
regex_strip_after:
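A minimal sketch of how the two proposed directives might be read from a site config and applied, written in Python for illustration. Assumptions: `regex_strip_before` and `regex_strip_after` are only the names proposed in this issue (they are not existing Graby directives), each line holds one pattern after the colon, matches are deleted, lines starting with `#` are comments, and all other directives (such as XPath ones) are ignored here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RegexStripRules:
    before: list = field(default_factory=list)  # compiled patterns run on the raw HTML
    after: list = field(default_factory=list)   # compiled patterns run on the extracted content


def parse_rules(config_text: str) -> RegexStripRules:
    """Collect the hypothetical regex_strip_* directives from a site config."""
    rules = RegexStripRules()
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so patterns may themselves contain ':'.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "regex_strip_before":
            rules.before.append(re.compile(value, re.S))
        elif key == "regex_strip_after":
            rules.after.append(re.compile(value, re.S))
    return rules


def apply_patterns(patterns, text: str) -> str:
    """Delete every match of each pattern, in config order."""
    for pattern in patterns:
        text = pattern.sub("", text)
    return text


config = r"""
title: //h1
regex_strip_before: <script\b.*?</script>
regex_strip_after: <p>Advertisement</p>
"""

rules = parse_rules(config)
cleaned_html = apply_patterns(rules.before, "<h1>Title</h1><script>track()</script><p>Body</p>")
cleaned_content = apply_patterns(rules.after, "<p>Body</p><p>Advertisement</p>")
```

Here `cleaned_html` becomes `<h1>Title</h1><p>Body</p>` and `cleaned_content` becomes `<p>Body</p>`; the `title:` line is left for the normal XPath handling.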

I could write it and open a PR, but I am not sure whether it would ever be merged. Or should I build it as a wrapper around Graby rather than inside it?
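The "around Graby, not inside" option could look roughly like the Python sketch below: a wrapper that runs regex passes before and after any HTML-to-content extractor. Assumptions: `with_regex_strip` and `fake_extract` are hypothetical names, and `fake_extract` is a toy stand-in that merely keeps `<p>` elements; in practice the wrapped callable would be whatever invokes Graby.

```python
import re
from typing import Callable, Iterable


def with_regex_strip(
    extract: Callable[[str], str],
    before: Iterable[str] = (),
    after: Iterable[str] = (),
) -> Callable[[str], str]:
    """Wrap an HTML -> content extractor with regex deletion passes."""
    before_res = [re.compile(p, re.S) for p in before]
    after_res = [re.compile(p, re.S) for p in after]

    def wrapped(html: str) -> str:
        # Pass 1: clean the raw HTML before extraction.
        for pattern in before_res:
            html = pattern.sub("", html)
        content = extract(html)
        # Pass 2: clean the extracted content.
        for pattern in after_res:
            content = pattern.sub("", content)
        return content

    return wrapped


def fake_extract(html: str) -> str:
    """Toy extractor (hypothetical): keep every <p> element, with or without attributes."""
    return "".join(re.findall(r"<p\b[^>]*>.*?</p>", html, flags=re.S))


page = '<p class="promo">Sale!</p><p>Body</p><p>Share this article</p>'
clean_extract = with_regex_strip(
    fake_extract,
    before=[r'<p class="promo">.*?</p>'],
    after=[r"<p>Share this article</p>"],
)
result = clean_extract(page)
```

`result` is `<p>Body</p>`, whereas `fake_extract(page)` alone returns all three paragraphs. Keeping this outside the library avoids touching Graby's config parser, at the cost of the patterns living in the caller rather than in the site config files.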