j0k3r / graby

Graby helps you extract article content from web pages
MIT License
363 stars 73 forks

Move rewrite_url from HttpClient to site config #200

Open j0k3r opened 5 years ago

j0k3r commented 5 years ago

Regarding https://github.com/j0k3r/graby/pull/199 & https://github.com/wallabag/wallabag/issues/3767, we need to find a better way to add custom rewrite URLs within the site config.

The main goal is to spare end users of graby from having to redefine all the rewrite_url rules just to add one of their own.

techexo commented 5 years ago

For the record, https://github.com/wallabag/wallabag/issues/3697 is also relevant, for bloomberg.com. Right now, a workaround is to add this rule to the HttpClient config: 'www.bloomberg.com' => ['www.bloomberg.com' => 'www.bloombergquint.com'],

Edit: for wallabag/wallabag#3767: 'www.lesswrong.com' => ['www.lesswrong.com' => 'www.greaterwrong.com'],
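
To illustrate the goal of this issue, here is a minimal sketch of how custom rules could be layered on top of built-in ones without redefining the whole list. It is not graby's actual implementation: `mergeRewriteRules` and `rewriteUrl` are hypothetical helpers, the `example.com` default rule is made up, and the rule shape (host => [search => replace]) is only assumed from the two rules quoted above.

```php
<?php

/**
 * Merge custom rewrite rules over the defaults, host by host.
 * Hypothetical helper, not part of graby's API.
 */
function mergeRewriteRules(array $defaults, array $custom): array
{
    foreach ($custom as $host => $rules) {
        // Keep the default rules for this host; custom rules win on conflicts.
        $defaults[$host] = array_merge($defaults[$host] ?? [], $rules);
    }

    return $defaults;
}

/**
 * Rewrite a URL using the rules registered for its host, if any.
 * Hypothetical helper: applies the first matching search string only.
 */
function rewriteUrl(string $url, array $rules): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || !isset($rules[$host])) {
        return $url;
    }

    foreach ($rules[$host] as $search => $replace) {
        if (false !== strpos($url, $search)) {
            return str_replace($search, $replace, $url);
        }
    }

    return $url;
}

// Made-up default rule, standing in for whatever ships with the library.
$defaults = [
    'example.com' => ['/article/' => '/print/article/'],
];

// The two rules from this thread, supplied as custom rules.
$custom = [
    'www.bloomberg.com' => ['www.bloomberg.com' => 'www.bloombergquint.com'],
    'www.lesswrong.com' => ['www.lesswrong.com' => 'www.greaterwrong.com'],
];

$rules = mergeRewriteRules($defaults, $custom);

echo rewriteUrl('https://www.lesswrong.com/posts/abc', $rules), "\n";
// https://www.greaterwrong.com/posts/abc
```

Merging per host means a user only declares the rules they add or override; the defaults stay in place for every other host.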