Closed: pawelpbm closed this issue 10 months ago
So far I'm using the following script as a filter:
#!/usr/local/bin/python3
import os
import sys
from urllib.parse import urljoin
from bs4 import BeautifulSoup
# The page content arrives on stdin; the job URL is read from the environment
content = sys.stdin.read()
soup = BeautifulSoup(content, 'html.parser')
job_location = os.getenv('WEBCHANGES_JOB_LOCATION')
# Resolve every href against the job URL; hrefs that are already absolute stay unchanged
for anchor in soup.find_all('a', href=True):
    anchor['href'] = urljoin(job_location, anchor['href'])
print(str(soup))
I think it would be useful if webchanges could do it natively.
Great idea, thanks!
Let me give it some thought; it's probably more lightweight to do it with urllib.parse.urljoin, using lxml to extract the tags, and it would not require additional dependencies (i.e. BeautifulSoup).
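For reference, here is a minimal sketch of how urllib.parse.urljoin resolves the kinds of href values a page typically contains. It covers only the URL-joining step; extracting the tags (with lxml or BeautifulSoup) is left out. The base URL and the hrefs are invented for illustration and do not come from the issue.

```python
from urllib.parse import urljoin

# Hypothetical job location; the script earlier in the thread reads it
# from the WEBCHANGES_JOB_LOCATION environment variable instead.
base = "https://example.com/news/index.html"

# Made-up href values covering the common cases.
hrefs = [
    "item1.html",                  # relative to the current directory
    "/about",                      # relative to the site root
    "../archive/2020.html",        # parent-directory reference
    "https://other.example/page",  # already absolute
    "#top",                        # fragment only
]

# Map each original href to its resolved absolute form.
resolved = {href: urljoin(base, href) for href in hrefs}

for href, absolute in resolved.items():
    print(f"{href} -> {absolute}")
```

Hrefs that are already absolute pass through unchanged, so applying urljoin to every anchor should be safe for pages that mix both kinds of links.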
It just dawned on me that relative links should be automatically rendered as absolute links in the html report.
Can you please confirm whether you have a use case with the text or markup report types, or whether it's a problem with the html report?
I'm using report type email with html: true, but TBH I'm not sure if it's actually based on the html report or on text. I think when I wanted to have separate emails per URL I had to change separate: true in the text section.
How can I actually confirm that?
I do have coloring in the emails, as on the screenshot.
OK, I looked at the code and it's html2text that modifies the relative links to make them absolute. Does using that filter work in your use case?
While html2text does convert the links to absolute ones, it also "renders" the HTML. This means that instead of a simple and very readable diff, I'm getting a lot of mess...
I don't quite understand the setup or data that you have, since that setup (html2text plus an html reporter) is typically the most readable one for tracking HTML sources. It's the one I use all the time, and the diffs are very readable and clickable.
I will add the recommended filter at the next release for your use.
Thanks for the contribution, very much appreciated!
P.S. if you have any suggested names for the filter, I am all ears!
Maybe I'm doing something wrong; I'm pretty new to using webchanges.
This is what I'm getting without html2text, and I actually quite like that format: https://drive.google.com/file/d/1RYS-7mmZDBVDoczVdalYCAF4-EH9g3SR/view?usp=sharing
This is what I'm getting with html2text: https://drive.google.com/file/d/1Xj5733ffmEzFDM-zv86GNJ_B7YkBGeWL/view?usp=sharing
Both screenshots are from the Gmail web interface.
Implemented in v3.16: https://webchanges.readthedocs.io/en/stable/filters.html#absolute-links
Please let me know if there are problems with it. (I also just noticed an error in the documentation, which incorrectly shows it as new in version 3.17.)
My filter monitors when a new link is added to the website and then returns the content of the <a> tag. That's fine if the page contains absolute links, but many pages contain only relative links. It would be cool if webchanges could replace relative links with absolute links. I tried to find a nice command-line tool that would do that, but there are only some hacky ways listed here.
Also, as noted in the above link, it should be easy to do that with BeautifulSoup, but I would probably see it as a separate filter from beautify.