thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

Usage of diff_filter #764

Open f0sh opened 1 year ago

f0sh commented 1 year ago

I am trying to set up a urlwatch configuration that reports changes to a local command using diff_filter. After reading the documentation and the issues in this repository, I concluded that diff_filter is the appropriate way to do it.

However, I cannot figure out how it works, because the documentation for diff_filter is quite brief.

So far I have figured out that the diff output is passed to the script via a normal pipe. A different issue originally mentioned that the changes are handed to the script as two filenames passed as arguments. This is evidently not the case, because the script receives no arguments.

However if I have a website with a long history, I get several outputs like this:

.
.
=== Filtered diff between state 7 and state 8 ===
<diff-content>
=== Filtered diff between state 8 and state 9 ===
<diff-content>
.
.

This causes my script to run several times instead of only once per notification. I have not found a way for my script to recognise that it is being called a second time for the same URL without implementing complex state.

The only environment variables I found that are passed to the script, and that I use, are:

$URLWATCH_JOB_NAME
$URLWATCH_JOB_LOCATION

Would it be possible to add another environment variable such as $URLWATCH_JOB_DIFF that identifies the diff currently being compared? Then I would at least have a chance to run the script's command only for a certain state. However, this requires that the states are sorted in descending order (because the newest state always has the highest number, and the script does not know what the highest state number is). Alternatively, there could be another variable, $URLWATCH_JOB_DIFF_LATEST, so I can filter like this:

if $URLWATCH_JOB_DIFF == $URLWATCH_JOB_DIFF_LATEST

Are there perhaps different ideas? Or maybe I have missed something.
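
One idea that needs no new environment variable, sketched under assumptions: the script remembers which versions it has already reported in a small per-job file and skips repeats. first_time_seen and STATE_DIR are hypothetical names chosen for illustration; urlwatch does not provide them.

```shell
#!/bin/bash
# Hypothetical helper: succeed only the first time a given version is seen.
# STATE_DIR is an assumed location, not something urlwatch provides.
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/urlwatch-seen}"

first_time_seen() {
    local job="$1" version="$2" file
    mkdir -p "$STATE_DIR" || return 2
    # One file per job; slashes and spaces in the job name become underscores
    file="$STATE_DIR/$(printf '%s' "$job" | tr '/ ' '__').seen"
    # -x matches whole lines, -F treats the version as a literal string
    if [ -f "$file" ] && grep -qxF -- "$version" "$file"; then
        return 1
    fi
    # Record the version so later calls with the same value are skipped
    printf '%s\n' "$version" >> "$file"
}
```

A script could then guard its notification with something like: first_time_seen "$URLWATCH_JOB_NAME" "$ver" || exit 0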

What I need

Whenever there is a new version, open a ticket with the title "New APP_NAME Version: APP_VERSION" from the command line.

My current configuration:

urlwatch-urls.yaml

name: Seafile Server
url: https://www.seafile.com/en/download/
filter:
  - css: '#server+div.row a'
  - html2text
  - grep: '64bit'
diff_filter:
  - grep: '^[+][^+]'
  - shellpipe: releases.sh

releases.sh

#!/bin/bash
# Read the first line of the filtered diff from stdin
read -r ver
echo "New $URLWATCH_JOB_NAME Version: $ver"

output

=== Filtered diff between state 0 and state 1 ===
New Seafile Server Version: +9.0.4 64bit
=== Filtered diff between state 1 and state 2 ===
New Seafile Server Version: +9.0.5 64bit
=== Filtered diff between state 2 and state 3 ===
New Seafile Server Version: +9.0.6 64bit
=== Filtered diff between state 3 and state 4 ===
New Seafile Server Version: +9.0.7 64bit
=== Filtered diff between state 4 and state 5 ===
New Seafile Server Version: +9.0.8 64bit
=== Filtered diff between state 5 and state 6 ===
New Seafile Server Version: +9.0.9 64bit
=== Filtered diff between state 6 and state 7 ===
New Seafile Server Version: +9.0.10 64bit
=== Filtered diff between state 7 and state 8 ===
New Seafile Server Version: +10.0.0 64bit (beta)
=== Filtered diff between state 8 and state 9 ===
New Seafile Server Version: +10.0.1 64bit
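
The leading + in each title above is the added-line marker of the diff, which the grep step lets through. A minimal sketch of how releases.sh could strip it, assuming the first line on stdin is the version line; make_title is a hypothetical name, not part of urlwatch:

```shell
#!/bin/bash
# Hypothetical helper: turn the first added diff line into a ticket title.
# Reads the filtered diff from stdin, the way shellpipe hands it to the script.
make_title() {
    local line
    # read -r keeps backslashes literal; accept a last line without a newline
    IFS= read -r line || [ -n "$line" ] || return 1
    # Strip the leading "+" that marks an added line in the diff
    printf 'New %s Version: %s\n' "${URLWATCH_JOB_NAME:-unknown}" "${line#[+]}"
}
```

Calling make_title as the last line of releases.sh would print "New Seafile Server Version: 9.0.4 64bit" for the first diff shown above.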

version

$ urlwatch --version
urlwatch 2.28

$ python --version
Python 3.9.13

thp commented 8 months ago

Note that --test-diff-filter is the only option that calls your script multiple times. During normal runs, the script is called only once, when a diff occurs. Do you mean that the script should detect whether it's running in test mode or "production" mode?