thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

Help request for Amazon price tracker with urlwatch #688

Closed Jorman closed 2 years ago

Jorman commented 2 years ago

Hi, I'm trying to build a simple price tracker for Amazon products, for example this one: [image]. Note that Amazon's HTML changes with the language, but I don't think the principle changes.

My current configuration is this:

name: "Test product"
url: "https://www.amazon.it/dp/B007427SAC/"
max_tries: 5
filter:
  - css:
      selector: '#corePrice_feature_div > div > span'
      maxitems: 1
  - beautify
  - html2text
  - grep: '[0-9]+[.,][0-9]{0,2}'
diff_filter:
  - grep: '^[@+]'

But I get strange results: sometimes the price is unchanged but I still get a notification, and sometimes the price is not even present. I think it is some kind of odd response from Amazon. I check the page every 30 minutes. Here are some logs; maybe someone with more knowledge can tell me what I'm doing wrong.

Log without changes but with a notification sent: with.log

Log with changes and a notification sent: without.log

Why do I get a notification if the price is always the same?

J

kongomongo commented 2 years ago

Maybe some non-printable characters are being detected as a diff?
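One way to check this hypothesis is to compare two saved snapshots character by character. The following is a minimal sketch using only the Python standard library; the sample strings are made up for illustration and are not taken from the attached logs, and `describe_chars` is a hypothetical helper name.

```python
import unicodedata


def describe_chars(text):
    """List (code point, Unicode name) for every character outside printable ASCII."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not (" " <= ch <= "~")
    ]


# Hypothetical snapshots: they look identical on screen, but the second one
# uses a no-break space between the amount and the currency sign.
snapshot_a = "12,99 €"
snapshot_b = "12,99\u00a0€"

print(snapshot_a == snapshot_b)    # the strings are not equal
print(describe_chars(snapshot_a))  # only the euro sign
print(describe_chars(snapshot_b))  # the no-break space and the euro sign
```

If two snapshots differ only in such characters, the diff is real from urlwatch's point of view even though the visible price looks the same.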

Jorman commented 2 years ago

Hi, actually the current configuration is this:

name: "Test product"
url: "https://www.amazon.it/dp/B007427SAC/"
max_tries: 5
filter:
  - css:
      selector: '#corePrice_feature_div > div > span > span.a-offscreen'
      maxitems: 1
  - beautify
  - grep: '([0-9]+[.,][0-9]{0,2}€)'
diff_filter:
  - grep: '^[@+]'

I changed it a little bit, but sometimes I still get a notification with empty changes.

kongomongo commented 2 years ago

Just to be sure: since you are omitting "deleted" lines in diff_filter, you WILL be notified of an empty diff if there are ONLY deleted lines.

This is because the diff_filter runs after the diff has been calculated and only cosmetically changes the output passed to the reporter. It CANNOT prevent a diff from being reported.
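The effect can be reproduced outside urlwatch. The sketch below is only an approximation of the behaviour described above, not urlwatch's actual code: it builds a unified diff with Python's `difflib` and then keeps the lines matching `^[@+]`, as the `diff_filter` in the job does. The function name `diff_and_filter` and the sample price are assumptions for illustration.

```python
import difflib
import re


def diff_and_filter(old_text, new_text, pattern=r"^[@+]"):
    """Return the full unified diff and the lines a grep-style filter would keep."""
    diff = list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old", tofile="new", lineterm=""))
    kept = [line for line in diff if re.search(pattern, line)]
    return diff, kept


# The price was present in the previous run and is missing in the current one.
diff, kept = diff_and_filter("12,99€\n", "")

print("\n".join(diff))   # contains the deleted line "-12,99€"
print("--- after diff_filter ---")
print("\n".join(kept))   # only header and hunk lines remain, no price
```

The change is detected before the filter runs, so a notification still goes out; the filter has merely removed the one line that would have explained it.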

Jorman commented 2 years ago

OK, that makes sense. I don't really know how to set it up correctly. The idea is simple: keep track of the price and notify me when it changes. I also tried shellpipe, but I don't know how to "pass" the results.

Do you have a better idea?
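On the shellpipe question: as far as I understand the filter, urlwatch hands the current data to the command on standard input and uses whatever the command writes to standard output as the filtered result. Under that assumption, a small helper could look like the sketch below. The function names are hypothetical, and the regular expression is the one from the job above.

```python
import re
import sys

# Same pattern as the grep filter in the job configuration.
PRICE_RE = re.compile(r"[0-9]+[.,][0-9]{0,2}")


def extract_price(text):
    """Return the first price-like token in text, or an empty string if none is found."""
    match = PRICE_RE.search(text)
    return match.group(0) if match else ""


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read the page fragment from stream_in and write only the price to stream_out."""
    stream_out.write(extract_price(stream_in.read()) + "\n")
```

Saved as a script that ends with a call to `main()`, this could be used in the filter list as something like `- shellpipe: python3 /path/to/extract_price.py`, where the path is a placeholder. When no price is found, the script prints an empty line, so a page delivered without a price still counts as a change.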

JsBergbau commented 2 years ago

I've been using a similar solution for Amazon for quite some time.

name: "UGreen Adapter"
url: https://www.amazon.de/UGREEN-Konverter-Festplatten-Dockingstation-Netzadapter/dp/B016UBXH3O
compared_versions: 2
headers:
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0
    Cookie: <Insert very long Amazoncookie here>
filter:
    - xpath: //*[@id="priceblock_ourprice"]
    - html2text: re

I explicitly use a User-Agent and Cookie copied from my browser without being logged in to Amazon. However, Amazon detects this after some time; if you then test the filter without the xpath expression, you will get a response that says something like "don't do this, Amazon has an API for getting the price".

Sometimes (without changing the IP address or anything else in the configuration) the price is present again. That's why I used compared_versions: 2

That way it compares against both the empty result and the "real" price. However, you don't know when Amazon will deliver the page with the price. That's why I'm using Keepa now.
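The idea behind comparing against more than one stored version can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not urlwatch's implementation; `should_notify` and the sample history are hypothetical.

```python
def should_notify(new_value, history, compared_versions=2):
    """Report a change only if new_value differs from each of the last N snapshots."""
    recent = history[-compared_versions:]
    return new_value not in recent


# Hypothetical history: a real price, then a blocked response with no price.
history = ["12,99€", ""]

print(should_notify("12,99€", history))  # the price is back: matches an older snapshot
print(should_notify("", history))        # blocked again: matches the latest snapshot
print(should_notify("11,49€", history))  # a value not seen recently
```

With two compared versions, flipping between "no price" and the same price stays quiet, and only a value that matches neither recent snapshot triggers a report.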

Jorman commented 2 years ago

Thank you, @JsBergbau, for all the information. I think Keepa is a good alternative for Amazon; I'll use it too.