thp / urlwatch

Watch (parts of) webpages and get notified when something changes via e-mail, on your phone or via other means. Highly configurable.
https://thp.io/2008/urlwatch/

question/support on xpath syntax #561

Open ghost opened 4 years ago

ghost commented 4 years ago

I am trying to learn how to select a specific instance of a <div> when several of them share the same class name.

In this path '//div[contains(@class,"callouts-container")]' how would I specify the first instance of "callouts-container"? Maybe using [1] but where is this placed?

CDC COVID-19 website

I just want total cases, new cases, total deaths, and new deaths. I do not want cases among HCP and deaths among HCP, which I am also getting with my job:

# CDC COVID-19
name: (33)CDC COVID-19 cases
url: "https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F2019-ncov%2Fcases-updates%2Fsummary.html"
filter:
  - xpath: 
     path: '//div[contains(@class,"callouts-container")]'
  - re.sub: '(?m)^[ \t]*' # removes all leading spaces
  - html2text: re
---

result:

iMac191:~ john$ uwtf 33
Total Cases
5,682,491
38,679 New Cases*
Total Deaths
176,223
572 New Deaths*
Cases among HCP
142,935
Deaths among HCP
660

I see there are some usage examples on W3Schools, but I can't always get that syntax to work in urlwatch, or I am not understanding it correctly.

I would rather not create "issues" for support, so I created a urlwatch subreddit today. Hopefully these sorts of questions will move there and be answered there, since it is a better forum for them.

ghost commented 4 years ago

Seems like this works. Is there another way?

# CDC COVID-19
name: (33)CDC COVID-19 cases
url: "https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fcoronavirus%2F>
filter:
  - xpath: 
     path: '(//div[@class="cases-callouts"]/div[position()<4])'
  - re.sub: '(?m)^[ \t]*' # removes leading spaces and tabs from every line
  - html2text: re
---
Mon Aug 24 06:48:04
iMac191:~ john$ uwtf 33
Total Cases
5,682,491
38,679 New Cases*
Total Deaths
176,223
572 New Deaths*

If several <div> elements have class values that contain the same substring, '//div[contains(@class,"cases-callouts")]' matches all of them, so it cannot single out one element. Use '//div[@class="cases-callouts"]' instead to match the class value exactly.
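
On the original question of where [1] goes: in XPath, '//div[contains(@class,"callouts-container")][1]' applies the [1] separately under each parent, so it can still return several elements. Wrapping the whole expression in parentheses applies [1] to the combined result, in document order. A minimal sketch of a job using that form is below. The name and URL are placeholders, and whether the first "callouts-container" on the CDC page holds exactly the wanted figures is an assumption I have not checked.

```yaml
# Hypothetical job: name and url are placeholders, not the real CDC job.
name: "first callouts-container only"
url: "https://example.com/cases.html"
filter:
  - xpath:
      # Parentheses first, then [1]: take the first match in the whole document.
      # Without the parentheses, [1] would pick the first matching <div>
      # under each parent, which may be more than one element.
      path: '(//div[contains(@class,"callouts-container")])[1]'
  - re.sub: '(?m)^[ \t]*' # removes leading spaces and tabs from every line
  - html2text: re
```

The same pattern works for other positions, for example '(...)[2]' for the second match or '(...)[last()]' for the final one.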