ioos / ioos_metrics

Working on creating metrics for the IOOS by the numbers
https://ioos.github.io/ioos_metrics/
MIT License

programmatically collect number of Deployments for ATN for IOOS By The Numbers #51

Closed MathewBiddle closed 5 months ago

MathewBiddle commented 5 months ago

I've tried some tricks with Selenium to capture the number of deployments from https://portal.atn.ioos.us/, but they aren't working properly.

# stuff to get selenium working
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')

# onto the code
from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for By.XPATH below
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

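# chromedriver is picked up from PATH (it was copied to /usr/bin above);
# recent Selenium releases no longer accept the driver path as a positional argument
wd = webdriver.Chrome(options=chrome_options)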

wd.get("https://portal.atn.ioos.us/#")

atn_deployments = wd.find_elements(By.XPATH, "//p[@class='val']")[-1].text

# hard-coded for now: this overrides the scraped value, which is not reliable yet
atn_deployments = 4639

print("ATN Deployments:",atn_deployments)

@ocefpaf do you know of any tricks to be able to automatically harvest the deployments number from https://portal.atn.ioos.us/ ?

ocefpaf commented 5 months ago

> @ocefpaf do you know of any tricks to be able to automatically harvest the deployments number from https://portal.atn.ioos.us/ ?

I can give it a try. I'll keep you posted...

ocefpaf commented 5 months ago

@MathewBiddle I did not test your solution, but you are probably hitting the same issue I had when cooking up this notebook:

https://gist.github.com/ocefpaf/58bf92aa105d514c5dae105bcef89c8e

I had to raise the wait times to absurdly high values to get everything to load, and I was not able to make it wait for the expected element. However, we can request the payload directly and get the data used to build the page with:

import requests

headers = {
    "Accept": "application/json"
}

raw_payload = requests.get("https://search.axds.co/v2/search?portalId=99", headers=headers)
json_payload = raw_payload.json()
for plt in json_payload["types"]:
    if plt["id"] == "platform2":
        print(plt["count"])
        break

While that is a bit awkward, it is much faster.
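For reuse in a script, the request and the lookup could be split into two small functions so that a change in the payload fails loudly instead of silently printing nothing. This is only a sketch: `fetch_search_payload` and `count_for_type` are hypothetical names, and it assumes the response keeps the `types` / `id` / `count` layout used above and that `platform2` stays the id for deployments.

```python
SEARCH_URL = "https://search.axds.co/v2/search"


def fetch_search_payload(portal_id=99, timeout=30):
    """Fetch the JSON that the portal uses to build its landing page."""
    # Imported here so count_for_type can be used without requests installed.
    import requests

    response = requests.get(
        SEARCH_URL,
        params={"portalId": portal_id},
        headers={"Accept": "application/json"},
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.json()


def count_for_type(payload, type_id="platform2"):
    """Return the count of the payload["types"] entry whose id matches type_id."""
    for entry in payload.get("types", []):
        if entry.get("id") == type_id:
            return int(entry["count"])
    # Assumption: a missing id means the payload layout changed.
    raise KeyError(f"no type with id {type_id!r} in payload")
```

With these in place, `count_for_type(fetch_search_payload())` would give the deployments number, and the lookup can be checked offline against a saved copy of the payload.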

MathewBiddle commented 5 months ago

How did you find https://search.axds.co/v2/search?portalId=99 ?

ocefpaf commented 5 months ago

> How did you find https://search.axds.co/v2/search?portalId=99

You have to open the page with the browser's developer tools and watch which requests it makes. But I had help there from someone who knows way more than I do about web stuff. @marcelotrevisani is the one who suggested that route.
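Once a candidate URL has been spotted that way, a quick check is to list what the response actually contains and see which entry matches the number shown on the page. A rough sketch, assuming the `types` / `id` / `count` layout from the snippet earlier in this thread; `summarize_types` is a hypothetical helper, and the sample below is made up for illustration (only the `platform2` id comes from this thread):

```python
def summarize_types(payload):
    """Map each entry id in payload["types"] to its count."""
    return {entry["id"]: entry["count"] for entry in payload.get("types", [])}


# Made-up sample shaped like the response; "example_type" is not a real id.
sample = {
    "types": [
        {"id": "platform2", "count": 4639},
        {"id": "example_type", "count": 12},
    ]
}

# Print one line per type so the counts can be compared with the portal page.
for type_id, count in sorted(summarize_types(sample).items()):
    print(f"{type_id}: {count}")
```

Running the same loop on the real JSON response should make it easy to see which id carries the deployments count.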

MathewBiddle commented 5 months ago

Thanks @ocefpaf and @marcelotrevisani!