Open Cerebex opened 2 months ago
Hi @Cerebex, I took a deeper look into this issue, and it seems VettaFi is now using JavaScript-based checks to verify that requests are not coming from bots. This can probably be solved by using Selenium to retrieve the page source, but I am quite busy these few days, so it will take me a while to get to it.
Will keep you posted when a fix is pushed.
Really appreciate it. I found I could get it to work with Selenium, as you suggested, but only when the browser window was actually opened, which is not ideal.
@Cerebex If I am not wrong, you can run Selenium in headless mode. Are you saying it doesn't work when you do that? Either way, it would be great if you could share that code to help fix this issue. It would take a load off my back! :)
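For reference, here is a minimal sketch of what running Selenium in headless mode could look like. It assumes Selenium 4 with a local Chrome install; `fetch_page_source` and `HEADLESS_ARGS` are hypothetical names and not part of this package. I have not verified that headless mode gets past the JavaScript check, so treat it as a starting point only.

```python
# Hypothetical helper, not part of this package.
# Assumes Selenium 4+ and a local Chrome installation.
HEADLESS_ARGS = ["--headless=new", "--window-size=1920,1080"]


def fetch_page_source(url: str) -> str:
    """Load `url` in headless Chrome and return the rendered HTML."""
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()
```

Usage would be something like `html = fetch_page_source("https://etfdb.com/etf/SPY")`, with the result handed to the existing HTML parsing code.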
Update: I will get this solved sometime in November/December. Apologies to whoever is using this package, but I do not have the time now to work on this.
I used this as a guide: https://www.zenrows.com/blog/403-web-scraping#set-fake-user-agent This is not my area of expertise, so I'm not sure whether it's a permanent fix, but it has worked for me so far.
`# etf_scraper.py
class ETFScraper(object):
    def __init__(
        self,
        ticker: str,
        user_agent: str = None,
    ):
        self.ticker = ticker
        self.base_url: str = "https://etfdb.com/etf"
        self.user_agents = load_user_agents()
        self.request_headers: dict = {
            ######
            # "User-Agent": user_agent if not user_agent else random.choice(self.user_agents), <------- replace this line with the line below
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
            "Referer": "https://etfdb.com/etfs/QQQ"
        }
        self.scrape_url: str = f"{self.base_url}/{ticker}"
        soup = self.__request_ticker()
        self.etf_ticker_body_soup = soup.find("div", {"id": "etf-ticker-body"})`
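The same header change can be sketched outside the class, which makes it easier to try in isolation. This is only an illustration under assumptions: `pick_user_agent` and `build_request` are hypothetical helpers, and the standard-library `urllib.request.Request` stands in for whatever HTTP client the package actually uses. As quoted, the commented-out line seems to have its condition inverted (it keeps `user_agent` only when it is empty), so the sketch prefers an explicit user agent, then a random one from the pool, then the fixed string.

```python
import random
from typing import Optional, Sequence
from urllib.request import Request

# Fixed browser-like user agent taken from the snippet above.
FIXED_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
)


def pick_user_agent(user_agent: Optional[str], pool: Sequence[str]) -> str:
    """Prefer the caller's value, then a random pool entry, then the fixed one."""
    if user_agent:
        return user_agent
    if pool:
        return random.choice(pool)
    return FIXED_USER_AGENT


def build_request(
    ticker: str,
    user_agent: Optional[str] = None,
    pool: Sequence[str] = (),
) -> Request:
    """Build (but do not send) a request for the ticker page with these headers."""
    headers = {
        "User-Agent": pick_user_agent(user_agent, pool),
        "Referer": "https://etfdb.com/etfs/QQQ",
    }
    return Request(f"https://etfdb.com/etf/{ticker}", headers=headers)
```

Passing the result to `urllib.request.urlopen` would send it; whether the site keeps accepting this user agent is not something I can promise.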
Describe the bug: 403 error when attempting to use `spy = ETF('SPY')`
To Reproduce: Steps to reproduce the behavior:
Expected behavior: Pull information properly
Additional context: It times out and waits 15 minutes, but that does not fix the issue.