mariostoev / finviz

Unofficial API for finviz.com
MIT License

Screener broken since 2 days ago #64

Closed · somguyth closed this issue 4 years ago

somguyth commented 4 years ago

Before, I could pull data for the entire universe of tickers (about 7500 tickers) using the following code:

from finviz.screener import Screener
stock_list = Screener(filters=[], table='Ownership')
print(len(stock_list[::])) # this used to print 7500; now it prints only 680

However, since 1-2 days ago I'm only getting a subset of tickers back. I checked which tickers come back, and certain letters of the alphabet are excluded entirely (e.g. no ticker starting with T appears).

Saving to CSV shows the same thing; only ~700 rows get written:

stock_list.to_csv("stock.csv") # only 680 rows; stocks starting with T are missing

Even when requesting only the S&P 500, it returns just 100 tickers:

stock_list = Screener(filters=['idx_sp500'], table='Ownership')
len(stock_list[::]) # returns 100

Thanks.

mariostoev commented 4 years ago

@somguyth Are you sure it broke exactly 2 days ago?

I'm executing the same command and getting 720 results, while the length of the table still says 7529. When I print the table in the console I can see that the Screener skipped a few pages near the start and then started skipping large ranges (600-1320, 1440-6401).

For the S&P 500 I'm getting 180 results, which is still wrong.

It's interesting what causes the error and why we're getting different results. Every time I run the command I get a different number of rows, which suggests it has to do with the connection and with the website throttling concurrent requests. There's probably an error, but it's being silenced somewhere in the HTTP connection code. Do you want to try to diagnose it?
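One way to start diagnosing it: request a single screener page directly and print the raw body, which should surface whatever response is being silently swallowed. A minimal sketch, assuming v=130 is the Ownership view and r is the 1-based row offset (both inferred from the screener's URL scheme):

import requests

resp = requests.get(
    'https://finviz.com/screener.ashx',
    params={'v': '130', 'r': '21'},  # assumed: second page of the Ownership view
    headers={'User-Agent': 'Mozilla/5.0'},
)
print(resp.status_code, len(resp.content))
print(resp.text[:200])  # any error text (rather than table HTML) will show up here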

somguyth commented 4 years ago

> @somguyth Are you sure it broke exactly 2 days ago?
>
> I'm executing the same command and getting 720 results, while the length of the table still says 7529. When I print the table in the console I can see that the Screener skipped a few pages near the start and then started skipping large ranges (600-1320, 1440-6401).
>
> For the S&P 500 I'm getting 180 results, which is still wrong.
>
> It's interesting what causes the error and why we're getting different results. Every time I run the command I get a different number of rows, which suggests it has to do with the connection and with the website throttling concurrent requests. There's probably an error, but it's being silenced somewhere in the HTTP connection code. Do you want to try to diagnose it?

I'm confident it's a new issue that started 1-2 days ago. Unfortunately I won't be able to fix it myself; I'm overloaded, and not a good coder anyway.

ormi81 commented 4 years ago

Response in function `__http_request__async`: 'Too many requests.'

Finviz must be limiting connections more aggressively now. I quickly tested adding a 10 ms delay and retrying whenever the response length is 18 bytes (the length of 'Too many requests.'), and it works OK.

somguyth commented 4 years ago

> Response in function `__http_request__async`: 'Too many requests.'
>
> Finviz must be limiting connections more aggressively now. I quickly tested adding a 10 ms delay and retrying whenever the response length is 18 bytes (the length of 'Too many requests.'), and it works OK.

Does that fix the issue I mentioned? Are you getting the entire universe of stocks back from the Screener instead of a small subset?

How can I implement your solution? I've only run the Screener once today (not multiple times), and it still returns only a subset of the universe.

ormi81 commented 4 years ago

Yes, I get all stocks back from the Screener. Not necessarily the most elegant solution, but: locate request_functions.py in your Python installation folder and wrap the session.get call in a while loop, as shown below.

async def __http_request__async(self, url, session):
    """ Sends an asynchronous HTTP request to the URL and scrapes the webpage. """
    try:
        while True:
            async with session.get(url, headers={'User-Agent': generate_user_agent()}) as response:
                page_html = await response.read()

            # Finviz answers rate-limited requests with the 18-byte body
            # 'Too many requests.'; back off briefly and retry the same URL.
            if page_html.startswith(b'Too many requests'):
                time.sleep(0.01)  # requires 'import time' at the top of the file
                continue
            break

        if self.cssselect is True:
            return self.scrape_function(html.fromstring(page_html), url=url, *self.arguments)
        else:
            return self.scrape_function(page_html, url=url, *self.arguments)

    except (asyncio.TimeoutError, requests.exceptions.Timeout):
        raise ConnectionTimeout(url)

Remember to import the time module at the beginning of the file for time.sleep, if you deem the delay necessary; it seems to work without it too, but I didn't test that many times.
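For comparison, here is a minimal self-contained sketch of the same retry idea with a non-blocking sleep: time.sleep inside an async function stalls the whole event loop, while asyncio.sleep lets the other in-flight requests keep running. The fetch helper and URL are illustrative, not part of the finviz package:

import asyncio
import aiohttp

async def fetch(session, url):
    """Fetch url, retrying while the body is the rate-limit message."""
    while True:
        async with session.get(url) as response:
            body = await response.read()
        if body.startswith(b'Too many requests'):
            await asyncio.sleep(0.01)  # non-blocking: other requests proceed meanwhile
            continue
        return body

async def main():
    async with aiohttp.ClientSession(headers={'User-Agent': 'Mozilla/5.0'}) as session:
        body = await fetch(session, 'https://finviz.com/screener.ashx?v=130')
        print(len(body))

asyncio.run(main())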

somguyth commented 4 years ago

> Yes, I get all stocks back from the Screener. Not necessarily the most elegant solution, but: locate request_functions.py in your Python installation folder and wrap the session.get call in a while loop, as shown below. […]
>
> Remember to import the time module at the beginning of the file for time.sleep, if you deem the delay necessary; it seems to work without it too, but I didn't test that many times.

Can someone push this to the repo? I could try, but I'd probably screw it up.

d3an commented 4 years ago

Just a heads up: if you're trying to pull data for the entire stock universe (~7500 tickers), you shouldn't be using the Ownership (130) view. The Ownership view returns at most 100 tickers per request (assuming you have *Elite); otherwise it's 20 per request on the free version. Instead, you should be scraping one of the 410, 510, or 520 views.

Each of these views returns a maximum of 1000 tickers per request. For the 500-series views, you'll need to do breadth-first scraping/sorting if you want to maintain order.
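For illustration, a rough sketch of scraping the 410 (Tickers) view directly, under stated assumptions: r is a 1-based row offset, each page lists up to 1000 tickers, and the anchor class in the XPath is a guess to verify against the live HTML. None of this uses the finviz package:

import requests
from lxml import html

def scrape_tickers():
    tickers, seen, offset = [], set(), 1
    while True:
        page = requests.get(
            'https://finviz.com/screener.ashx',
            params={'v': '410', 'r': offset},
            headers={'User-Agent': 'Mozilla/5.0'},
        )
        # Assumed selector; inspect the page source if this returns nothing.
        # A throttled 'Too many requests.' reply also parses to no links.
        found = html.fromstring(page.content).xpath(
            '//a[contains(@class, "screener-link")]/text()')
        # finviz clamps out-of-range offsets to the last page, so stop once a
        # batch adds nothing new.
        new = [t for t in found if t not in seen]
        if not new:
            break
        tickers.extend(new)
        seen.update(new)
        offset += len(found)
    return tickers

print(len(scrape_tickers()))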

andr3w321 commented 4 years ago

I ended up writing my own finviz screener scraper Python package, available here: https://github.com/andr3w321/finvizlite