rejoiceinhope / scrapy-proxy-pool

IndexError: list index out of range #2

Open protonhs opened 5 years ago

protonhs commented 5 years ago

Hello, I'm encountering this error lately:

File "/Users/mac/PycharmProjects/scrapy/venv/lib/python3.7/site-packages/proxyscrape/scrapers.py", line 164, in get_proxy_daily_http_proxies
    return _get_proxy_daily_proxies_parse_inner(centers[0], 'http', 'proxy-daily-http')
IndexError: list index out of range

any help please?
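For context, the traceback means that `find_all()` returned an empty list, so `centers[0]` fails: the site changed its markup, the selector matched nothing, and indexing an empty list raises IndexError. A minimal stdlib sketch of the guarded-lookup pattern (`first_or_none` is a hypothetical helper, not part of proxyscrape):

```python
def first_or_none(items):
    # centers[0] raises IndexError on an empty find_all() result;
    # guarding the lookup turns the crash into an explicit None
    return items[0] if items else None
```

With that guard in place, the caller can detect "no matching elements" and raise a clearer error instead of crashing.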

sriramkumar1996 commented 5 years ago

In /Users/mac/PycharmProjects/scrapy/venv/lib/python3.7/site-packages/proxyscrape/scrapers.py, replace the following functions:

def _get_proxy_daily_proxies_parse_inner(element, type, source):
    # split the element's text into one proxy per line, stripping quotes
    content = element.text
    rows = content.replace('"', '').replace("'", '').split('\n')
    proxies = set()
    for row in rows:
        row = row.strip()
        if len(row) == 0:
            continue

        # each row is 'host:port'; pad with the remaining Proxy fields
        params = row.split(':')
        params.extend([None, None, None, type, source])
        proxies.add(Proxy(*params))
    return proxies

def get_proxy_daily_http_proxies():
    url = 'http://www.proxy-daily.com'
    response = requests.get(url)
    if not response.ok:
        raise RequestNotOKError()

    try:
        soup = BeautifulSoup(response.content, 'html.parser')
        content = soup.find('div', {'id': 'free-proxy-list'})
        centers = content.find_all('div', {'class': 'centeredProxyList freeProxyStyle'})
        # note: centers[0] still raises IndexError if the class names
        # above no longer match the live page
        return _get_proxy_daily_proxies_parse_inner(centers[0], 'http', 'proxy-daily-http')
    except (AttributeError, KeyError):
        raise InvalidHTMLError()

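The parsing step above boils down to splitting "host:port" lines. A self-contained stdlib sketch (`parse_proxy_rows` is a made-up name, and the plain tuples stand in for proxyscrape's `Proxy` namedtuple):

```python
def parse_proxy_rows(text):
    # mirrors _get_proxy_daily_proxies_parse_inner: strip quotes,
    # split on newlines, skip blank rows, then split each row on ':'
    proxies = set()
    for row in text.replace('"', '').replace("'", '').split('\n'):
        row = row.strip()
        if not row:
            continue
        host, port = row.split(':')
        proxies.add((host, port))
    return proxies
```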
nealonhager commented 5 years ago

@sriramkumar1996 that didn't work for me. However, in the same file, if you go to RESOURCE_TYPE_MAP and comment out the 'proxy-daily-http' entry, the whole program works again.
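The shape of that workaround, sketched with a made-up mapping (the real RESOURCE_TYPE_MAP in proxyscrape lists more resources; only the 'proxy-daily-http' key name comes from this thread):

```python
# hypothetical stand-in for proxyscrape's RESOURCE_TYPE_MAP; commenting
# out the broken entry leaves the other scrapers usable
RESOURCE_TYPE_MAP = {
    # 'proxy-daily-http': 'http',   # the entry this thread disables
    'proxy-daily-socks4': 'socks4',
    'proxy-daily-socks5': 'socks5',
}
```

The library then simply never invokes the failing proxy-daily HTTP scraper, at the cost of one fewer proxy source.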

DenLakusta commented 5 years ago

I had the same problem. After replacing the functions, I got another error: in scrapers.py the requests library is not imported. I'm not sure it will help you, but adding import requests at the top of the file worked for me.
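To see why the missing import matters, here is a self-contained reproduction (stdlib only; the function body mimics the call scrapers.py makes, and requests is deliberately never imported):

```python
def fetch_without_import():
    # scrapers.py calls requests.get() without `import requests`,
    # so Python raises NameError the first time the function runs
    return requests.get('http://www.proxy-daily.com')

try:
    fetch_without_import()
    error_message = None
except NameError as err:
    error_message = str(err)
```

Because the lookup only happens at call time, the broken import goes unnoticed until the scraper is actually used.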

longgangsima commented 5 years ago

I applied the change from @sriramkumar1996 in /Users/mac/PycharmProjects/scrapy/venv/lib/python3.7/site-packages/proxyscrape/scrapers.py, but it still doesn't work. The only thing that helps is commenting out 'PROXY_POOL_ENABLED = True', but then there is no point in using a proxy pool. Has anyone solved this issue?
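For reference, the workaround mentioned above is a one-line change in the Scrapy project's settings.py (PROXY_POOL_ENABLED is scrapy-proxy-pool's enable switch; disabling it bypasses the broken scraper, but also the whole pool):

```python
# settings.py: turn the middleware off entirely; equivalent to
# commenting out the PROXY_POOL_ENABLED = True line
PROXY_POOL_ENABLED = False
```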

miguelomp commented 4 years ago

I installed this repo with pip install git+ and checked which dependencies it pulled in: it installed proxyscrape23. In another project where this repo worked, the installed dependency was proxyscrape instead, so I compared the two.

I replaced proxyscrape23 with proxyscrape and now it works.
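A quick way to check which variant of the dependency is actually installed (stdlib only; it just probes the two package names mentioned above, and reports neither on a clean environment):

```python
import importlib.util

def installed_variant():
    # returns 'proxyscrape', 'proxyscrape23', or None, depending on
    # which of the two packages from this thread is importable
    for name in ('proxyscrape', 'proxyscrape23'):
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

If it reports 'proxyscrape23', uninstalling it and installing proxyscrape matches the fix described above.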