Open d3banjan opened 3 years ago
Looks like the existing proxy providers (premproxy, FreeProxy, SslProxy) changed some html attributes and we cannot parse them anymore -- thus the ProxyListException: list is empty
This will require some parser rework
For those who are interested, removing the attrs
filter works as a short term workaround,
This is because FreeProxy hasn't changed anything and find
gets the first table.
@d3banjan it not worked to me. Anyone did can to resolve "list empty"?
making
table = soup.find("table", attrs={"id": "proxylisttable"})
table = soup.find("table", attrs={"class": "table table-striped table-bordered"})
works for FreeProxy.
Had same problem, @metegenez solution worked for me
@metegenez solution works for FreeProxy and SslProxy
For PremProxy, it looks like they setup CORS and user agent checks, you can bypass it by spoofing the origin and referer headers, and leveraging a random user agent
In UnPacker.py make the following changes
+ def __init__(self, js_file_url, headers=None):
logger.info("JS UnPacker init path: {}".format(js_file_url))
+ r = requests.get(js_file_url, headers=headers)
encrypted = r.text.strip()
encrypted = "(" + encrypted.split("}(")[1][:-1]
unpacked = eval(
"self.unpack" + encrypted
) # string of the js code in unpacked form
matches = re.findall(r".*?\('\.([a-zA-Z0-9]{1,6})'\).*?\((\d+)\)", unpacked)
self.ports = dict((key, port) for key, port in matches)
logger.debug("portmap: " + str(self.ports))
In PremProxyParse.py make the following changes
import logging
import requests
from bs4 import BeautifulSoup
from http_request_randomizer.requests.parsers.js.UnPacker import JsUnPacker
from http_request_randomizer.requests.parsers.UrlParser import UrlParser
from http_request_randomizer.requests.proxy.ProxyObject import ProxyObject, AnonymityLevel, Protocol
+ from http_request_randomizer.requests.useragent.userAgent import UserAgentManager
def __init__(self, id, web_url, timeout=None):
self.base_url = web_url
web_url += "/list/"
# Ports decoded by the JS unpacker
self.js_unpacker = None
+ self.useragent = UserAgentManager()
+ self.headers = {
+ "User-Agent": self.useragent.get_random_user_agent(),
+ "Origin": self.base_url,
+ "Referer": self.base_url
+ }
UrlParser.__init__(self, id=id, web_url=web_url, timeout=timeout)
def parse_proxyList(self):
curr_proxy_list = []
try:
# Parse all proxy pages -> format: /list/{num}.htm
# Get the pageRange from the 'pagination' table
page_set = self.get_pagination_set()
logger.debug("Pages: {}".format(page_set))
# One JS unpacker per provider (not per page)
self.js_unpacker = self.init_js_unpacker()
for page in page_set:
+ response = requests.get("{0}{1}".format(self.get_url(), page), timeout=self.timeout, headers=self.headers)
if not response.ok:
def get_pagination_set(self):
+ response = requests.get(self.get_url(), timeout=self.timeout, headers=self.headers)
page_set = set()
# Could not parse pagination page - Let user know
if not response.ok:
logger.warning("Proxy Provider url failed: {}".format(self.get_url())
return page_set
def init_js_unpacker(self):
+ response = requests.get(self.get_url(), timeout=self.timeout, headers=self.headers)
# Could not parse provider page - Let user know
if not response.ok:
logger.warning("Proxy Provider url failed: {}".format(self.get_url()))
return None
content = response.content
soup = BeautifulSoup(content, "html.parser")
# js file contains the values for the ports
for script in soup.findAll('script'):
if '/js/' in script.get('src'):
jsUrl = self.base_url + script.get('src')
+ return JsUnPacker(jsUrl, headers=self.headers)
return None
2021-12-01 03:06:24,255 root DEBUG === Initialized Proxy Parsers ===
2021-12-01 03:06:24,256 root DEBUG FreeProxy parser of 'http://free-proxy-list.net' with required bandwidth: '150' KBs
2021-12-01 03:06:24,256 root DEBUG PremProxy parser of 'https://premproxy.com/list/' with required bandwidth: '150' KBs
2021-12-01 03:06:24,257 root DEBUG SslProxy parser of 'https://www.sslproxies.org' with required bandwidth: '150' KBs
2021-12-01 03:06:24,257 root DEBUG =================================
2021-12-01 03:06:24,548 root DEBUG Added 299 proxies from FreeProxy
2021-12-01 03:06:29,969 vendor.http_request_randomizer.requests.parsers.PremProxyParser DEBUG Pages: {'', '09.htm', '02.htm', '11.htm', '07.htm', '08.htm', '10.htm', '03.htm', '04.htm', '06.htm', '05.htm'}
2021-12-01 03:06:35,722 vendor.http_request_randomizer.requests.parsers.js.UnPacker INFO JS UnPacker init path: https://premproxy.com/js/e5f85.js
2021-12-01 03:06:57,201 vendor.http_request_randomizer.requests.parsers.js.UnPacker DEBUG portmap: {'ref78': '8080', 'r1d7a': '26691', 'rf4c5': '8000', 'r2cc7': '8088', 'rdd07': '80', 'r9d12': '8085', 'rd6c3': '22800', 'r8f5c': '3128', 'r0e76': '8118', 'r2d80': '53281', 'r331b': '21231', 'r65df': '35709', 'r83db': '8081', 'rcf74': '61047', 'r02fd': '30950', 'r149e': '57797', 'r00bd': '808', 'rbf12': '999', 'r7a2c': '33630', 'ra7fd': '9292', 'rec6b': '45006', 'r243e': '55443', 'rb9b8': '60731', 'r945a': '37475', 'rc0dc': '36984', 'reb86': '63141', 'r07aa': '21776', 'rb92e': '9300', 'r5b2a': '10248', 'rfc6b': '3228', 're041': '8888', 'r132e': '9991', 'rae2e': '9797', 'rcd20': '9080', 'rcb08': '9090', 'r93e2': '32191', 'rdc5c': '40387', 'r0136': '41258', 'r9450': '53959', 'r3ce8': '9999', 'r2fd8': '46611', 'r4c7b': '6969', 'r49f7': '3127', 'r8e32': '1976', 'r5f53': '1981', 'r5ff7': '38525', 'r3db9': '59394', 'r96f1': '42119', 'r377e': '443', 'r2676': '5566', 're051': '8008', 'r4126': '53410', 'r0d46': '8380', 'r09c5': '8197', 'r3e49': '40014', 'r32b7': '83', 'rf98a': '44047', 'rf392': '40390', 'r37ea': '8083', 'r0b6d': '2021', 'rad3d': '8181', 'r5204': '53128', 'r29f8': '6565', 'r4736': '54190', 'r8784': '31475', 'r123b': '8060', 'r9bfb': '8010', 'r9cc8': '12345', 'r5754': '5678', 'rc0a8': '45944', 'rc09f': '10040', 'r0099': '31409', 'rd8ea': '9091', 're258': '8090', 'r8b1a': '60792', 'rc949': '56145', 'rf203': '56644', 'ref86': '47615', 'ra341': '41917', 'r42f1': '46669', 'r63be': '54018', 'r8f71': '38970', 'rb4ac': '52271', 'r8f6e': '43631', 'r58e5': '8020', 'rd542': '54555', 'r5403': '23500', 'rbe81': '39272', 'r2c90': '53805', 'r7f29': '42033', 'r01a0': '10101', 'r4eeb': '47548', 'rf12d': '58136', 're30c': '34808', 'r8d68': '10000', 'r6b7c': '39617', 'ra500': '6000', 'r361b': '49044', 'r7406': '1337', 'rf53a': '52342', 'r615f': '57367', 'r2414': '50330', 'r15c0': '8082', 'r2920': '50782', 'rd979': '1256', 'r50ce': '59083', 'r352a': '37444', 'r8c95': '56975', 'r09fc': '61711', 'r61ce': '32161', 'radb3': '8998', 'r7dbb': '60808', 'r3a72': '45381', 'rdbee': '49285', 'rd78c': '10001', 'r858e': '3161', 'rbc7a': '43403', 'rc941': '32018', 'rbda4': '44380', 'r789e': '60000', 'radd4': '43496'}
2021-12-01 03:07:57,774 root DEBUG Added 521 proxies from PremProxy
2021-12-01 03:07:57,932 root DEBUG Added 99 proxies from SslProxy
2021-12-01 03:07:57,933 root DEBUG Total proxies = 919
2021-12-01 03:07:57,933 root DEBUG Filtered proxies = 919
2021-12-01 03:07:57,934 root DEBUG Initialization took: 93.67911338806152 sec
2021-12-01 03:07:57,934 root DEBUG Size: 919
Proof https://replit.com/@d3banjan/KhakiStarchyCustomer#main.py
The following code fails ---
... with the following error message --