wertercatt closed this issue 1 year ago.
Thanks. The regex in `split_url` currently assumes a `.com` TLD, so it may make sense to extract the TLD from the URL string instead of hardcoding it, as is currently done in https://github.com/nrsyed/proboards-scraper/blob/main/proboards_scraper/scraper/utils.py#L44:
`expr = r"(^.*\.com)(/.*)?$"`
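As a minimal sketch of that suggestion (not the code in the `dev/url_domain_fix` branch), the pattern can capture an optional scheme plus everything up to the first `/` as the base URL, so any TLD such as `.net` matches. This assumes `split_url` only needs to return the same two groups as the original regex: the base URL and an optional path that is `None` when absent. The `_URL_EXPR` name and the `ValueError` on a non-match are illustrative choices, not part of proboards_scraper.

```python
import re
from typing import Optional, Tuple

# Hypothetical TLD-agnostic pattern (an illustration, not the project's code):
# group 1 = optional http(s):// scheme plus the host, up to the first "/";
# group 2 = the optional path, None when the URL has no path.
_URL_EXPR = re.compile(r"^((?:https?://)?[^/]+)(/.*)?$")


def split_url(url: str) -> Tuple[str, Optional[str]]:
    """Split a forum URL into (base_url, path) without hardcoding a TLD."""
    match = _URL_EXPR.match(url)
    if match is None:
        # Fail with a clear message instead of AttributeError on None.groups().
        raise ValueError(f"Could not split URL: {url!r}")
    base_url, path = match.groups()
    return base_url, path


print(split_url("https://letssosl.boards.net"))
print(split_url("https://kittenswork.boards.net/thread/123"))
```

With this pattern, `https://letssosl.boards.net` splits into `("https://letssosl.boards.net", None)` instead of failing to match; the `/thread/123` path above is a made-up example.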
I am encountering the same error. How do I fix it? Thank you.
I've pushed a branch, `dev/url_domain_fix` (#44), that should address the issue. Please test it and let me know if it works.
Thanks. I'm now receiving the following error:
`[05:37:16][INFO][proboards_scraper.core] Logging in to https://kittenswork.boards.net
[1025/053726.514:INFO:CONSOLE(93)] "Uncaught ReferenceError: proboards is not defined", source: https://kittenswork.boards.net/ (93)
[1025/053726.555:INFO:CONSOLE(55)] "Uncaught ReferenceError: $ is not defined", source: https://kittenswork.boards.net/ (55)
[1025/053727.878:INFO:CONSOLE(0)] "Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.", source: (0)
[1025/053728.146:INFO:CONSOLE(3)] "recaptchacompat disabled", source: https://cloudflare.hcaptcha.com/1/api.js?endpoint=https%3A%2F%2Fcloudflare.hcaptcha.com&assethost=https%3A%2F%2Fcf-assets.hcaptcha.com&imghost=https%3A%2F%2Fcf-imgs.hcaptcha.com&render=explicit&recaptchacompat=off&onload=_cf_chl_hload (3)
Traceback (most recent call last):
File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Python310\Scripts\pbs.exe__main__.py", line 7, in
C:\Users\jack_\proboards-scraper>[1025/053729.869:INFO:CONSOLE(3)] "Request for the Private Access Token challenge.", source: (3)
[1025/053729.870:INFO:CONSOLE(3)] "The next request for the Private Access Token challenge may return a 401 and show a warning in console.", source: (3)
[1025/053729.894:INFO:CONSOLE(3)] "console.groupEnd", source: (3)`
Sorry, this is my first GitHub issue. The output continues further:
C:\Users\jack_\proboards-scraper>[1025/054120.126:INFO:CONSOLE(0)] "Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.", source: (0)
[1025/054120.269:INFO:CONSOLE(3)] "recaptchacompat disabled", source: https://cloudflare.hcaptcha.com/1/api.js?endpoint=https%3A%2F%2Fcloudflare.hcaptcha.com&assethost=https%3A%2F%2Fcf-assets.hcaptcha.com&imghost=https%3A%2F%2Fcf-imgs.hcaptcha.com&render=explicit&recaptchacompat=off&onload=_cf_chl_hload (3)
[1025/054122.107:INFO:CONSOLE(3)] "Request for the Private Access Token challenge.", source: (3)
[1025/054122.107:INFO:CONSOLE(3)] "The next request for the Private Access Token challenge may return a 401 and show a warning in console.", source: (3)
[1025/054122.138:INFO:CONSOLE(3)] "console.groupEnd", source: (3)
[1025/054123.588:INFO:CONSOLE(0)] "[.WebGL-000033280328A200]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels", source: https://cf-assets.hcaptcha.com/captcha/v1/1f7dc62/static/hcaptcha.html#frame=challenge&id=08788u458bne&host=login.proboards.com&sentry=true&reportapi=https%3A%2F%2Faccounts.hcaptcha.com&recaptchacompat=off&custom=false&endpoint=https%3A%2F%2Fcloudflare.hcaptcha.com&hl=en&assethost=https%3A%2F%2Fcf-assets.hcaptcha.com&imghost=https%3A%2F%2Fcf-imgs.hcaptcha.com&tplinks=on&sitekey=33f96e6a-38cd-421b-bb68-7806e1764460&theme=light (0)
[1025/054123.594:INFO:CONSOLE(0)] "[.WebGL-00003328044F0000]GL Driver Message (OpenGL, Performance, GL_CLOSE_PATH_NV, High): GPU stall due to ReadPixels", source: https://cf-assets.hcaptcha.com/captcha/v1/1f7dc62/static/hcaptcha.html#frame=challenge&id=16qga3x0cfxk&host=login.proboards.com&sentry=true&reportapi=https%3A%2F%2Faccounts.hcaptcha.com&recaptchacompat=off&custom=false&endpoint=https%3A%2F%2Fcloudflare.hcaptcha.com&hl=en&assethost=https%3A%2F%2Fcf-assets.hcaptcha.com&imghost=https%3A%2F%2Fcf-imgs.hcaptcha.com&tplinks=on&sitekey=33f96e6a-38cd-421b-bb68-7806e1764460&theme=light (0)
Works for me if I don't try to log in. I think the login problem is a separate issue, though. Might we need a way to import cookies?
Closing this issue; the other person's problems aren't related.
[wertercatt@wertserv proboards-scraper]$ pbs https://letssosl.boards.net
Traceback (most recent call last):
File "/home/wertercatt/.local/bin/pbs", line 8, in
sys.exit(pbs_cli())
File "/home/wertercatt/.local/lib/python3.10/site-packages/proboards_scraper/main.py", line 115, in pbs_cli
proboards_scraper.run_scraper(
File "/home/wertercatt/.local/lib/python3.10/site-packages/proboards_scraper/core.py", line 102, in run_scraper
base_url, url_path = split_url(url)
File "/home/wertercatt/.local/lib/python3.10/site-packages/proboards_scraper/scraper/utils.py", line 46, in split_url
base_url, path = match.groups()
AttributeError: 'NoneType' object has no attribute 'groups'
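The `AttributeError` above comes from calling `.groups()` on the result of `re.match`, which returns `None` when the pattern does not match. A short reproduction with the `.com` expression quoted from `utils.py` earlier in this thread; the `try_split` helper and the `example.proboards.com` URL are hypothetical, used only to contrast a `.com` forum with the `boards.net` one from the report.

```python
import re

# Pattern quoted from proboards_scraper/scraper/utils.py in this thread.
expr = r"(^.*\.com)(/.*)?$"


def try_split(url):
    """Hypothetical helper mimicking the failing call in split_url.

    Returns match.groups() on success, or the AttributeError raised when
    re.match returns None.
    """
    match = re.match(expr, url)
    try:
        return match.groups()
    except AttributeError as exc:
        return exc


# A .com forum URL matches, so .groups() succeeds.
print(try_split("https://example.proboards.com/board/1"))
# A boards.net URL contains no ".com", so re.match returns None and
# .groups() raises the same AttributeError as in the traceback.
print(try_split("https://letssosl.boards.net"))
```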