nsapa / fanfictionnet_ff_proxy

fanfictionnet_ff_proxy: an experimental "proxy" for fanfiction.net piloted by FanFicFare
CeCILL Free Software License Agreement v2.1

Never ending loop in the proxy #13

Open mcepl opened 1 year ago

mcepl commented 1 year ago

Not sure whether this is a duplicate of #12, but after I pulled in 8aeabc4 the situation became substantially worse. Before (for example with 9879f07ef293010f7f2ff8e232c94393880236a6), fanficfare crashed only once and worked fine on the next run. Now the proxy gets into some kind of never-ending loop, and fanficfare ends up crashing:

fun~/K/f/t/austen$ fanficfare https://www.fanfiction.net/s/14191300
Traceback (most recent call last):
  File "/home/matej/.bin/fanficfare", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/cli.py", line 344, in main
    dispatch(options, urls, passed_defaultsini, passed_personalini, warn, fail)
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/cli.py", line 320, in dispatch
    do_download(url,
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/cli.py", line 435, in do_download
    adapter.getStoryMetadataOnly()
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/adapters/base_adapter.py", line 327, in getStoryMetadataOnly
    self.doExtractChapterUrlsAndMetadata(get_cover=get_cover)
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/adapters/adapter_fanfictionnet.py", line 113, in doExtractChapterUrlsAndMetadata
    data = self.get_request(url)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/requestable.py", line 119, in get_request
    return self.get_request_redirected(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/requestable.py", line 111, in get_request_redirected
    (data,rurl) = self.configuration.get_fetcher().get_request_redirected(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/base_fetcher.py", line 133, in get_request_redirected
    fetchresp = self.do_request('GET',
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/decorators.py", line 68, in fetcher_do_request
    fetchresp = chainfn(
                ^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/cache_basic.py", line 122, in fetcher_do_request
    fetchresp = chainfn(
                ^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/decorators.py", line 102, in fetcher_do_request
    fetchresp = chainfn(
                ^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/base_fetcher.py", line 106, in do_request
    fetchresp = self.request(method,url,
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/matej/.local/lib/python3.11/site-packages/fanficfare/fetchers/fetcher_nsapa_proxy.py", line 202, in request
    raise exceptions.FailedToDownload(
fanficfare.exceptions.FailedToDownload: nsapa_proxy: reply still truncated after 5 retry
fun~/K/f/t/austen$ 

When the proxy is run as python3 chrome_content.py --verbose --write-log --log-filename /tmp/fanfiction_proxy_log.txt, this is the resulting log: fanfiction_proxy_log.txt.
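The traceback ends in fanficfare's fetcher_nsapa_proxy.py raising FailedToDownload once its retries are used up. The sketch below illustrates that kind of bounded retry loop in general terms only: fetch_with_retries, its fetch and is_complete callables, and the local FailedToDownload class are hypothetical stand-ins, not FanFicFare's actual code. The point it shows is that such a loop always terminates with an exception, so a proxy that keeps returning the same unusable reply surfaces as a crash on the fanficfare side rather than as an endless hang.

```python
import time
from typing import Callable


class FailedToDownload(Exception):
    """Local stand-in for fanficfare.exceptions.FailedToDownload (hypothetical)."""


def fetch_with_retries(fetch: Callable[[], str],
                       is_complete: Callable[[str], bool],
                       max_attempts: int = 5,
                       delay: float = 0.0) -> str:
    """Call fetch() until is_complete() accepts the reply.

    Gives up with FailedToDownload after max_attempts unacceptable replies.
    Both callables are hypothetical hooks supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        reply = fetch()
        if is_complete(reply):
            return reply
        if attempt < max_attempts:
            time.sleep(delay)  # wait before asking the proxy again
    raise FailedToDownload(
        f"reply still truncated after {max_attempts} attempts")
```

For example, fetch_with_retries(lambda: "<title>Just a moment...</title>", lambda r: "Just a moment" not in r) raises FailedToDownload after five attempts, because every reply is rejected.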

nsapa commented 1 year ago

Your proxy is never able to pass the Cloudflare challenge:

$ grep "page title" fanfiction_proxy_log.txt  |grep -v "chrome://version"
2023-06-28 20:10:15.193 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:10:26.383 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:10:36.125 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:10:45.684 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:10:55.230 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:11:08.291 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:11:31.001 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:11:40.360 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:11:49.722 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:11:59.132 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
2023-06-28 20:12:09.477 UTC INFO get_content Current URL = https://www.fanfiction.net/s/14191300, page title = Just a moment..., mimetype = text/html
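The grep above can also be done in Python, which makes it easy to count how many times each URL was still served the challenge page. This is a minimal sketch that assumes the get_content line format shown in this excerpt (other proxy versions may log differently); count_challenge_pages is a hypothetical helper, and "Just a moment..." is the title of Cloudflare's interstitial challenge page.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the get_content lines in the excerpt above. The format is inferred
# from that excerpt and is an assumption, not a documented log format.
PAGE_LINE = re.compile(
    r"Current URL = (?P<url>\S+), page title = (?P<title>.*?), mimetype = (?P<mime>\S+)"
)

CHALLENGE_TITLE = "Just a moment..."


def count_challenge_pages(lines: Iterable[str]) -> Counter:
    """Count, per URL, how many logged pages still had the challenge title."""
    hits: Counter = Counter()
    for line in lines:
        m = PAGE_LINE.search(line)
        # Skip non-page lines and Chrome's internal pages (the grep above
        # drops chrome://version the same way).
        if m is None or m["url"].startswith("chrome://"):
            continue
        if m["title"] == CHALLENGE_TITLE:
            hits[m["url"]] += 1
    return hits
```

To use it on the proxy's log, open the file (for example /tmp/fanfiction_proxy_log.txt) and pass the file object to count_challenge_pages; a URL with many hits and no successful page load is stuck on the challenge.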

And it looks like the detector doesn't work, because I only see:

2023-06-28 20:10:15.196 UTC DEBUG selenium.webdriver.remote.remote_connection Finished Request
2023-06-28 20:10:15.199 UTC WARNING main Exception NotImplementedError in the main loop (No usable implementation found!)

That's very unexpected.
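The NotImplementedError warning suggests that the proxy's challenge detection found nothing it knew how to handle on that page. As a generic illustration only (not the proxy's code), a detector chain that tries several strategies in order and raises when none recognises the page can produce this kind of message. classify and both detector functions are hypothetical, and the title checks are assumptions based on the page titles in the log.

```python
from typing import Callable, Optional, Sequence

# A detector inspects a page title and returns a label, or None if it does
# not recognise the page. Both detectors below are hypothetical.
Detector = Callable[[str], Optional[str]]


def cloudflare_by_title(title: str) -> Optional[str]:
    # Cloudflare's interstitial challenge page is titled "Just a moment..."
    return "cloudflare" if title.startswith("Just a moment") else None


def normal_story_page(title: str) -> Optional[str]:
    # Assumption: FanFiction.Net story titles end with "| FanFiction".
    return "story" if title.endswith("| FanFiction") else None


def classify(title: str, detectors: Sequence[Detector]) -> str:
    """Return the first label any detector produces, else raise."""
    for detect in detectors:
        label = detect(title)
        if label is not None:
            return label
    raise NotImplementedError("No usable implementation found!")
```

With only normal_story_page registered, classify("Just a moment...", [normal_story_page]) raises NotImplementedError, because no detector claims the challenge page.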

I tried the story you specified with:

2023-06-28 22:35:20.847 CEST INFO ProxiedBrowser(init) chromedriver version 114.0.5735.90 (386bc09e8f4f2e025eddae123f36f6263096ae49-refs/branch-heads/5735@{#1052}) running as pid 24558 driving Chrome version 114.0.5735.133 running as pid 24557

And it worked:

FFF: DEBUG: 2023-06-28 22:42:49,568: cli.py(63): Successfully wrote 'A Marriage of True Minds-ffnet_14191300.epub'

Are you running undetected_chromedriver 3.5.0?
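The installed versions can be read from package metadata instead of guessed. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_versions is a hypothetical helper name. Run it with the same interpreter that launches chrome_content.py, since python and python3 can resolve to different installations with different packages.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Show which interpreter is running, then the versions it can see.
    print(sys.executable, sys.version.split()[0])
    for name, ver in installed_versions(["undetected-chromedriver", "selenium"]).items():
        print(f"{name}: {ver or 'not installed'}")
```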

mcepl commented 1 year ago

Are you running undetected_chromedriver 3.5.0?

Isn't that something your script downloads as needed?

Also, why does 9879f07 work with the identical Chromium (ungoogled-chromium 114.0.5735.106 from Flatpak)?

Redevil387 commented 1 year ago

I've updated to the current version of your proxy and encountered a problem similar to the one above. I updated undetected_chromedriver and all other dependencies, including Selenium 4.10.0, but for some reason I'm having issues opening chrome_content.py: I click on it, it briefly shows the cmd window and then crashes, so Chromium never opens. I recall having this issue when I first started using the proxy but can't remember how I fixed it. All requirements are otherwise satisfied.

Edit: I managed to solve the issue by running it with Python rather than Python 3.

However, when updating or downloading stories I get the error "'NoneType' object has no attribute 'get_text'".

mcepl commented 1 year ago

With c753ff7 I get this log: fanfiction_proxy_log.txt

rymuller95 commented 1 year ago

I am experiencing the looping as well. I'm running on a Mac, if that makes any difference.

selenium - 4.10.0
undetected_chromedriver - 3.5.0

fanfiction_proxy_log.txt

nsapa commented 1 year ago

Does the latest commit help?

rymuller95 commented 1 year ago

Unfortunately not. It's still looping on the verification box.

fanfiction_proxy_log.txt

nsapa commented 1 year ago
2023-07-15 11:15:01.264 PDT ERROR main Failed to notify user, title:Captcha detected by fanfictionnet_ff_proxy, message:Please complete the captcha in Chrome then press Enter in the python console
2023-07-15 11:15:01.264 PDT INFO cloudfare_clickcaptcha Waiting for user to resolve the captcha: press Enter to continue
2023-07-15 11:15:26.786 PDT INFO unix_exit_handler Got Interrupt: 2, telling the main loop to exit...

That's a different issue (#12): you are hitting Cloudflare's anti-bot check. The current workaround is to open Chrome's DevTools to pass the check.

rymuller95 commented 1 year ago

Opening up dev tools with the latest commit seems to work as expected. Thanks!

Redevil387 commented 1 year ago

Opening Chrome DevTools worked for me as well, but I still get this error:

'NoneType' object has no attribute 'get_text' https://www.fanfiction.net/s/8918264/1/A-Certain-Unknown-Level-0

Any idea as to the cause/solution?
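"'NoneType' object has no attribute 'get_text'" is the AttributeError Python raises when .get_text() is called on None. With BeautifulSoup this usually means a find() or select_one() lookup matched no element (both return None in that case), for instance because the page received was not the story page the parser expected; whether that is what happens here is not confirmed in this thread. This is a minimal guard sketch, and text_or_default is a hypothetical helper, not FanFicFare's code.

```python
from typing import Optional, Protocol


class HasText(Protocol):
    """Anything with a get_text() method, such as a BeautifulSoup Tag."""

    def get_text(self) -> str: ...


def text_or_default(node: Optional[HasText], default: str = "") -> str:
    """Return node.get_text(), or default when the lookup found nothing.

    `node` would typically be the result of a BeautifulSoup find() call,
    which returns None when no element matches.
    """
    if node is None:
        return default
    return node.get_text()
```

Substituting a default only hides the symptom; in practice it is often more useful to raise an error naming the missing element, so that a wrong page (such as a challenge page) is reported clearly.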

rymuller95 commented 1 year ago

It seems the issue has reappeared, even with the developer tools open. It keeps cycling through the Cloudflare checkbox.

fanfiction_proxy_log.txt