mhwgoo / cambridge

Terminal version of Cambridge Dictionary by default. Also supports Merriam-Webster Dictionary.
GNU General Public License v3.0

Doesn't work #4

Closed ghost closed 1 year ago

ghost commented 1 year ago

I have the latest PyPI version, 3.5.9:

❯ camb remit
2022-12-12 22:45:49 ERROR user_agent.py[70] Nothing parsed out
2022-12-12 22:45:49 ERROR base_events.py[1747] Task exception was never retrieved
future: <Task finished name='Task-1' coro=<main() done, defined at /home/USERNAME/.local/lib/python3.10/site-packages/fake_user_agent/user_agent.py:186> exception=SystemExit()>
Traceback (most recent call last):
  File "/home/USERNAME/.local/lib/python3.10/site-packages/cambridge/main.py", line 17, in main
    args.func(args, con, cur)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/cambridge/args.py", line 243, in search_word
    cambridge.search_cambridge(con, cur, input_word, is_fresh, is_ch)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/cambridge/dicts/cambridge.py", line 40, in search_cambridge
    fresh_run(con, cur, req_url, input_word, is_ch)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/cambridge/dicts/cambridge.py", line 75, in fresh_run
    result = fetch_cambridge(req_url, input_word, is_ch)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/cambridge/dicts/cambridge.py", line 50, in fetch_cambridge
    res = dict.fetch(req_url, session)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/cambridge/dicts/dict.py", line 20, in fetch
    ua = user_agent()
  File "/home/USERNAME/.local/lib/python3.10/site-packages/fake_user_agent/user_agent.py", line 250, in user_agent
    return asyncio.run(main(browser, use_cache))
  File "/usr/lib/python3.10/asyncio/runners.py", line 47, in run
    _cancel_all_tasks(loop)
  File "/usr/lib/python3.10/asyncio/runners.py", line 63, in _cancel_all_tasks
    loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/fake_user_agent/user_agent.py", line 211, in main
    await asyncio.gather(*tasks)
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/fake_user_agent/user_agent.py", line 103, in write_to_dict
    versions = await parse(browser, session)
  File "/home/USERNAME/.local/lib/python3.10/site-packages/fake_user_agent/user_agent.py", line 92, in parse
    attempt = call_on_error(ValueError("Nothing parsed out"), url, attempt, OP[1])
  File "/home/USERNAME/.local/lib/python3.10/site-packages/fake_user_agent/user_agent.py", line 71, in call_on_error
    sys.exit()
SystemExit

mhwgoo commented 1 year ago

Hi there,

It seems that fake_user_agent didn't parse out any user agents, which cambridge needs for scraping websites. This was probably caused by either your network or the website at that particular moment. I just uninstalled and reinstalled cambridge and fake_user_agent to try to reproduce it, but everything works fine and I can't get the error you have.
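As the traceback shows, the lookup ends in `sys.exit()`, so the caller sees a `SystemExit` rather than an ordinary exception. Here is a minimal sketch of how a caller could guard against that. `get_user_agent_or_default` and `failing_lookup` are hypothetical names made up for illustration, not part of cambridge or fake_user_agent, and the default string is just one of the Chrome user agents quoted in this thread.

```python
import sys

# One of the Chrome user agents quoted in this thread, used as a fallback.
DEFAULT_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36"
)


def get_user_agent_or_default(get_ua, default_ua=DEFAULT_UA):
    """Call get_ua(); return default_ua if it exits or raises."""
    try:
        return get_ua()
    except (SystemExit, Exception):
        # sys.exit() raises SystemExit, which derives from BaseException,
        # so `except Exception` alone would not catch it.
        return default_ua


def failing_lookup():
    # Hypothetical stand-in for a lookup that gives up via sys.exit().
    sys.exit()


print(get_user_agent_or_default(failing_lookup))
```

Whether silently falling back is the right behavior is a design choice; a caller might prefer to log the failure first.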

Would you please try again? Or run a little test like this to see if you can get a user agent now:

kate@gentoo ~ $ fakeua -v
fake_user_agent 2.1.7
kate@gentoo ~ $ fakeua --help
usage: fakeua [-h] [-n] [-d] [-v] [-r] [browser]

fakeua is a tool to generate a fake useragent randomly.

positional arguments:
  browser        supported values: chrome, edge, firefox, safari, opera. Case insensitive

options:
  -h, --help     show this help message and exit
  -n, --nocache  get a useragent without local caching
  -d, --debug    get a useragent in debug mode
  -v, --version  print the current version of the program
  -r, --remove   remove the cache file
kate@gentoo ~ $ fakeua --debug
2022-12-13 10:24:19 DEBUG user_agent.py A browser will be randowly given
2022-12-13 10:24:19 DEBUG user_agent.py Got chrome
2022-12-13 10:24:19 DEBUG settings.py Got cache folder: /home/kate/.cache/fakeua
2022-12-13 10:24:19 DEBUG settings.py fake_useragent_2.1.7.json is found.
2022-12-13 10:24:19 DEBUG user_agent.py Read /home/kate/.cache/fakeua/fake_useragent_2.1.7.json successfully

Mozilla/5.0 (X11; Ubuntu; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2919.83 Safari/537.36

Time taken: 0.0059 seconds
kate@gentoo ~ $ 

If you get something like this, I think camb remit should work without a problem. Thanks.

ghost commented 1 year ago

I cannot get fake_user_agent to work:

~ 
❯ fakeua --debug --nocache
2022-12-13 07:15:39 DEBUG user_agent.py A browser will be randowly given
2022-12-13 07:15:39 DEBUG user_agent.py Got chrome
2022-12-13 07:15:44 DEBUG user_agent.py http://useragentstring.com/pages/useragentstring.php?name=chrome has been fetched successfully
2022-12-13 07:15:44 DEBUG user_agent.py PARSING HTML from http://useragentstring.com/pages/useragentstring.php?name=chrome 1 times
2022-12-13 07:15:44 DEBUG user_agent.py PARSING HTML from http://useragentstring.com/pages/useragentstring.php?name=chrome 2 times
2022-12-13 07:15:44 DEBUG user_agent.py PARSING HTML from http://useragentstring.com/pages/useragentstring.php?name=chrome 3 times
2022-12-13 07:15:44 DEBUG user_agent.py Maximum PARSING retries reached. Exit
2022-12-13 07:15:44 ERROR user_agent.py Nothing parsed out

~ took 5s 
❯ fakeua --version
fake_user_agent 2.1.7

ghost commented 1 year ago

Does it work for you if you do fakeua --remove?

mhwgoo commented 1 year ago

Sorry about that. You can open http://useragentstring.com/pages/useragentstring.php?name=chrome in your browser to see what it renders. It seems the HTML returned is not what we expect, so the data cannot be parsed out.
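To illustrate why an unexpected page leads to "Nothing parsed out", here is a rough sketch using only the standard library. It assumes the user agent strings appear as link text starting with "Mozilla/"; that is a guess about the page layout, not the actual parsing logic of fake_user_agent. The class, the function, and both sample pages are made up for the example.

```python
from html.parser import HTMLParser


class UserAgentLinkParser(HTMLParser):
    """Collect the text of <a> elements that looks like a user agent."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buffer = []
        self.user_agents = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._buffer).strip()
            # Assumption: every user agent of interest starts with "Mozilla/".
            if text.startswith("Mozilla/"):
                self.user_agents.append(text)
            self._in_link = False


def parse_user_agents(html):
    """Return all user agents found in html, or raise if there are none."""
    parser = UserAgentLinkParser()
    parser.feed(html)
    parser.close()
    if not parser.user_agents:
        raise ValueError("Nothing parsed out")
    return parser.user_agents


# Made-up sample pages: one shaped like the expected listing, one error page.
good_page = (
    '<ul><li><a href="/Chrome104.0.0.0_id_19988.php">'
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36</a></li></ul>"
)
error_page = "<html><body><h1>Service unavailable</h1></body></html>"

print(parse_user_agents(good_page))
try:
    parse_user_agents(error_page)
except ValueError as exc:
    print("error page:", exc)
```

The fetch itself can succeed (as in the debug log above) while the parse still fails, because the error page contains no matching links.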

You can also try the --remove option to remove the cache, if any (but I don't think there will be one, because no data was parsed out to be saved as cache), and then try fakeua -d again.
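For readers unfamiliar with the cache step, the idea can be sketched roughly as follows. The JSON layout (a mapping from browser name to a list of strings) is an assumption made for this example, and `read_cached_user_agent` and `remove_cache` are hypothetical helpers, not the functions fakeua uses internally.

```python
import json
import random
import tempfile
from pathlib import Path


def read_cached_user_agent(cache_file, browser):
    """Return a random cached user agent for browser, or None if unavailable."""
    try:
        data = json.loads(Path(cache_file).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    # Assumed layout: {"chrome": ["Mozilla/5.0 ...", ...], ...}
    if not isinstance(data, dict):
        return None
    candidates = data.get(browser) or []
    return random.choice(candidates) if candidates else None


def remove_cache(cache_file):
    """Delete the cache file; return True if a file was actually removed."""
    path = Path(cache_file)
    if path.exists():
        path.unlink()
        return True
    return False


# Demo in a temporary directory so no real cache is touched.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache.json"
    cache.write_text(json.dumps({"chrome": ["Mozilla/5.0 (example)"]}), encoding="utf-8")
    print(read_cached_user_agent(cache, "chrome"))
    print(remove_cache(cache))
    print(read_cached_user_agent(cache, "chrome"))
```

If nothing was ever parsed, no cache file exists, so the read returns None and the removal is a no-op.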

ghost commented 1 year ago

Does that URL return for you what it is supposed to return? (you posted the same comment 3 times)

mhwgoo commented 1 year ago

I wrote the last comment on my phone, which has no proxy, so visiting GitHub is not very smooth in China because of government censorship, you know.

mhwgoo commented 1 year ago

The URL should return a web page in your browser like this:

chrome User Agent Strings
Chrome
Free open-source web browser developed by [Google](http://www.google.com/). Chromium is the name of the open source project behind [Google](http://www.google.com/) [Chrome](http://www.google.com/chrome), released under the BSD license.

Click on any string to get more details
Chrome 104.0.5112.79
[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36](https://useragentstring.com/Chrome104.0.5112.79_id_19986.php)
Chrome 104.0.0.0
[Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36](https://useragentstring.com/Chrome104.0.0.0_id_19988.php)
[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36](https://useragentstring.com/Chrome104.0.0.0_id_19989.php)
Chrome 103.0.5060.53
[Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36](https://useragentstring.com/Chrome103.0.5060.53_id_19987.php)
Chrome 99.0.4844.84
[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36](https://useragentstring.com/Chrome99.0.4844.84_id_19983.php)

ghost commented 1 year ago

That's what I get: [image]

mhwgoo commented 1 year ago

This website is down now, which has never happened before. I could visit it a few minutes ago. We should try again later, or, if you need it, I can send you the cache file to place in the right folder.

ghost commented 1 year ago

Thank you very much. I'm in no hurry, but in general it would be good to have a working copy of the cache so as not to break other software like cambridge :)

mhwgoo commented 1 year ago

You remind me that I could place the cache on a free host and update it periodically, in case useragentstring.com is down for new users. Thanks, and sorry for the inconvenience. I will let you know when it works.

ghost commented 1 year ago

> You remind me that I could place the cache on a free host and update it periodically, in case useragentstring.com is down for new users. Thanks, and sorry for the inconvenience. I will let you know when it works.

Isn't it a text file? Why not host it directly on GitHub?

mhwgoo commented 1 year ago

It's just a JSON file. It would be OK on GitHub, and it could live in the fake_user_agent repo. It's just that I have to remember to update the file in the repo once in a while :) I think I can do that.

ghost commented 1 year ago

Can a git hook update it for you?

mhwgoo commented 1 year ago

I've never used one. I will do some research on how to use it. Thank you for the tip.

mhwgoo commented 1 year ago

Hi, useragentstring.com, the site used for scraping user agents, has come back online. You can try fakeua --debug now, or camb remit again. If the problem still exists, please let me know.

By the way, fake_user_agent has been upgraded to v2.1.8, which adds support for reading backup data from the repo. Thanks!
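The general idea of falling back to backup data can be sketched like this. It is only an illustration of the pattern under assumed names; `load_user_agents` and the two loader functions are hypothetical, and this is not how v2.1.8 is actually implemented.

```python
def load_user_agents(sources):
    """Try each (name, loader) pair in order.

    Returns (name, data) from the first loader that yields non-empty data,
    and raises RuntimeError if every source fails.
    """
    errors = []
    for name, loader in sources:
        try:
            data = loader()
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue
        if data:
            return name, data
        errors.append(f"{name}: empty result")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def live_site():
    # Hypothetical loader that simulates the website being down.
    raise ValueError("Nothing parsed out")


def repo_backup():
    # Hypothetical loader standing in for backup data kept in a repository.
    return {"chrome": ["Mozilla/5.0 (example)"]}


name, data = load_user_agents([("live", live_site), ("backup", repo_backup)])
print(name, data)
```

Ordering the sources from freshest to most stale means the backup is only consulted when the live site cannot be parsed.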

ghost commented 1 year ago

Yes, it works now. Thanks! :)