Closed: ghost closed this issue 1 year ago
Hi there,
It seems that fake_user_agent didn't parse out any of the user agents that cambridge needs for scraping websites. This was probably caused by either your network or the website at that particular moment. I just uninstalled and reinstalled cambridge and fake_user_agent to try to reproduce it, but everything works fine and I can't reproduce what you are seeing.
Would you please try again? Or run a little test like this to see whether you can get a user agent now:
kate@gentoo ~ $ fakeua -v
fake_user_agent 2.1.7
kate@gentoo ~ $ fakeua --help
usage: fakeua [-h] [-n] [-d] [-v] [-r] [browser]
fakeua is a tool to generate a fake useragent randomly.
positional arguments:
browser supported values: chrome, edge, firefox, safari, opera. Case insensitive
options:
-h, --help show this help message and exit
-n, --nocache get a useragent without local caching
-d, --debug get a useragent in debug mode
-v, --version print the current version of the program
-r, --remove remove the cache file
kate@gentoo ~ $ fakeua --debug
2022-12-13 10:24:19 DEBUG user_agent.py A browser will be randowly given
2022-12-13 10:24:19 DEBUG user_agent.py Got chrome
2022-12-13 10:24:19 DEBUG settings.py Got cache folder: /home/kate/.cache/fakeua
2022-12-13 10:24:19 DEBUG settings.py fake_useragent_2.1.7.json is found.
2022-12-13 10:24:19 DEBUG user_agent.py Read /home/kate/.cache/fakeua/fake_useragent_2.1.7.json successfully
Mozilla/5.0 (X11; Ubuntu; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2919.83 Safari/537.36
Time taken: 0.0059 seconds
kate@gentoo ~ $
If you can get something like this, I think camb remit should work without problems. Thanks.
I cannot get fake_user_agent to work:
~
❯ fakeua --debug --nocache
2022-12-13 07:15:39 DEBUG user_agent.py A browser will be randowly given
2022-12-13 07:15:39 DEBUG user_agent.py Got chrome
2022-12-13 07:15:44 DEBUG user_agent.py http://useragentstring.com/pages/useragentstring.php?name=chrome has been fetched successfully
2022-12-13 07:15:44 DEBUG user_agent.py PARSING HTML from http://useragentstring.com/pages/useragentstring.php?name=chrome 1 times
2022-12-13 07:15:44 DEBUG user_agent.py PARSING HTML from http://useragentstring.com/pages/useragentstring.php?name=chrome 2 times
2022-12-13 07:15:44 DEBUG user_agent.py PARSING HTML from http://useragentstring.com/pages/useragentstring.php?name=chrome 3 times
2022-12-13 07:15:44 DEBUG user_agent.py Maximum PARSING retries reached. Exit
2022-12-13 07:15:44 ERROR user_agent.py Nothing parsed out
~ took 5s
❯ fakeua --version
fake_user_agent 2.1.7
Does it work for you if you run fakeua --remove?
Sorry about that. You can open http://useragentstring.com/pages/useragentstring.php?name=chrome in your browser to see what it renders. It seems the HTML returned is not what we expected, so the data cannot be parsed out.
You can also try the --remove option to remove the cache, if there is one (though I don't think there will be, because no data was parsed out to be saved as cache), and then try fakeua -d again.
Does that URL return for you what it is supposed to return? (you posted the same comment 3 times)
I wrote the last comment on my phone, which has no proxy, and accessing GitHub from China is not very smooth because of government censorship, you know.
The URL should render a web page in your browser like this:
chrome User Agent Strings
Chrome
Free open-source web browser developed by [Google](http://www.google.com/). Chromium is the name of the open source project behind [Google](http://www.google.com/) [Chrome](http://www.google.com/chrome), released under the BSD license.
Click on any string to get more details
Chrome 104.0.5112.79
[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36](https://useragentstring.com/Chrome104.0.5112.79_id_19986.php)
Chrome 104.0.0.0
[Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36](https://useragentstring.com/Chrome104.0.0.0_id_19988.php)
[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36](https://useragentstring.com/Chrome104.0.0.0_id_19989.php)
Chrome 103.0.5060.53
[Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36](https://useragentstring.com/Chrome103.0.5060.53_id_19987.php)
Chrome 99.0.4844.84
[Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36](https://useragentstring.com/Chrome99.0.4844.84_id_19983.php)
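For anyone debugging the "Nothing parsed out" error, here is a minimal sketch of pulling user agent strings out of a page shaped like the one above. It is not fake_user_agent's actual parser: the names UALinkParser and parse_user_agents are hypothetical, and it assumes each entry is a link whose text starts with "Mozilla/". If the site returns an error page instead, the result is simply an empty list, which would match the symptom in the debug log.

```python
from html.parser import HTMLParser


class UALinkParser(HTMLParser):
    """Collect the text of every <a> element that looks like a user agent."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buf = []
        self.user_agents = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buf = []

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._buf).strip()
            # Assumption: entries on the page start with "Mozilla/";
            # other links (Google, Chrome, ...) are ignored.
            if text.startswith("Mozilla/"):
                self.user_agents.append(text)
            self._in_link = False


def parse_user_agents(html):
    """Return the user agent strings found in the given HTML text."""
    parser = UALinkParser()
    parser.feed(html)
    parser.close()
    return parser.user_agents
```

Feeding it the HTML saved from the browser shows quickly whether the page still contains anything parseable or whether the site returned something else.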
That's what I get:
The website is down now, which has never happened before. I could visit it a few minutes ago. We should try again later, or, if you need it, I can send you the cache file to place in the right folder.
Thank you very much. I'm in no hurry, but in general it would be good to have a working version of the cache so as not to break other software like cambridge :)
You remind me that I could put the cache on a free host and update it periodically, in case useragentstring.com is down for new users. Thanks, and sorry for the inconvenience. I will let you know when it works.
Isn't it a text file? Why not put it directly on GitHub?
It's just a JSON file. It would be fine on GitHub, and it could live in the fake_user_agent repo. It's just that I have to remember to update the file in the repo once in a while :) I think I can do that.
Can a git hook update it for you?
I've never used that. I will do some research on how to use it. Thank you for the tip.
Hi, useragentstring.com, the site used for scraping user agents, has come back online. You can try fakeua --debug now, or camb remit again. If the problem still exists, please let me know.
By the way, fake_user_agent has been upgraded to v2.1.8, which added support for reading backup data from the repo. Thanks!
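The fallback idea discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that shipped in v2.1.8: the function name load_user_agents, the two fetch callables, and the shape of the JSON data are all hypothetical. The order tried is local cache, then the live site, then a backup copy.

```python
import json
from pathlib import Path


def load_user_agents(cache_path, fetch_live, fetch_backup):
    """Return (source, data), trying the cache, the live site, then a backup.

    fetch_live and fetch_backup are hypothetical callables that return
    parsed data, or raise / return nothing when their source is unavailable.
    """
    cache_path = Path(cache_path)
    if cache_path.is_file():
        return "cache", json.loads(cache_path.read_text(encoding="utf-8"))

    for source, fetch in (("live", fetch_live), ("backup", fetch_backup)):
        try:
            data = fetch()
        except Exception:
            # Deliberately broad for this sketch: any failure means
            # "move on to the next source".
            data = None
        if data:
            # Save whatever worked so that the next run reads the cache.
            cache_path.parent.mkdir(parents=True, exist_ok=True)
            cache_path.write_text(json.dumps(data), encoding="utf-8")
            return source, data

    raise RuntimeError("Nothing parsed out from any source")
```

With this ordering, a new user who installs while the site is down still gets data from the backup, and later runs are served from the cache file.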
Yes it works now. Thanks! :)
I have the latest PyPI version, 3.5.9: