monperrus / crawler-user-agents

Syntactic patterns of HTTP user agents used by bots / robots / crawlers / scrapers / spiders. Pull requests welcome.
MIT License

added WordPress crawler #335

Closed: petrokrupenia closed this pull request 1 year ago

monperrus commented 1 year ago

Thanks, CI validation fails. How can it be fixed? Thanks!

petrokrupenia commented 1 year ago

I have no idea. Are there any other details?

monperrus commented 1 year ago

See https://app.travis-ci.com/github/monperrus/crawler-user-agents/builds/266792708

Traceback (most recent call last):
  File "/home/travis/build/monperrus/crawler-user-agents/validate.py", line 95, in <module>
    main()
  File "/home/travis/build/monperrus/crawler-user-agents/validate.py", line 55, in main
    raise ValueError('Pattern {!r} has an unescaped slash character'.format(pattern))
ValueError: Pattern 'WordPress/' has an unescaped slash character

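
The log only shows the error message, not how validate.py detects the problem. The following is a minimal sketch of a check that would produce the same message, assuming "unescaped" means a `/` not preceded by a backslash escape. The helper names `find_unescaped_slash` and `check_pattern` are hypothetical and are not taken from the repository. The rule presumably exists so patterns can be embedded in slash-delimited regex literals in other languages; that rationale is an assumption, not something stated in this thread.

```python
import re


def find_unescaped_slash(pattern: str) -> int:
    """Return the index of the first '/' not escaped by a backslash, or -1.

    Hypothetical helper; the real validate.py logic may differ.
    """
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and whatever character it escapes
            continue
        if ch == "/":
            return i
        i += 1
    return -1


def check_pattern(pattern: str) -> None:
    """Raise ValueError for an unescaped slash, then verify the regex compiles."""
    if find_unescaped_slash(pattern) != -1:
        raise ValueError('Pattern {!r} has an unescaped slash character'.format(pattern))
    re.compile(pattern)  # raises re.error if the pattern is not a valid regex
```

With this sketch, `check_pattern("WordPress/")` raises the same message seen in the CI log, while `check_pattern(r"WordPress\/")` passes.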
petrokrupenia commented 1 year ago

added fix
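
The thread does not show the fix itself. Presumably the slash was escaped so the entry reads `WordPress\/`. Below is a minimal sketch of that transformation, assuming Python's `re` semantics, where `\/` simply matches a literal `/`. The helper name `escape_slashes` is hypothetical.

```python
import re


def escape_slashes(pattern: str) -> str:
    """Escape every '/' that is not already escaped (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep existing escape sequences untouched
            i += 2
            continue
        out.append("\\/" if ch == "/" else ch)
        i += 1
    return "".join(out)


fixed = escape_slashes("WordPress/")
print(fixed)  # WordPress\/
# The escaped pattern still matches the literal slash in the user agent.
print(bool(re.search(fixed, "WordPress/6.3.1; https://example.com")))  # True
```

Escaping is idempotent here: running the helper on an already escaped pattern leaves it unchanged.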

monperrus commented 1 year ago

thanks!

"WordPress/X.X.X; https://example.com" is fake. Do you have a real example? Which WordPress plugin sends this?

Thanks!

petrokrupenia commented 1 year ago

It is not fake. WordPress has its own bot/crawler (not a WordPress plugin).

Our plugin tracks user agents for statistics, and one of our users noticed an unusually large number of posts from a user agent of this form: WordPress/6.3.1; https://usersdomain.com (I can't share the real domain for privacy reasons).

I also found this: https://useragents.io/explore/platforms/unknown/maker/wordpress-org-b87
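
To make the reported format concrete, here is a minimal sketch that recognizes a user agent of the form `WordPress/<version>; <site URL>` and extracts both parts. It assumes the format is exactly as quoted above. The regex and the `parse_wordpress_ua` name are hypothetical illustrations, stricter than the plain `WordPress\/` detection pattern discussed in this PR.

```python
import re

# Hypothetical parser for the "WordPress/<version>; <site URL>" form quoted in this thread.
# Slashes are escaped as "\/" to stay consistent with the repository's validation rule.
WP_UA = re.compile(r"^WordPress\/(?P<version>\d+(?:\.\d+)*);\s*(?P<site>https?:\/\/\S+)")


def parse_wordpress_ua(ua: str):
    """Return (version, site_url) for a matching user agent, else None."""
    m = WP_UA.match(ua)
    if m is None:
        return None
    return m.group("version"), m.group("site")


print(parse_wordpress_ua("WordPress/6.3.1; https://example.com"))
```

A browser user agent such as `Mozilla/5.0 (X11; Linux x86_64)` returns `None`, because it does not start with `WordPress/`.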

monperrus commented 1 year ago

ack, thanks for the additional info

devicenull commented 7 months ago

Hi @petrokrupenia, can you provide any references or links showing that WordPress has a built-in crawler? I'm not able to find anything supporting this. A number of third-party plugins do crawl, but core WordPress does not appear to.