monperrus / crawler-user-agents

Syntactic patterns of HTTP user-agents used by bots / robots / crawlers / scrapers / spiders. pull-request welcome :star:
MIT License

added WordPress crawler #335

Closed: petrokrupenia closed this 11 months ago

monperrus commented 11 months ago

Thanks! CI validation fails. How can this be fixed?

petrokrupenia commented 11 months ago

I have no idea. Are there any more details?

monperrus commented 11 months ago

See https://app.travis-ci.com/github/monperrus/crawler-user-agents/builds/266792708


Traceback (most recent call last):
  File "/home/travis/build/monperrus/crawler-user-agents/validate.py", line 95, in <module>
    main()
  File "/home/travis/build/monperrus/crawler-user-agents/validate.py", line 55, in main
    raise ValueError('Pattern {!r} has an unescaped slash character'.format(pattern))
ValueError: Pattern 'WordPress/' has an unescaped slash character

petrokrupenia commented 11 months ago

added fix
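
For readers hitting the same `ValueError`: the message in the traceback says the pattern `WordPress/` contains a slash that is not escaped. The sketch below is only an illustration of that kind of check and of one way to escape the slash. It is not the actual code in `validate.py`; the helper names `has_unescaped_slash` and `escape_slashes` are hypothetical, and it assumes a slash counts as escaped when a backslash directly precedes it.

```python
import re

# Assumption: a slash is "unescaped" when it is not directly preceded by a backslash.
# This is an illustrative rule, not necessarily the one used by the repository's validate.py.
UNESCAPED_SLASH = re.compile(r'(?<!\\)/')


def has_unescaped_slash(pattern: str) -> bool:
    """Return True if the pattern contains a slash without a preceding backslash."""
    return UNESCAPED_SLASH.search(pattern) is not None


def escape_slashes(pattern: str) -> str:
    """Put a backslash in front of every slash that does not already have one."""
    return UNESCAPED_SLASH.sub(r'\\/', pattern)


if __name__ == "__main__":
    original = 'WordPress/'
    fixed = escape_slashes(original)
    print(has_unescaped_slash(original))  # the failing pattern from the traceback
    print(fixed, has_unescaped_slash(fixed))
```

Under this assumption the fix would amount to writing the pattern with a backslash before the slash. The sketch does not handle a slash preceded by an escaped backslash.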

monperrus commented 11 months ago

thanks!

"WordPress/X.X.X; https://example.com" is fake. Do you have a real example? Which WordPress plugin sends this?

Thanks!

petrokrupenia commented 11 months ago

It is not fake. WordPress has its own bot/crawler (not a WordPress plugin).

Our plugin tracks user agents for statistics, and one of our users noticed too many posts from a UA with these parameters: WordPress/6.3.1; https://usersdomain.com (I can't share the real domain for privacy reasons).

Also, I found this: https://useragents.io/explore/platforms/unknown/maker/wordpress-org-b87
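
To illustrate how a pattern like this would match the user agent quoted above, here is a minimal sketch. It assumes the pattern is the escaped form `WordPress\/` and that patterns are applied as regular-expression searches against the full user-agent string; the function name `is_wordpress_ua` is hypothetical, and the Mozilla string is a made-up counterexample.

```python
import re

# Assumption: the pattern is the escaped form of the one in the traceback.
WORDPRESS_PATTERN = re.compile(r'WordPress\/')


def is_wordpress_ua(user_agent: str) -> bool:
    """Return True if the user-agent string contains 'WordPress/'."""
    return WORDPRESS_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # User agent reported in this thread (domain anonymized by the reporter).
    print(is_wordpress_ua('WordPress/6.3.1; https://usersdomain.com'))
    # Made-up browser-like string for contrast.
    print(is_wordpress_ua('Mozilla/5.0 (X11; Linux x86_64)'))
```

A search (rather than a full match) is used so that the version number and site URL after the slash do not need to be part of the pattern.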

monperrus commented 11 months ago

ack, thanks for the additional info

devicenull commented 6 months ago

Hi @petrokrupenia, can you provide any references or links showing that WordPress has a built-in crawler? I'm not able to find anything supporting this. (There are a number of third-party plugins that do it, but not core WordPress.)