Open mrdavidlaing opened 10 years ago
A very basic json filter gives the following:
'@type': googlebot-maxcdn
'@message': '{"bytes":0,"client_asn":"AS16509 Amazon.com, Inc.","client_city":"-","client_continent":"EU","client_country":"IE","client_dma":"0","client_ip":"54.247.60.162","client_latitude":53,"client_longitude":-8,"client_state":"-","company_id":85,"cache_status":"MISS","hostname":"cdn.yoast.com","method":"HEAD","origin_time":0.471,"pop":"lhr","protocol":"HTTP\/1.1","query_string":"","referer":"-","scheme":"https","status":200,"time":"2014-07-01T05:10:50.388Z","uri":"\/wp-content\/uploads\/2007\/12\/blogmetrics02.png","user_agent":"Googlebot\/2.1
(+http:\/\/www.google.com\/bot.html)","zone_id":33008}'
'@version': '1'
'@timestamp': 2014-07-01 06:10:50.388000000 +01:00
bytes: 0
client_asn: AS16509 Amazon.com, Inc.
client_city: '-'
client_continent: EU
client_country: IE
client_dma: '0'
client_ip: 54.247.60.162
client_latitude: 53
client_longitude: -8
client_state: '-'
company_id: 85
cache_status: MISS
hostname: cdn.yoast.com
method: HEAD
origin_time: 0.471
pop: lhr
protocol: HTTP/1.1
query_string: ''
referer: '-'
scheme: https
status: 200
time: '2014-07-01T05:10:50.388Z'
uri: /wp-content/uploads/2007/12/blogmetrics02.png
user_agent: Googlebot/2.1 (+http://www.google.com/bot.html)
zone_id: 33008
Compared to @type:googlebot, which has the following shape:
'@type': googlebot
'@message': '{ "content_type": "text/xml; charset=UTF-8", "@timestamp": "2014-06-19T21:54:20-07:00",
"remote_addr": "66.249.69.45", "body_bytes_sent": 38704, "request_time": 1.539,
"status": 200, "robots": "noindex,follow", "redirect_location": "-", "request_method":
"GET", "scheme": "https", "server_name": "yoast.com", "request_uri": "/cat/wordpress/feed/",
"document_uri": "/index.php", "http_user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)" }'
'@version': '1'
'@timestamp': 2014-06-20 04:54:20.000000000 Z
content_type:
  charset: utf-8
  type: text/xml
remote_addr: 66.249.69.45
body_bytes_sent: 38704
request_time: 1.539
status: 200
robots: noindex,follow
redirect_location: '-'
request_method: GET
scheme: https
server_name: yoast.com
request_uri: /cat/wordpress/feed/
document_uri: /index.php
http_user_agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
remote_addr_dns: crawl-66-249-69-45.googlebot.com
I think we should rename the @type:googlebot-maxcdn fields to match those of @type:googlebot.
@jdevalk - do you agree?
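To make the proposal concrete, here is a minimal sketch of what such a rename could look like. The field pairs are only guesses based on the two event shapes above, not an agreed mapping, and `MAXCDN_TO_GOOGLEBOT` and `rename_fields` are hypothetical names used for illustration. Fields with no obvious counterpart (for example `origin_time` or `pop`) are left as they are.

```python
# Hypothetical mapping from googlebot-maxcdn field names to googlebot ones.
# These pairs are assumptions inferred from the two event shapes, not a
# confirmed schema.
MAXCDN_TO_GOOGLEBOT = {
    "client_ip": "remote_addr",
    "bytes": "body_bytes_sent",
    "method": "request_method",
    "hostname": "server_name",
    "uri": "request_uri",
    "user_agent": "http_user_agent",
}


def rename_fields(event, mapping=MAXCDN_TO_GOOGLEBOT):
    """Return a copy of event with keys renamed according to mapping.

    Keys that are not in the mapping are kept unchanged.
    """
    return {mapping.get(key, key): value for key, value in event.items()}


# A trimmed-down version of the googlebot-maxcdn event shown earlier.
maxcdn_event = {
    "client_ip": "54.247.60.162",
    "bytes": 0,
    "method": "HEAD",
    "hostname": "cdn.yoast.com",
    "uri": "/wp-content/uploads/2007/12/blogmetrics02.png",
    "status": 200,
    "scheme": "https",
    "pop": "lhr",
}

renamed = rename_fields(maxcdn_event)
```

Fields such as `status` and `scheme` already share a name in both types, so they pass through untouched.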
It's possible to get logs of Googlebot traffic to MaxCDN via the MaxCDN API. This gives source logs in the following format:
These should be parsed into a format that makes analysing them easy.
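As a rough illustration of the parsing step, the sketch below decodes one raw JSON log record and lifts its keys to the top level of an event, similar in spirit to what the json filter output above shows. It assumes each record is a single JSON object with a `time` field in the format seen in the sample; `parse_maxcdn_line` is a hypothetical helper, and the sample is a shortened copy of the `@message` shown earlier.

```python
import json
from datetime import datetime, timezone


def parse_maxcdn_line(raw):
    """Turn one raw JSON log record into a flat event dict.

    Assumes the record has a "time" field like "2014-07-01T05:10:50.388Z".
    """
    record = json.loads(raw)
    event = {"@type": "googlebot-maxcdn", "@message": raw}
    event.update(record)
    # Parse the UTC timestamp so events can be ordered and bucketed by time.
    parsed = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
    event["@timestamp"] = parsed.replace(tzinfo=timezone.utc)
    return event


# Shortened copy of the sample record; the raw string keeps the "\/" escapes.
sample = (
    r'{"bytes":0,"client_ip":"54.247.60.162","method":"HEAD","status":200,'
    r'"time":"2014-07-01T05:10:50.388Z",'
    r'"uri":"\/wp-content\/uploads\/2007\/12\/blogmetrics02.png"}'
)

event = parse_maxcdn_line(sample)
```

The JSON decoder turns the escaped `\/` sequences into plain slashes, so `event["uri"]` comes out as a normal path.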