Closed by ahmadia 8 years ago
Here's what's available from a call to "stats" on crawldb (output below). I need to expose this in nutch-python, spin out a new build, then land a patch here.
{
  "retry 0": "8350",
  "minScore": "0.0",
  "retry 1": "96",
  "status": {
    "3": {"count": "21", "statusValue": "db_gone"},
    "2": {"count": "594", "statusValue": "db_fetched"},
    "1": {"count": "7721", "statusValue": "db_unfetched"},
    "5": {"count": "86", "statusValue": "db_redir_perm"},
    "4": {"count": "24", "statusValue": "db_redir_temp"}
  },
  "totalUrls": "8446",
  "maxScore": "0.528",
  "avgScore": "0.029593771"
}
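Note that every value in that payload comes back as a string, so whatever ends up feeding the dashboard will probably want numbers instead. Here is a rough sketch of that conversion using only the standard library. The function name `summarize_stats` and the output keys are made up for illustration; this is not the nutch-python API, just one way the payload above could be normalized.

```python
import json

# The "stats" payload pasted above, verbatim.
RAW_STATS = """
{
  "retry 0": "8350",
  "minScore": "0.0",
  "retry 1": "96",
  "status": {
    "3": {"count": "21", "statusValue": "db_gone"},
    "2": {"count": "594", "statusValue": "db_fetched"},
    "1": {"count": "7721", "statusValue": "db_unfetched"},
    "5": {"count": "86", "statusValue": "db_redir_perm"},
    "4": {"count": "24", "statusValue": "db_redir_temp"}
  },
  "totalUrls": "8446",
  "maxScore": "0.528",
  "avgScore": "0.029593771"
}
"""


def summarize_stats(raw):
    """Hypothetical helper: turn the string-valued stats payload into numbers."""
    stats = json.loads(raw)

    # Map status name -> count, e.g. "db_fetched" -> 594.
    status_counts = {
        entry["statusValue"]: int(entry["count"])
        for entry in stats["status"].values()
    }

    # Keys look like "retry 0", "retry 1"; map retry number -> URL count.
    retry_counts = {
        int(key.split()[1]): int(value)
        for key, value in stats.items()
        if key.startswith("retry ")
    }

    return {
        "total_urls": int(stats["totalUrls"]),
        "min_score": float(stats["minScore"]),
        "max_score": float(stats["maxScore"]),
        "avg_score": float(stats["avgScore"]),
        "status_counts": status_counts,
        "retry_counts": retry_counts,
    }


summary = summarize_stats(RAW_STATS)
```

In this sample, both the per-status counts and the per-retry counts add up to `totalUrls` (8446), which could serve as a cheap sanity check before plotting anything.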
There's an easy workaround for this in nutch-python, so I'm proceeding.
This needs a discussion with Brittain. I don't think it's hard, and it's pretty useful. It would also be good to chat with the Nutch folks and ask what other kinds of data are available and would make sense to put on the dashboard while we're tweaking it.