pdehaan closed 6 years ago
Interesting! I managed to not even notice the HIBP 'pastes' before. Was there a decision at some point not to cover or address 'pastes'? I watched a few of the user tests yesterday, and at least one of the testers said they were confused about some of the breaches associated with their email. I wonder if perhaps we are capturing pastes and calling them breaches?
Either way, not sure what is causing the discrepancy in reported breaches/pastes. Will do some digging.
curiouser and curiouser...
Missing from LocalHost when scanning 'test@example.com':
Also of note, the order is in some cases totally different.
I'm not proud of this, but it may work as a starting point:
const got = require("got");
const { JSDOM } = require("jsdom");

// Scrape the breach logo names from the public HIBP account page.
async function hibp() {
  const dom = await JSDOM.fromURL("https://haveibeenpwned.com/account/test@example.com", {});
  const images = dom.window.document.querySelectorAll("div.pwnedWebsite .pwnLogo");
  // Keep only the file name, without the path or the .svg/.png extension.
  return Array.from(images).map(img => img.src.replace(/.*\/(.*?)\.(svg|png)$/i, "$1"));
}

// Submit the scan form on Firefox Monitor and scrape the breach logo names from the result page.
async function firefoxMonitor() {
  const options = {
    form: true,
    // Hash of the email address being scanned, as the scan form expects.
    body: {emailHash:"567159D622FFBB50B11B0EFD307BE358624A26EE"}
  };
  const res = await got.post("https://monitor.firefox.com/scan", options);
  const dom = new JSDOM(res.body.toString());
  const images = dom.window.document.querySelectorAll(".image-wrap img");
  return Array.from(images).map(img => img.src.replace(/^img\/logos\/(.*?)\.(svg|png)$/i, "$1"));
}

// Print the breach names that appear on one site but not the other.
async function main() {
  const _hibp = await hibp();
  const _monitor = await firefoxMonitor();
  const hibpNotMonitor = _hibp.filter(name => !_monitor.includes(name));
  const monitorNotHIBP = _monitor.filter(name => !_hibp.includes(name));
  console.log("HIBP, but not Monitor:", hibpNotMonitor.join(", "));
  console.log("Monitor, but not HIBP:", monitorNotHIBP.join(", "));
}

// Report network or parsing failures instead of leaving an unhandled rejection.
main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
$ node index
HIBP, but not Monitor: Gaadi, Yatra
Monitor, but not HIBP: AshleyMadison, Badoo, Fling, FreedomHostingII, JustDate, Mate1, TheFappening, VTightGel, Zoosk
But per our Slack conversations this morning, the entries in the "Monitor, but not HIBP" list above all seem to correspond to the "IsSensitive": true results...
$ curl https://haveibeenpwned.com/api/v2/breaches | jq '.[] | select(.IsSensitive==true) | .Title'
"Adult Friend Finder"
"Ashley Madison"
"Badoo"
"Beautiful People"
"Bestialitysextaboo"
"Brazzers"
"CrimeAgency vBulletin Hacks"
"Eroticy"
"Fling"
"Florida Virtual School"
"Freedom Hosting II"
"Fridae"
"Fur Affinity"
"HongFire"
"Justdate.com"
"Mate1.com"
"Muslim Match"
"Naughty America"
"Non Nude Girls"
"Rosebutt Board"
"The Candid Board"
"The Fappening"
"V-Tight Gel"
"xHamster"
"YouPorn"
"Zoosk"
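To check that correspondence mechanically rather than by eye, something like the following rough sketch might help. It assumes the scraped logo file names line up with the `Name` field of the HIBP breach model (the jq output above prints `Title`, which is spelled differently in several cases). The `explainBySensitive` helper and the `sampleBreaches` array are made up for illustration; real data would come from the `/api/v2/breaches` endpoint used above.

```javascript
// Hand-written sample in the shape of the HIBP breach model (Name, Title, IsSensitive).
// This is illustrative data only, not a copy of the real API response.
const sampleBreaches = [
  { Name: "AshleyMadison", Title: "Ashley Madison", IsSensitive: true },
  { Name: "Zoosk", Title: "Zoosk", IsSensitive: true },
  { Name: "Adobe", Title: "Adobe", IsSensitive: false },
];

// Hypothetical helper: split a list of scraped names into those that are
// flagged IsSensitive in the breach list and those that are not.
function explainBySensitive(names, breaches) {
  const sensitive = new Set(breaches.filter(b => b.IsSensitive).map(b => b.Name));
  return {
    sensitive: names.filter(name => sensitive.has(name)),
    unexplained: names.filter(name => !sensitive.has(name)),
  };
}

const result = explainBySensitive(["AshleyMadison", "Zoosk", "Gaadi"], sampleBreaches);
console.log("Explained by IsSensitive:", result.sensitive.join(", "));
console.log("Not explained:", result.unexplained.join(", "));
```

If the "Not explained" list comes back empty for the real "Monitor, but not HIBP" names, the sensitive flag accounts for that whole side of the discrepancy.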
Am I right that this was caused by the sensitive breaches and spam lists? If so, that was fixed in https://github.com/mozilla/blurts-server/pull/235, right?
Pretty sure this is now understood and we are deliberate in the way we filter breaches now. I'd close this, but maybe we should wait for @lesleyjanenorton or @pdehaan to confirm.
Yeah, we're now deliberate in showing different results versus the HIBP site, since we're not showing the unverified breaches (see https://github.com/mozilla/blurts-server/pull/235#issuecomment-406045403 for more context).
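For anyone reading later, the kind of filtering described here might look roughly like the sketch below. This is not the code from the PR: `visibleBreaches` is a hypothetical name, the sample entries are invented, and excluding spam lists alongside unverified breaches is an assumption based on the earlier comment. `IsVerified` and `IsSpamList` are fields on the HIBP breach model.

```javascript
// Hypothetical filter, not the actual blurts-server implementation.
// Assumption: a breach is shown only if it is verified and is not a spam list.
function visibleBreaches(breaches) {
  return breaches.filter(b => b.IsVerified && !b.IsSpamList);
}

// Invented sample entries in the shape of the HIBP breach model.
const sample = [
  { Name: "ExampleVerified", IsVerified: true, IsSpamList: false },
  { Name: "ExampleUnverified", IsVerified: false, IsSpamList: false },
  { Name: "ExampleSpamList", IsVerified: true, IsSpamList: true },
];

const shown = visibleBreaches(sample).map(b => b.Name);
console.log("Shown:", shown.join(", "));
```

A filter like this would explain why a breach can appear on the HIBP site but not in our results, while sensitive breaches go the other way.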
I think we're OK to close this issue, unless somebody still has specific concerns.
It looks like if I Ctrl+F the https://haveibeenpwned.com/account/test@example.com page [for "Compromised data:"], I get "61 breached sites" (and 48 found pastes).
If I scan the monitor.firefox.com site for test@example.com, and search for "Compromised data:", I get 68 results (which is 7 more than the HIBP site).