Expected behavior w/ sites that have multiple breaches

pdehaan commented 6 years ago

STR:

Search https://haveibeenpwned.com/PwnedWebsites for "Bell (", and you should get 2 results:

Bell (2014 breach)

> In February 2014, Bell Canada suffered a data breach via the hacker collective known as NullCrew. The breach included data from multiple locations within Bell and exposed email addresses, usernames, user preferences and a number of unencrypted passwords and credit card data from 40,000 records containing just over 20,000 unique email addresses and usernames.
>
> **Breach date:** 1 February 2014
> **Date added to HIBP:** 1 February 2014
> **Compromised accounts:** 20,902
> **Compromised data:** Credit cards, Genders, Passwords, Usernames

Bell (2017 breach)

> In May 2017, the Bell telecommunications company in Canada suffered a data breach resulting in the exposure of millions of customer records. The data was consequently leaked online with a message from the attacker stating that they were "releasing a significant portion of Bell.ca's data due to the fact that they have failed to cooperate with us" and included a threat to leak more. The impacted data included over 2 million unique email addresses and 153k survey results dating back to 2011 and 2012. There were also 162 Bell employee records with more comprehensive personal data including names, phone numbers and plain text "passcodes". Bell suffered another breach in 2014 which exposed 40k records.
>
> **Breach date:** 15 May 2017
> **Date added to HIBP:** 16 May 2017
> **Compromised accounts:** 2,231,256
> **Compromised data:** Email addresses, Geographic locations, IP addresses, Job titles, Names, Passwords, Phone numbers, Spoken languages, Survey results, Usernames

Both seem to be for https://www.bell.ca/

Navigate to https://www.bell.ca/ and you get notified of the most recent breach (but not the earlier breach). Not sure if we need to build in some next/previous style navigation for sites w/ multiple breaches, or if that's just too awkward and confusing. Although the first breach only had 21k compromised accounts (versus the 2.2m compromised accounts in the 2017 breach), the first breach did include credit card numbers, so that may be valuable information to share with users. I only did a quick scan of breached domains, and I think that Bell.ca is the only one w/ multiple breaches.


pdehaan commented 6 years ago

Actually, I think I was wrong... it looks like http://www.r2games.com has had multiple (2) breaches as well. Which is slightly interesting, because it isn't showing me the most recent breach. So maybe we're just returning the first result from the breaches.json or something and these aren't sorted.


R2 (2017 forum breach)

In early 2017, the forum for the gaming website R2 Games was hacked. R2 had previously appeared on HIBP in 2015 after a prior incident. This one exposed over 1 million unique user accounts and corresponding MD5 password hashes with no salt.

Breach date: 1 January 2017
Date added to HIBP: 25 April 2017
Compromised accounts: 1,023,466
Compromised data: Email addresses, Passwords, Usernames, Website activity

R2Games

In late 2015, the gaming website R2Games was hacked and more than 2.1M personal records disclosed. The vBulletin forum included IP addresses and passwords stored as salted hashes using a weak implementation enabling many to be rapidly cracked. A further 11M accounts were added to "Have I been pwned" in March 2016 and another 9M in July 2016 bringing the total to over 22M.

Breach date: 1 November 2015
Date added to HIBP: 9 February 2016
Compromised accounts: 22,281,337
Compromised data: Email addresses, IP addresses, Passwords, Usernames
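
If the lookup really is returning the first unsorted match from breaches.json, as guessed above, one way to make the result deterministic is to sort a domain's matches by date before picking one. This is only a minimal sketch, assuming each record carries `Domain` and `BreachDate` (an ISO `YYYY-MM-DD` string) as in the HIBP data; `mostRecentBreach` is a hypothetical helper, not code from the add-on.

```javascript
// Hypothetical helper: return the most recent breach for a domain,
// or undefined if the domain has no breaches.
function mostRecentBreach(breaches, domain) {
  return breaches
    .filter(breach => breach.Domain === domain) // filter() returns a new array
    // ISO "YYYY-MM-DD" strings sort chronologically when compared as strings.
    .sort((a, b) => b.BreachDate.localeCompare(a.BreachDate))[0];
}

// Sample records reduced to the fields used here (assumed shape).
const sample = [
  { Name: "R2Games", Domain: "r2games.com", BreachDate: "2015-11-01" },
  { Name: "R2-2017", Domain: "r2games.com", BreachDate: "2017-01-01" },
];

console.log(mostRecentBreach(sample, "r2games.com").Name); // prints "R2-2017"
```

Because the sort runs on the filtered copy, the order of the original breach list is left untouched.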

pdehaan commented 6 years ago

OK, I wrote a for reals parser which checks for domains w/ 2+ breaches:

$ node dupes

"" has 9 breaches.
bell.ca has 2 breaches.
forum.btcsec.com has 2 breaches.
r2games.com has 2 breaches.
data4marketers.com has 2 breaches.

$ cat dupes.js

const breaches = require("./breaches.json");

const breachMap = new Map();

breaches.forEach(breach => {
  if (!breachMap.has(breach.Domain)) {
    breachMap.set(breach.Domain, 1);
  } else {
    const domainCount = breachMap.get(breach.Domain);
    breachMap.set(breach.Domain, domainCount + 1);
  }
});

[...breachMap].filter(([domain, count]) => {
  return count > 1;
}).forEach(([domain, count]) => {
  console.log(`${domain} has ${count} breaches.`);
});

NOTE: Empty domain breaches filed as #50; "A few empty domains in breaches.json".

nhnt11 commented 6 years ago

UX suggests: When a domain has multiple associated breaches,

Use the Name of the first breach chronologically.
Get everything else from the last breach chronologically.
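
The two UX rules above could be sketched roughly as follows. This is an illustration under assumptions, not the add-on's implementation: `summarizeDomainBreaches` is a hypothetical helper, and the records are assumed to have `Name` and an ISO-formatted `BreachDate` as in the HIBP data.

```javascript
// Hypothetical helper: combine all breaches for one domain into a single
// record, following the two UX rules.
function summarizeDomainBreaches(domainBreaches) {
  if (domainBreaches.length === 0) {
    return undefined;
  }
  // Copy before sorting so the caller's array is not reordered.
  const sorted = [...domainBreaches].sort((a, b) =>
    a.BreachDate.localeCompare(b.BreachDate)
  );
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  // Everything comes from the latest breach, except Name, which comes
  // from the earliest one.
  return { ...last, Name: first.Name };
}

// Sample records reduced to a few fields (assumed shape).
const bell = [
  { Name: "Bell2017", Domain: "bell.ca", BreachDate: "2017-05-15", PwnCount: 2231256 },
  { Name: "Bell", Domain: "bell.ca", BreachDate: "2014-02-01", PwnCount: 20902 },
];

const merged = summarizeDomainBreaches(bell);
console.log(merged.Name, merged.BreachDate, merged.PwnCount);
// prints: Bell 2017-05-15 2231256
```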

pdehaan commented 6 years ago

> Use the Name of the first breach chronologically.
>
> Get everything else from the last breach chronologically.

This feels weird to me, but I've been struggling to find a solution that works (apart from adding paging for the ~3 sites with 2+ breaches). We can't reliably sort by severity. Sorting by number of Pwned accounts seems a bit arbitrary... My gut almost just says keep it simple and always show the most recent breach's details (including Name).

Here's the current multi-breach status using the latest breach data from HIBP:

bell.ca

  Bell (2014 breach) (Bell)
  Breach domain: bell.ca
  Breach date: 2014-02-01
  Added date: 2014-02-01T23:57:10Z
  Pwn count: 20902
  Data Classes: Credit cards, Genders, Passwords, Usernames
  Description: In February 2014, <a href="http://news.softpedia.com/news/Hackers-Claim-to-Have-Breached-Bell-Canada-s-Systems-422952.shtml?utm_medium=twitter&utm_source=FredToadster" target="_blank" rel="noopener">Bell Canada suffered a data breach via the hacker collective known as NullCrew</a>. The breach included data from multiple locations within Bell and exposed email addresses, usernames, user preferences and a number of unencrypted passwords and credit card data from 40,000 records containing just over 20,000 unique email addresses and usernames.

  Bell (2017 breach) (Bell2017)
  Breach domain: bell.ca
  Breach date: 2017-05-15
  Added date: 2017-05-16T01:49:31Z
  Pwn count: 2231256
  Data Classes: Email addresses, Geographic locations, IP addresses, Job titles, Names, Passwords, Phone numbers, Spoken languages, Survey results, Usernames
  Description: In May 2017, <a href="http://www.cbc.ca/beta/news/technology/bell-data-breach-customer-names-phone-numbers-emails-leak-1.4116608" target="_blank" rel="noopener">the Bell telecommunications company in Canada suffered a data breach</a> resulting in the exposure of millions of customer records. The data was consequently leaked online with a message from the attacker stating that they were &quot;releasing a significant portion of Bell.ca's data due to the fact that they have failed to cooperate with us&quot; and included a threat to leak more. The impacted data included over 2 million unique email addresses and 153k survey results dating back to 2011 and 2012. There were also 162 Bell employee records with more comprehensive personal data including names, phone numbers and plain text &quot;passcodes&quot;. Bell suffered another breach in 2014 which exposed 40k records.

forum.btcsec.com

  Bitcoin Security Forum Gmail Dump (BTSec)
  Breach domain: forum.btcsec.com
  Breach date: 2014-01-09
  Added date: 2014-09-10T20:30:11Z
  Pwn count: 4789599
  Data Classes: Email addresses, Passwords
  Description: In September 2014, a large dump of nearly 5M usernames and passwords was <a href="https://forum.btcsec.com/index.php?/topic/9426-gmail-meniai-parol/" target="_blank" rel="noopener">posted to a Russian Bitcoin forum</a>. Whilst commonly reported as 5M &quot;Gmail passwords&quot;, the dump also contained 123k yandex.ru addresses. Whilst the origin of the breach remains unclear, the breached credentials were <a href="http://web.archive.org/web/20140910190920/http://www.reddit.com/r/netsec/comments/2fz13q/5_millions_of_gmail_passwords_leaked_rus_most/" target="_blank" rel="noopener">confirmed by multiple source as correct</a>, albeit a number of years old.

  Yandex Dump (Yandex)
  Breach domain: forum.btcsec.com
  Breach date: 2014-09-07
  Added date: 2014-09-12T04:50:32Z
  Pwn count: 1186564
  Data Classes: Email addresses, Passwords
  Description: In September 2014, <a href="http://habrahabr.ru/post/235949/" target="_blank" rel="noopener">news broke of a massive leak of accounts from Yandex</a>, the Russian search engine giants who also provides email services. The purported million &quot;breached&quot; accounts were disclosed at the same time as nearly 5M mail.ru accounts with <a href="http://globalvoicesonline.org/2014/09/10/russia-email-yandex-mailru-passwords-hacking/" target="_blank" rel="noopener">both companies claiming the credentials were acquired via phishing scams</a> rather than being obtained as a result of direct attacks against their services.

r2games.com

  R2 (2017 forum breach) (R2-2017)
  Breach domain: r2games.com
  Breach date: 2017-01-01
  Added date: 2017-04-25T11:04:29Z
  Pwn count: 1023466
  Data Classes: Email addresses, Passwords, Usernames, Website activity
  Description: In early 2017, the forum for the gaming website <a href="http://www.csoonline.com/article/3192246/security/r2games-compromised-again-over-one-million-accounts-exposed.html" target="_blank" rel="noopener">R2 Games was hacked</a>. R2 had previously appeared on HIBP in 2015 after a prior incident. This one exposed over 1 million unique user accounts and corresponding MD5 password hashes with no salt.

  R2Games (R2Games)
  Breach domain: r2games.com
  Breach date: 2015-11-01
  Added date: 2016-02-09T12:20:35Z
  Pwn count: 22281337
  Data Classes: Email addresses, IP addresses, Passwords, Usernames
  Description: In late 2015, the gaming website <a href="https://www.r2games.com" target="_blank" rel="noopener">R2Games</a> was hacked and more than 2.1M personal records disclosed. The vBulletin forum included IP addresses and passwords stored as salted hashes using a weak implementation enabling many to be rapidly cracked. A further 11M accounts were added to "Have I been pwned" in March 2016 and another 9M in July 2016 bringing the total to over 22M.

The problem is that, in each case, pairing the Title of the oldest breach with the Description of the newest breach will be very confusing:

Bell (2014 breach) In May 2017, the Bell telecommunications company in Canada suffered a data breach resulting in the exposure of millions of customer records. The data was consequently leaked online with a message from the attacker stating that they were "releasing a significant portion of Bell.ca's data due to the fact that they have failed to cooperate with us" and included a threat to leak more. The impacted data included over 2 million unique email addresses and 153k survey results dating back to 2011 and 2012. There were also 162 Bell employee records with more comprehensive personal data including names, phone numbers and plain text "passcodes". Bell suffered another breach in 2014 which exposed 40k records.

And this could get weird if a user clicks through to the monitor.firefox.com site and then sees different titles or different details, depending on whether we redirect them to the oldest or newest breach.

multibreach.js

```js
const breaches = require("./breaches.json");

// Group the breaches that Monitor would show by domain.
const breachMap = monitorBreaches(breaches).reduce((map, breach) => {
  const arr = map.get(breach.Domain) || [];
  arr.push(breach);
  map.set(breach.Domain, arr);
  return map;
}, new Map());

// Keep only non-empty domains that have more than one breach.
const multiBreaches = [...breachMap].filter(([domain, breachArr]) => domain && breachArr.length > 1);

for (const [domain, aBreaches] of multiBreaches) {
  console.log(domain);
  aBreaches.sort((breachA, breachB) => {
    // BreachDate is a "YYYY-MM-DD" string, so compare as strings
    // (subtracting two strings yields NaN). Newest breach first.
    return breachB.BreachDate.localeCompare(breachA.BreachDate);
  }).forEach(breach => {
    console.log(`
  ${breach.Title} (${breach.Name})
  Breach domain: ${breach.Domain}
  Breach date: ${breach.BreachDate}
  Added date: ${breach.AddedDate}
  Pwn count: ${breach.PwnCount}
  Data Classes: ${breach.DataClasses.join(", ")}
  Description: ${breach.Description}
`);
  });
}

// Skip unverified, retired, sensitive, and spam-list breaches.
function monitorBreaches(breaches) {
  return breaches.filter(breach => breach.IsVerified && !breach.IsRetired && !breach.IsSensitive && !breach.IsSpamList);
}
```

nhnt11 commented 6 years ago

@pdehaan This is only for the doorhanger. We're not showing the breach description in the doorhanger anymore - only the Name (not the Title). The point of using the Name of the first breach chronologically was (for example) to make sure we use "Bell" vs "Bell2017". What do you think?

nhnt11 commented 6 years ago

Oops, didn't mean to close this.

pdehaan commented 6 years ago

Oh, interesting, OK... We aren't publicly displaying the breach.Name anywhere on the blurts server, only the breach.Title (the Name is only used for determining the logo image, or in the /?breach={Name} slug).

https://fx-breach-alerts.herokuapp.com/?breach=Bell
https://fx-breach-alerts.herokuapp.com/?breach=Bell2017
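
For illustration, a link of this shape could be built from a breach's Name along these lines. The base URL is taken from the links above; `breachUrl` is a hypothetical helper and is not claimed to match how blurts-server constructs its links.

```javascript
// Base URL copied from the example links; treat it as an assumption
// about the deployment rather than a stable endpoint.
const BASE_URL = "https://fx-breach-alerts.herokuapp.com/";

// Hypothetical helper: put the breach Name into the ?breach= query slug.
function breachUrl(name) {
  const url = new URL(BASE_URL);
  // searchParams takes care of escaping any characters that need it.
  url.searchParams.set("breach", name);
  return url.toString();
}

console.log(breachUrl("Bell2017"));
// prints: https://fx-breach-alerts.herokuapp.com/?breach=Bell2017
```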

So basically we're arguing if you want to display:

| `.Name` | `.Title` |
| --- | --- |
| "Bell" or "Bell2017" | "Bell (2014 breach)" or "Bell (2017 breach)" |
| "BTSec" or "Yandex" | "Bitcoin Security Forum Gmail Dump" or "Yandex Dump" |
| "R2Games" or "R2-2017" | "R2Games" or "R2 (2017 forum breach)" |

Yeah, sure. Personally I think the .Title is more user friendly and consistent w/ blurts-server, but this is currently a pretty rare problem to have multiple breaches. Probably not worth the effort/overhead of adding special "This site has had multiple breaches. The most recent breach was %Title%." UI.

If you want to display the .Name, I'd probably vote to use the original breach. If you want to display the .Title, I'd argue that displaying the most recent breach is more relevant.

nhnt11 commented 6 years ago

@pdehaan We need to use .Name over .Title in the doorhanger because of the string into which it gets formatted: "Xyz accounts from FooBar were compromised..."

It doesn't make sense to replace FooBar with "Yandex Dump" for example.
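
To make the difference concrete, here is a rough sketch of that substitution. The template only mirrors the fragment quoted above (the real doorhanger string continues past the ellipsis), and `doorhangerText` is a hypothetical helper, not the add-on's code.

```javascript
// Hypothetical helper: fill the quoted doorhanger fragment with a count
// and a site label. The trailing "..." stands in for the rest of the
// real string, which is not reproduced here.
function doorhangerText(count, label) {
  return `${count} accounts from ${label} were compromised...`;
}

// Sample record using the 2014 Bell values from this thread.
const breach = { Name: "Bell", Title: "Bell (2014 breach)", PwnCount: 20902 };

console.log(doorhangerText(breach.PwnCount, breach.Name));
// prints: 20902 accounts from Bell were compromised...
console.log(doorhangerText(breach.PwnCount, breach.Title));
// prints: 20902 accounts from Bell (2014 breach) were compromised...
```

The Name reads as a site name inside the sentence, while the Title reads as a label for the incident.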

nhnt11 commented 6 years ago

@pdehaan++ Thanks for articulating all of this though, super useful for posterity.

mozilla / blurts-addon