opws / opws-dataset

Profiles for the user account systems of various sites.
Open Data Commons Open Database License v1.0

Twitter password blacklist has duplicate entries #211

Closed: stuartpb closed this issue 6 years ago

stuartpb commented 7 years ago

As Reddit points out, some items on the list have a space after them. I need to do a `.map(s=>s==s.trim()?s:JSON.stringify(s))` step before joining the array.
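
A rough sketch of that step, for reference; `passwords` here is just a placeholder for the extracted array, not a name from the actual build script:

    // Quote any entry with leading or trailing whitespace so YAML preserves it,
    // then assemble the entries into a block-sequence list for the comment.
    const quoted = passwords.map(s => s == s.trim() ? s : JSON.stringify(s));
    const listText = '- ' + quoted.join('\n- ');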

stuartpb commented 7 years ago

No, that's not it: as the deleted reply to that comment probably pointed out, the source list just straight-up has duplicates.

Now I'm wondering if I should enforce uniqueness in the schema or not.

Anyhow, I should add a ROT13 to the comment there, since I pulled this list out of the debugger rather than the attribute, thus skipping the ROT13 that the code does for you:

      this.undoBannedPasswordROT13 = function() {
        for (var t = [], e = 0, i = this.attr.bannedPasswords.length; e < i; e++)
          t.push(this.attr.bannedPasswords[e].replace(/[a-z]/gi, function(t) {
            var e = t.charCodeAt(0)
              , i = e + 13;
            return (e <= 90 && i > 90 || i > 122) && (i -= 26),
            String.fromCharCode(i)
          }));
        this.attr.bannedPasswords = t
      }
stuartpb commented 7 years ago

Or `.replace(/[a-z]/gi,c=>String.fromCharCode((c=c.charCodeAt())+((c&95)>77?-13:13)))` for short (credit).

stuartpb commented 7 years ago

Seeing as how it's a comment, `.map(rot13)` should do fine.
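
A minimal sketch of that approach, wrapping the one-liner above into a named helper; `bannedPasswords` is just a placeholder for the array pulled out of the debugger:

    // ROT13 helper based on the one-liner above: `c & 95` uppercases the letter,
    // so a single threshold (77, i.e. 'M') picks +13 or -13 for both cases.
    const rot13 = s => s.replace(/[a-z]/gi,
      c => String.fromCharCode((c = c.charCodeAt()) + ((c & 95) > 77 ? -13 : 13)));

    const decoded = bannedPasswords.map(rot13);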

stuartpb commented 7 years ago

While we're talking about reviewing blacklist dictionary entries, both this and the Verizon Wireless profile have lists that were just assembled with a straight `.join('\n- ')`, and as such, entries consisting only of digits (or other YAML-special sequences) aren't parsed as strings.

They should be re-parsed, stripped of duplicate entries, and dumped back out with `yaml.dump`.
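
A rough sketch of that cleanup pass, assuming js-yaml; the file path and the `password.blacklist` field name are placeholders rather than the actual profile schema:

    const fs = require('fs');
    const yaml = require('js-yaml');

    // Placeholder path; the real profile file lives under profiles/.
    const file = 'profiles/twitter.com.yaml';
    const profile = yaml.load(fs.readFileSync(file, 'utf8'));

    // Re-parsing turns digit-only entries into numbers, so coerce everything
    // back to strings, deduplicate, and let yaml.dump quote what needs quoting.
    profile.password.blacklist = [...new Set(profile.password.blacklist.map(String))];

    fs.writeFileSync(file, yaml.dump(profile));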

stuartpb commented 6 years ago

The stringification issue was fixed in 251ca9c.