Open jyounus opened 8 years ago
This isn't an exhaustive list of all bad strings; it's a list of examples.
What you do is you test your code with each of these to try to shake out bugs and, once you identify the bugs, you write a properly comprehensive fix.
(E.g., if Æ was included in the list as an example of a Unicode character and your program didn't support Unicode, checking .contains() for Æ wouldn't solve your problem for the other 128,000+ Unicode characters. The proper solution would depend on how you were using the text.)
Yeah, that's what I was thinking with the example you provided and looking for something more generic.
I'm sure this is a huge and common thing that needs to be implemented in different systems, aren't there like well established libraries available that help you out with this sort of thing? Maybe I'm thinking of something different here (input sanitisation? input validation?)
The BLNS is for testing your input sanitization.
Unfortunately, I don't code web apps in NodeJS (I use Python, PHP, and various other languages), so I can't suggest an input sanitization library/framework off the top of my head.
This list is useful for automated testing, not for runtime input validation/sanitization. Here's an example: https://github.com/parshap/node-sanitize-filename/blob/ef1e8ad58e95eb90f8a01f209edf55cd4176e9c8/test.js#L259-L262
One of the issues with automating this kind of thing is that it lets the number of test cases explode.
Let's say you have a hundred bad strings, that's a hundred test cases, right? Well, no... the only way to make sure that your input doesn't break anything is to make sure all inputs on all parameters for a form/API call are tested.
That means your number of test cases is (number of bad strings)^(number of string parameters) for each such form/API call.
Very few people take the time to test through that, even if the test cases can be generated automatically from some kind of spec.
That said, yes, it would be nice to see something like this, right?
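To put rough numbers on that formula, here is a small Python sketch. The counts (100 strings, 3 parameters), the three sample strings, and both function names are made up for illustration. The "one at a time" variant is just one common compromise, not something the list itself prescribes, and it gives up coverage of interactions between parameters.

```python
from itertools import product


def exhaustive_case_count(num_bad_strings: int, num_string_params: int) -> int:
    """Cases needed if every bad string is tried in every parameter combination."""
    return num_bad_strings ** num_string_params


def one_at_a_time_case_count(num_bad_strings: int, num_string_params: int) -> int:
    """Cases needed if only one parameter at a time receives a bad string."""
    return num_bad_strings * num_string_params


# Tiny illustrative sample, not the real list.
sample_strings = ["", "Æ", "<script>alert(1)</script>"]

# Every combination for a hypothetical two-parameter form.
all_pairs = list(product(sample_strings, repeat=2))

print(len(all_pairs))                    # 9
print(exhaustive_case_count(100, 3))     # 1000000
print(one_at_a_time_case_count(100, 3))  # 300
```

So a hundred strings on a three-field form already means a million exhaustive cases, which is why most test suites settle for something smaller.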
I wonder how to protect myself from the human injection
@Euphe Careful UI design. It really depends on the specific case.
Testing frameworks such as PHPUnit can use this list as a "data provider". Here's some code for PHPUnit:
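    /**
     * Data provider: returns each naughty string as a one-argument test case.
     *
     * @return array
     */
    public function naughtyStringProvider()
    {
        // Load the base64-encoded copy of the list.
        $path    = realpath(__DIR__ . '/../resources/tests/blns.base64.json');
        $content = file_get_contents($path);
        $array   = json_decode($content);
        $return  = [];
        foreach ($array as $entry) {
            // PHPUnit expects one array of arguments per test invocation.
            $return[] = [base64_decode($entry)];
        }

        return $return;
    }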
When you have a specific function that should accept user input and not break somehow, you can do this (again, in PHPUnit):
    /**
     * @covers \FireflyIII\Http\Controllers\Transaction\SingleController::store
     * @dataProvider naughtyStringProvider
     */
    public function testStoreNaughty(string $description)
    {
        // ...
    }
This test is called 400+ times automatically, each time with a different string from the naughty list.
It is worth knowing, however, that this specific test (depending on how you set it up) would only check whether your application accepts these strings. It might as well accept them, because many of the strings in the naughty list aren't very naughty per se; they're just inconvenient to read. If a user wants to give a description that's emoticons only, well, sure. That's not a problem per se.
My test case is just an example to show you how you could use this list. It's by no means the only way.
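One way to make such a test check more than "the call didn't crash" is a round-trip assertion: store each string, read it back, and compare. The sketch below is in Python rather than PHPUnit and is only an illustration. `InMemoryStore`, `load_naughty_strings`, and `check_round_trip` are hypothetical names, the store stands in for whatever your controller and database do, and the three sample strings are inlined in the same shape as blns.base64.json (a JSON array of base64 entries) instead of being read from the file.

```python
import base64
import json


def load_naughty_strings(blns_base64_json: str) -> list[str]:
    """Decode a JSON array of base64-encoded entries into plain strings."""
    entries = json.loads(blns_base64_json)
    # errors="replace" is an assumption: it keeps an entry that is not
    # valid UTF-8 from crashing the loader itself.
    return [base64.b64decode(e).decode("utf-8", errors="replace") for e in entries]


class InMemoryStore:
    """Hypothetical stand-in for the code under test."""

    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def save(self, description: str) -> int:
        row_id = len(self._rows) + 1
        self._rows[row_id] = description
        return row_id

    def load(self, row_id: int) -> str:
        return self._rows[row_id]


def check_round_trip(store: InMemoryStore, strings: list[str]) -> list[str]:
    """Return every string that did not come back exactly as it went in."""
    failures = []
    for text in strings:
        row_id = store.save(text)
        if store.load(row_id) != text:
            failures.append(text)
    return failures


# Inline sample in the same shape as the base64 JSON file.
sample = ["I'm 100% woman", "Æ", "'; DROP TABLE users; --"]
sample_json = json.dumps(
    [base64.b64encode(s.encode("utf-8")).decode("ascii") for s in sample]
)

naughty = load_naughty_strings(sample_json)
failures = check_round_trip(InMemoryStore(), naughty)
print(failures)  # []
```

With a real backend in place of `InMemoryStore`, a non-empty `failures` list points at strings that were mangled, truncated, or rejected somewhere along the way.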
I suggest including the type of string (reason for error) beside each value
@JC5
You'd be surprised. For example, Fanfiction.net's got this stupid overzealous string sanitization which silently strips all percent signs from input, so a chapter containing "I'm 100% woman" would become "I'm 100 woman" without a single warning.
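To illustrate the difference between stripping characters on input and escaping them on output, here is a minimal Python sketch. `strip_percent` is only a guess at the behaviour described above (I have no idea how the site actually implements it), and `render_for_html` assumes the text ends up in an HTML page.

```python
import html


def strip_percent(text: str) -> str:
    """Lossy approach: silently delete a character from the input."""
    return text.replace("%", "")


def render_for_html(text: str) -> str:
    """Keep the stored text unchanged and escape it only when writing HTML."""
    return html.escape(text)


original = "I'm 100% woman"

print(strip_percent(original))    # I'm 100 woman
print(render_for_html(original))  # I&#x27;m 100% woman

# Escaping is reversible, stripping is not.
print(html.unescape(render_for_html(original)) == original)  # True
```

The stripped version has lost information for good, while the escaped version still displays exactly what the user typed.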
The punctuation used in "plaintext" emotes has its own Scunthorpe problem.
Now that we have a list of potential naughty strings, what do you do with it to protect yourself properly? What's the best way to use this list to protect your backend where user input is required?
I'll be using this list in a Node.js-based app with MongoDB. Is it "enough" to test whether the user input .contains() a line from the list? What would be a better way to protect yourself than just looping over the list and checking for an entry that equals the user input (or is contained in it)?
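For what it's worth, here is a small Python sketch of what such a containment check would do in practice. `is_blocked_by_list` and the two-entry `bad_strings` sample are hypothetical, not taken from any library, and the sample stands in for the real list.

```python
def is_blocked_by_list(user_input: str, bad_strings: list[str]) -> bool:
    """Naive check: reject the input if it contains any entry from the list."""
    # Empty entries are skipped, because "" is a substring of every string
    # and would make this check reject all input.
    return any(bad in user_input for bad in bad_strings if bad)


# Illustrative sample entries only.
bad_strings = ["<script>alert(1)</script>", "Æ"]

# The exact entry is caught.
print(is_blocked_by_list("<script>alert(1)</script>", bad_strings))  # True

# A trivial variant is not on the list, so it gets through.
print(is_blocked_by_list("<script>alert(2)</script>", bad_strings))  # False

# A perfectly legitimate place name is rejected.
print(is_blocked_by_list("Ærøskøbing", bad_strings))  # True
```

The check lets the variant through and rejects harmless input, which suggests the list works better as test data than as a runtime filter.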