Closed neilsikka closed 4 years ago
Hi @neilsikka - yes indeed, if you provide an incomplete regex-pattern ('tiny-regex-c.exe [' is an unmatched character class if I am not mistaken), memory corruption will ensue.
If you are concerned about security problems, this library is not for you.
If you want a small implementation of a regex library, with bugs etc. as described in the README, the project may be of interest.
EDIT: I think this has been fixed in #44. I am unsure how to reproduce though, so will not re-open. I still think you should consider other libraries for serious business, if you can. Much was sacrificed for minimalism in tiny-regex, and I think e.g. PCRE is both faster and more thoroughly tested than this library could ever be.
If you’re not going to fix this, the responsible thing would be to put a big boldface disclaimer in the README saying that “this library does not check patterns for validity, and may produce undefined behavior including memory corruption when given an invalid pattern.”
Otherwise users will naturally assume the library rejects invalid patterns (or at worst produces invalid output without undefined side effects), since that’s how every other regex library I can think of works. And it would be all too easy to decide to use this in a situation where the patterns consist of untrusted input.
The term “Attractive Nuisance” comes to mind.
Hi @snej - is this still an issue?
From this thread: https://github.com/kokke/tiny-regex-c/issues/44 and the discussion, I got the impression that these issues were squashed.
I am not near a computer now, so can’t test myself.
If this is indeed still an issue, I will add a warning about incorrect usage as you suggest.
@snej To summarize and give you some context: #44 and the changes that ensued (see https://github.com/kokke/tiny-regex-c/commit/1a279e04014b70b0695fba559a7c05d55e6ee90b), made the library pass formal verification by KLEE.
After those changes, I removed the warning in the readme about not validating erroneous regex patterns.
I will appreciate a piece of code or a command so I can reproduce the alledged problem in a case such as this. Or even better (if possible): a PR fixing the problem.
@snej I had the chance to run KLEE on the code and (according to the tool) the code is proven free of run-time errors. So as far as I can tell, the bugs were indeed squashed in #44.
To be completely frank with you: I think you should have verified this for yourself, before chiming in on a closed issue from 2019.
Another approach would have been to just ask if this was fixed, before lecturing me on responsibility and "attractive nuisances". I did not appreciate that last comment and I found it accusing.
I am sure you had the best intentions, but in my opinion your approach was ham-fisted.
No harm, no foul: Thanks for your interest in this project.
Best wishes and I hope you will enjoy the weekend :)
My apologies — someone linked to this issue in a forum, and I stupidly didn't notice how old the timestamp was. 😔 Glad it's fixed.
Hi @snej - thanks for being so upfront and no worries at all :)
Having now seen the lobste.rs-post (mentioned in #57) I totally understand how that situation arose. I don't come back to edit all issues normally, so I can easily understand how it wasn't easy to spot how this had been fixed.
I can see now, that my comment(s) were too harsh and dismissive - I apologize for that. I was in a rush when replying (chronic condition?) ... I have edited the original post to substantiate my comment a bit.
I am really surprised and happy to see my code mentioned on lobste.rs - but also a bit frustrated since I could not reply in the thread :laughing:
If you'd like an invite, email me (jens åt mooseyard døt cøm) ^_^
_=_
q(-_-)p
,<___/\
(<_`. _/)
(_\_\_|_/_)
BUDDHA BLESS
Thank you very much @snej - a good sport and a chummy lad etc. ;D
BTW I hear you like capybaras, so here's one for you, free of charge
.;o,
__."iIoi,._ ;pI __-"-oO.,_
`.3"P3PPPoie-,. .d' `;. `p;
`O"dP"````""`PdEe._ .;' . ` `|
"$#"' ``"P4rdddsP' .F. ` `` ;
i/""" *"Sp. .dPff. _.,;Gw'
;l"' " `dp.. "sWf;fe|'
`l; .rPi . . "" "dW;;doe;
$ .;PE`' " "sW;.d.d;
$$ .$"` `"saed;lW;.d.d.i
.$M ; `` ' ld;.p.
__ _`$o,.-__ "ei-Mu~,.__ ___ `_-dee3'o-ii~m. ____
This project has memory safety vulnerabilities due to off-by-1 pointer management. For example, running: "tiny-regex-c.exe [" produces the following output: type: CHAR_CLASS [] type: CHAR '²' type: CHAR '²' type: CHAR '²' type: CHAR '²' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á'
This test case is reading past the end of the character buffer until the next NULL terminator is found. I identified issues in both re_print and re_compile in the above testcase. Vulnerabilities such as this can allow attackers to exploit systems or projects that embed this code.