kokke / tiny-regex-c

Small portable regex in C
The Unlicense
1.23k stars 174 forks source link

Memory Corruption Vulnerabilities #30

Closed neilsikka closed 4 years ago

neilsikka commented 5 years ago

This project has memory safety vulnerabilities due to off-by-1 pointer management. For example, running: "tiny-regex-c.exe [" produces the following output: type: CHAR_CLASS [] type: CHAR '²' type: CHAR '²' type: CHAR '²' type: CHAR '²' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á' type: CHAR 'á'

This test case is reading past the end of the character buffer until the next NULL terminator is found. I identified issues in both re_print and re_compile in the above testcase. Vulnerabilities such as this can allow attackers to exploit systems or projects that embed this code.

kokke commented 5 years ago

Hi @neilsikka - yes indeed, if you provide an incomplete regex-pattern ('tiny-regex-c.exe [' is an unmatched character class if I am not mistaken), memory corruption will ensue.

If you are concerned about security problems, this library is not for you.

If you want a small implementation of a regex library, with bugs etc. as described in the README, the project may be of interest.

EDIT: I think this has been fixed in #44. I am unsure how to reproduce though, so will not re-open. I still think you should consider other libraries for serious business, if you can. Much was sacrificed for minimalism in tiny-regex, and I think e.g. PCRE is both faster and more thoroughly tested than this library could ever be.

snej commented 3 years ago

If you’re not going to fix this, the responsible thing would be to put a big boldface disclaimer in the README saying that “this library does not check patterns for validity, and may produce undefined behavior including memory corruption when given an invalid pattern.

Otherwise users will naturally assume the library rejects invalid patterns (or at worst produces invalid output without undefined side effects), since that’s how every other regex library I can think of works. And it would be all too easy to decide to use this in a situation where the patterns consist of untrusted input.

The term “Attractive Nuisance” comes to mind.

kokke commented 3 years ago

Hi @snej - is this still an issue?

From this thread: https://github.com/kokke/tiny-regex-c/issues/44 and the discussion, I got the impression that these issues were squashed.

I am not near a computer now, so can’t test myself.

If this is indeed still an issue, I will add a warning about incorrect usage as you suggest.

kokke commented 3 years ago

@snej To summarize and give you some context: #44 and the changes that ensued (see https://github.com/kokke/tiny-regex-c/commit/1a279e04014b70b0695fba559a7c05d55e6ee90b), made the library pass formal verification by KLEE.

After those changes, I removed the warning in the readme about not validating erroneous regex patterns.

I will appreciate a piece of code or a command so I can reproduce the alledged problem in a case such as this. Or even better (if possible): a PR fixing the problem.

kokke commented 3 years ago

@snej I had the chance to run KLEE on the code and (according to the tool) the code is proven free of run-time errors. So as far as I can tell, the bugs were indeed squashed in #44.

To be completely frank with you: I think you should have verified this for yourself, before chiming in on a closed issue from 2019.

Another approach would have been to just ask if this was fixed, before lecturing me on responsibility and "attractive nuisances". I did not appreciate that last comment and I found it accusing.

I am sure you had the best intentions, but in my opinion your approach was ham-fisted.

No harm, no foul: Thanks for your interest in this project.

Best wishes and I hope you will enjoy the weekend :)

snej commented 3 years ago

My apologies — someone linked to this issue in a forum, and I stupidly didn't notice how old the timestamp was. 😔 Glad it's fixed.

kokke commented 3 years ago

Hi @snej - thanks for being so upfront and no worries at all :)

Having now seen the lobste.rs-post (mentioned in #57) I totally understand how that situation arose. I don't come back to edit all issues normally, so I can easily understand how it wasn't easy to spot how this had been fixed.

I can see now, that my comment(s) were too harsh and dismissive - I apologize for that. I was in a rush when replying (chronic condition?) ... I have edited the original post to substantiate my comment a bit.

I am really surprised and happy to see my code mentioned on lobste.rs - but also a bit frustrated since I could not reply in the thread :laughing:

snej commented 3 years ago

If you'd like an invite, email me (jens åt mooseyard døt cøm) ^_^

kokke commented 3 years ago
       _=_
     q(-_-)p
     ,<___/\
    (<_`. _/)
   (_\_\_|_/_)
   BUDDHA BLESS

Thank you very much @snej - a good sport and a chummy lad etc. ;D

kokke commented 3 years ago

BTW I hear you like capybaras, so here's one for you, free of charge

                                 .;o,
        __."iIoi,._              ;pI __-"-oO.,_
      `.3"P3PPPoie-,.            .d' `;.     `p;
     `O"dP"````""`PdEe._       .;'   .     `  `|   
    "$#"'            ``"P4rdddsP'  .F.    ` `` ;  
   i/"""     *"Sp.               .dPff.  _.,;Gw'
   ;l"'     "  `dp..            "sWf;fe|'
  `l;          .rPi .    . "" "dW;;doe;
   $          .;PE`'       " "sW;.d.d;
   $$        .$"`     `"saed;lW;.d.d.i
   .$M       ;              ``  ' ld;.p.
__ _`$o,.-__  "ei-Mu~,.__ ___ `_-dee3'o-ii~m. ____