toshipiazza / drtaint

Very WIP taint analysis for DynamoRIO (ARM)
BSD 3-Clause "New" or "Revised" License
18 stars 5 forks source link

Soundness Issues with syscalls #6

Closed toshipiazza closed 6 years ago

toshipiazza commented 6 years ago

Syscalls such as read, recv, uname, fstat{64,}, getdents{64,}, etc all write to a passed-in buffer. If we don't explicitly set taint over these buffers after the syscalls succeeds, we could have a buffer that is marked tainted but in reality isn't.

Should we handle this in drtaint proper?

Syscalls such as these currently cause a lot of coreutils to fail, for example ls -alh.

vanhauser-thc commented 6 years ago

yes you should as otherwise the tainter is worthless. read, recv, recvfrom are the most important syscalls to hook for tainting. And it makes sense to hook open and accept4 as well, as e.g. if you only want to taint network traffic that can be received with read() too, so you would need to track read() for the file descriptor that came with accept4().

have a look at this project which is based on Pin: https://github.com/BinaryAnalysisPlatform/bap-pintraces

it is not a lot of work.

toshipiazza commented 6 years ago

Thanks for the suggestions!

read, recv, recvfrom are the most important syscalls to hook for tainting.

Agreed, we implement at least these and a few others, but not enough that our analysis is sound just yet (i.e. coreutils binaries routinely yield false positives).

And it makes sense to hook open and accept4 as well, as e.g. if you only want to taint network traffic that can be received with read() too, so you would need to track read() for the file descriptor that came with accept4().

Maybe we're not exactly on the same page--this technique seems to be aimed at tainting user input (possibly for the purposes of checking for control-flow infractions as in TaintCheck and TaintTrace). However, DrTaint should be a general library that allows users to taint what they want, for example at a syscall event.

The reason I call our analysis "unsound" is due to syscalls which overwrite some userspace buffer, for example uname. If we don't clear this taint (or subsequently set it, as a user might be interested in the information produced by uname for whatever reason) then the taint value of whatever is currently on the stack is preserved, which shouldn't happen, despite it having been overwritten.

vanhauser-thc commented 6 years ago

for very selfish reasons I just would like this tainter to be as flexible as possible and applicable to real-world problems :) (and then add myself Intel x86 and x64 to it)

the many syscalls are an issue as it is a lot of effort to implement them, true.

toshipiazza commented 6 years ago

Just a heads up @vanhauser-thc , but potentially around April I plan on contributing the DrTaint library in some form to the main DrMemory project under the name DrTracker, XREF https://github.com/DynamoRIO/drmemory/issues/825.

Derek has expressed interest in also splitting out the taint analysis code used for uninitialized read tracking for DrMemory (x86 only) and contributing it to DrTracker, if you are interested :)