Closed toshipiazza closed 6 years ago
yes you should as otherwise the tainter is worthless. read, recv, recvfrom are the most important syscalls to hook for tainting. And it makes sense to hook open and accept4 as well, as e.g. if you only want to taint network traffic that can be received with read() too, so you would need to track read() for the file descriptor that came with accept4().
have a look at this project which is based on Pin: https://github.com/BinaryAnalysisPlatform/bap-pintraces
it is not a lot of work.
Thanks for the suggestions!
read, recv, recvfrom are the most important syscalls to hook for tainting.
Agreed, we implement at least these and a few others, but not enough that our analysis is sound just yet (i.e. coreutils binaries routinely yield false positives).
And it makes sense to hook open and accept4 as well, as e.g. if you only want to taint network traffic that can be received with read() too, so you would need to track read() for the file descriptor that came with accept4().
Maybe we're not exactly on the same page--this technique seems to be aimed at tainting user input (possibly for the purposes of checking for control-flow infractions as in TaintCheck and TaintTrace). However, DrTaint should be a general library that allows users to taint what they want, for example at a syscall event.
The reason I call our analysis "unsound" is due to syscalls which overwrite some userspace buffer, for example uname. If we don't clear this taint (or subsequently set it, as a user might be interested in the information produced by uname for whatever reason) then the taint value of whatever is currently on the stack is preserved, which shouldn't happen, despite it having been overwritten.
for very selfish reasons I just would like this tainter to be as flexible as possible and applicable to real-world problems :) (and then add myself Intel x86 and x64 to it)
the many syscalls are an issue as it is a lot of effort to implement them, true.
Just a heads up @vanhauser-thc , but potentially around April I plan on contributing the DrTaint library in some form to the main DrMemory project under the name DrTracker, XREF https://github.com/DynamoRIO/drmemory/issues/825.
Derek has expressed interest in also splitting out the taint analysis code used for uninitialized read tracking for DrMemory (x86 only) and contributing it to DrTracker, if you are interested :)
Syscalls such as read, recv, uname, fstat{64,}, getdents{64,}, etc all write to a passed-in buffer. If we don't explicitly set taint over these buffers after the syscalls succeeds, we could have a buffer that is marked tainted but in reality isn't.
Should we handle this in drtaint proper?
Syscalls such as these currently cause a lot of coreutils to fail, for example ls -alh.