secure-software-engineering / phasar

A LLVM-based static analysis framework.
Other
919 stars 140 forks source link

Filtered AliasSet #723

Open fabianbs96 opened 1 month ago

fabianbs96 commented 1 month ago

The LLVMAliasSet contains a lot of spurious aliases due to its context insensitivity and its almost not present field-sensitivity. Hence, the client analyses based on this alias information produce many false positives and need to propagate many more facts than necessary, resulting in a severe performance hit.

Therefore, some analyses in phasar, such as the IFDSTaintAnalysis and IDEExtendedTaintAnalysis, filter the alias sets on-the-fly to achieve better results.

This PR provides a wrapper FilteredLLVMAliasSet around the LLVMAliasSet that abstracts and caches the filtering.

fabianbs96 commented 1 month ago

Hi @vulder, thanks for your comments. To answer your questions:

Where do you implement the caching (I might have missed it 😆 )

The caching is implemented in getAliasSet(), but now I have also implemented it for getReachableAllocationSites()

and how exactly do you ensure that the cache is a) not to big and b) how do you do cache invalidation.

Currently, I have not implemented cache invalidation and just keep the cache growing. In my tests (on some coreutils), it has not been any problem. Also, for IFDS and IDE analyses, it is hard to predict, when an alias set is never used again.

One could implement some ref-counting or least-recently-used strategy, but I havent' implemented it (yet?)

vulder commented 1 month ago

Hi @vulder, thanks for your comments. To answer your questions:

Where do you implement the caching (I might have missed it 😆 )

The caching is implemented in getAliasSet(), but now I have also implemented it for getReachableAllocationSites()

and how exactly do you ensure that the cache is a) not to big and b) how do you do cache invalidation.

Currently, I have not implemented cache invalidation and just keep the cache growing. In my tests (on some coreutils), it has not been any problem. Also, for IFDS and IDE analyses, it is hard to predict, when an alias set is never used again.

One could implement some ref-counting or least-recently-used strategy, but I havent' implemented it (yet?)

Ok, just keep in mind invalidation could also be interesting when the cache grows to big. So, even when a set would be needed again afterwards it would be useful to remove it from the cache to reduce the memory footprint.