hernanponcedeleon / Dat3M

A verification tool for many memory models
MIT License
74 stars 27 forks source link

Performance of alias analysis #369

Open hernanponcedeleon opened 1 year ago

hernanponcedeleon commented 1 year ago

Our alias analysis is very slow when run on this program.

These are the times I get (running this requires GlobalSettings.ARCH_PRECISION = 64)

[22.01.2023] 16:43:34 [INFO] AliasAnalysis.fromConfig - Selected Alias Analysis: FIELD_SENSITIVE
[22.01.2023] 16:45:37 [INFO] AliasAnalysis.fromConfig - Done
ThomasHaas commented 1 year ago

The C-code of the benchmark contains bit-fiddling on pointers, which our alias analysis cannot handle. Here is an example:

static inline struct task_struct *rt_mutex_owner(struct rt_mutex_base *lock)
{
    unsigned long owner = (unsigned long) READ_ONCE(lock->owner);

    return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS); // Bit-fiddling on pointer
}

Our alias analysis will assume that loads/stores involving the returned pointer can address anything, due to the bit-fiddling on pointers.

hernanponcedeleon commented 1 year ago

I understand this will have an effect on the precision of the analysis, but is this also related to why running the analysis is so slow?

ThomasHaas commented 1 year ago

Let me correct what I said. While the bit-fiddling is problematic, the fact that we load a pointer from memory should already give us a pointer without any aliasing information, so the bit-fiddling doesn't make it worse, I guess.

I don't know how exactly the performance correlates to the precision, since it might also get faster due not being able to derive anything. However, this code is interesting for the alias analysis in general.

ThomasHaas commented 1 year ago

This benchmark has around 1200 memory addresses and I ran out of Java heap space while profiling, but I got some data from that. All time of the algorithm (up to the crash after around 3 minutes) is spent in the methods fields (71% of time) and addAllAddresses (29%). Both functions are easily improvable, and from simple rewriting I got it like 6 times faster already.

grafik
ThomasHaas commented 1 year ago

The last issue is just that we store a fresh HashSet per pointer in the program. Many pointers have the same potential set of addresses they can access, so they could share the same HashSet-instance for that. I think there are many ways to go about this, like e.g. canonicalizing every points-to-set or finding equivalence classes of pointers in a pre-analysis so that they can be "merged".

Edit: PR #374 addresses the performance loss in fields. The remaining bottleneck is addAllAddresses which is affected by the above mentioned problem.

hernanponcedeleon commented 1 year ago

@ThomasHaas you mentioned that you discussed (before the improvement that you already implemented) with Rene some potential improvements. Do those differ of what was already implemented? I.e. can we improve performance even further or should we close this issue?

ThomasHaas commented 1 year ago

What I discussed with René is orthogonal to what I improved so far. Overall there are more optimizable points, but I don't mind closing this issue if you think it is good enough for now.

hernanponcedeleon commented 1 year ago

Right now, it is less blocking than it used to be, but I would not complain if we can get the times faster. Since we already have ideas on how to improve this, let's keep this issue open and we get those done.

hernanponcedeleon commented 10 months ago

@ThomasHaas can you add to this issue the llvm file of tree-RCU so we have a concrete test showing the problem?

ThomasHaas commented 10 months ago

Here is a reduced version of the example: rcu-test-alias.ll.zip. The original takes over an hour and I hope that this one is a bit faster (but still slow). I'm also wondering if the analysis problem is related to poison addresses used in linked lists. Those are clearly problematic, but I'm not sure if the relevant code is ever executed.

Edit: The alias analysis on the reduced version seems to take around 30 minutes on my machine (bound=1).

hernanponcedeleon commented 3 weeks ago

The alias analysis did not finish overnight for benchmarks/folds/was.c.

@xeren do you have any idea why the performance is so bad in this example?

ThomasHaas commented 3 weeks ago

On what branch do you have that code?

hernanponcedeleon commented 3 weeks ago

Sorry (spell checker). This code https://github.com/hernanponcedeleon/Dat3M/blob/master/benchmarks/lfds/wsq.c

hernanponcedeleon commented 3 weeks ago

BTW ... both the field sensitive/insensitive analysis finish below 1 sec and seems to produce good results, e.g.

[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_SENSITIVE
[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.53 secs
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - 
        wmm.analysis.relationAnalysis: NATIVE
        wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:14:47 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 6.150 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.205 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - 
======== RelationAnalysis summary ======== 
        #Relations: 109
        #Axioms: 9
        #may-edges removed (extended): 67545
        #must-edges added (extended): 0
        total #must|may|exclusive edges: 1647667|2416852|0
        #must|may rf edges: 2|3710
        #must|may co edges: 2396|4728
===========================================
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_INSENSITIVE
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.24 secs
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - 
        wmm.analysis.relationAnalysis: NATIVE
        wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:16:43 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 5.965 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.622 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - 
======== RelationAnalysis summary ======== 
        #Relations: 109
        #Axioms: 9
        #may-edges removed (extended): 87917
        #must-edges added (extended): 0
        total #must|may|exclusive edges: 1647667|2522918|0
        #must|may rf edges: 2|4952
        #must|may co edges: 2396|6084
===========================================
ThomasHaas commented 3 weeks ago

Due to the recent update to mem2reg, we get a lot of warnings regarding pontentially uninitialized variables. My best bet is that the alias analysis has to be conservative on those (assuming non-det initial value) which causes it to generate lots of aliasing. Notice that newtasks is an array of pointers, so a non-det value allows for aliasing with every object. We need to incorporate reasoning about path conditions to prove that the values are initialized. (Won't work in this example).

EDIT: Actually, in this example proving that we do not access uninitialized registers seems to be very hard. In fact, I don't understand how this code is correct.

    if (count >= q.mask) {
        // == (count >= size-1)
        int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1));

        assert(newsize < q.MaxSize);

        Obj *newtasks[STATICSIZE]; // This gets mem2reg'd
        int i;
        for (i = 0; i < count; i++) {
            int temp = (h + i) & q.mask;
            newtasks[i] = q.elems[temp];
        }
        for (i = 0; i < newsize; i++) {
            q.elems[i] = newtasks[i]; // This is only initialized by the above loop if "newsize <= count".
        }

Consider the entry condition count >= q.mask with a count that is only slightly larger than the mask, and q.mask != 0. Then int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1)); easily leads to a value that is larger than count. This in turn really causes uninitialized values to get accessed. Isn't this UB?.

hernanponcedeleon commented 3 weeks ago

I agree it is not trivial to know if the code has UB (even with the patch below). However, the patch below fixes all the warnings and the alias analysis still seems to have problems

diff --git a/benchmarks/lfds/wsq.c b/benchmarks/lfds/wsq.c
index 2831dac3a..142e05ad0 100644
--- a/benchmarks/lfds/wsq.c
+++ b/benchmarks/lfds/wsq.c
@@ -14,7 +14,7 @@
 #define STEAL_ATTEMPS 1

 void *stealer(void *unused) {
-    Obj *r;
+    Obj *r = NULL;
     for (int i = 0; i < STEAL_ATTEMPS; i++) {
         if (steal(&r)) {
             operation(r);
@@ -41,14 +41,14 @@ int main(void) {
     for (int i = 0; i < ITEMS / 2; i++) {
         push(&items[2 * i]);
         push(&items[2 * i + 1]);
-        Obj *r;
+        Obj *r = NULL;
         if (pop(&r)) {
             operation(r);
         }
     }

     for (int i = 0; i < ITEMS / 2; i++) {
-        Obj *r;
+        Obj *r = NULL;
         if (pop(&r)) {
             operation(r);
         }
diff --git a/benchmarks/lfds/wsq.h b/benchmarks/lfds/wsq.h
index c992bf794..0e620b419 100644
--- a/benchmarks/lfds/wsq.h
+++ b/benchmarks/lfds/wsq.h
@@ -209,6 +209,9 @@ void syncPush(Obj* elem) {
             int temp = (h + i) & q.mask;
             newtasks[i] = q.elems[temp];
         }
+        for (i = count; i < STATICSIZE - count; i++) {
+            newtasks[i] = NULL;
+        }
         for (i = 0; i < newsize; i++) {
             q.elems[i] = newtasks[i];
         }
ThomasHaas commented 3 weeks ago

I haven't checked Dartagnan on your patch, but I would be surprised if the patch really changed much. Since the initializing loop has dynamic bounds (cause count is dynamic), I don't think that Dartagnan is able to figure out that newtasks is really initialized (this requires far more intricate reasoning).

xeren commented 3 weeks ago

The alias analysis did not finish overnight for benchmarks/folds/was.c.

@xeren do you have any idea why the performance is so bad in this example?

This program seems to contain negative weight cycles. This seems to put IBPA into an endless loop. The best fix appears to be setting TOP = List.of(-1) (what it actually should be in theory). Then it takes 0.5 secs on my machine. I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

hernanponcedeleon commented 3 weeks ago

I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

Does this means that "in theory" it is not the case that the default algorithm is always better than the others?

ThomasHaas commented 3 weeks ago

IIRC, the old algorithms were just unsound in those cases and that is why they were "more precise".

xeren commented 3 weeks ago

Exactly. In terms of precision, "in theory" cannot really apply here. In practice, the fixed default is sometimes less precise.

Version 1 (FIELD_INSENSITIVE) truncates out-of-bounds addresses every time, in order to avoid looping on non-zero-weight cycles.

Version 2 (FIELD_SENSITIVE) basically kept this issue, but also introduced a provenance assumption. Then there was a loss in precision in some litmus tests featuring pointer sets starting with the last field of an object (e.g. &A+12+4x on a 16-bytes object A). There was only one possible location, so this generated more MUST-alias relationships. To enforce more precision everywhere, I made the assumption, that x is non-negative.

In Version 3 (FULL), I replaced the out-of-bounds filter with a different handling of positive-weight cycles. But I kept the assumption from before, which lead to this issue.