Implement spinloop detection for ARM exclusive pairs

natgavrilenko commented 2 months ago

Consider the attached 'spinloop.litmus' test. The loop in thread P0 has no side effects, because this loop terminates whenever STLXR is executed. The value of W0 is 0 if the store was successful and 1 if the store failed. However, the test terminates with UNKNOWN result for any unrolling bound. The manually unrolled test 'spinloop-unrolled.litmus' works as expected.

The current implementation of spinloop detection fails to detect this spinloop. The reason is that dartagnan internally represent STLXR as a pair of instructions in the same branch, while spinloop detection essentially analyses the branches withing a loop.

Note: I would avoid changing the internal representation of ARM exclusive pair to contain branches. Representing such event pairs using branches would generate artificial control dependencies, which are used in the ARM consistency model. I would rather implement a special handling of RmwStoreExclusive events in the spinloop detection method.

spinloop.txt spinloop-unrolled.txt

ThomasHaas commented 2 months ago

Your code actually shows a problem in the parser for AARCH64. The registers W0 is parsed as X0, making the loop truly side-effectful as the address of the exclusive load (X0) gets (wrongly) updated by the status of the exclusive store (W0). That being said, we would still fail to detect the side-effectfree iteration where the store fails (but fixing this is easy, though insufficient due to the parser bug).

hernanponcedeleon commented 2 months ago

Register W0 is the lowest 32 bits of register X0 (which is 64 bits). The fact that the exclusive store partially updates the address of the load looks to me like a problem in the test per se.

I think @db7 generated this program from some real assembly. Diogo can you confirm if the assembly really uses W0/X0 for both the addresses and the execution status of the exclusive store, or this was a problem when translating from assembly to pasm with kittens?

ThomasHaas commented 2 months ago

Oh, I'm not familiar with the register names in assembly code, but that certainly sounds wrong.

db7 commented 2 months ago

@hernanponcedeleon Are you talking about this? This is perfectly fine and yes, that is the output of the compiler.

 STLXR   W0, X2, [X0]

According to the documentation of the STLXR instruction, the first argument is the status, not the loaded value. Address is in X0, it loads the value into X2 and W0 just keeps the success result.

ThomasHaas commented 2 months ago

Yes, but W0 is the lower part of X0, so the address gets updated if W0 is changed. This is surely wrong.

hernanponcedeleon commented 2 months ago

 LC00:
 LDAXR   X3, [X0]
 STLXR   W0, X2, [X0]
 CBZ     W0, LC01
 B       LC00

In the first iteration, the load accesses address X0 (which contains the address of some variable x). If the exclusive store fails, W0 gets value 1, meaning that the lowest 32 bits of X0 were updated. Now in the second iteration the address being accessed by the load was changed (unless the lowest 32 bits were already 1, but this does not sound like a reasonable assumption).

hernanponcedeleon commented 2 months ago

FYI, herd7 complains about this test with Warning: File "arm.litmus": Non-symbolic memory access found on '[1]' (User error) but it works if I replace the register used for the status (W0) to something else like W8.

@db7 is this code generated by a compiler or using inline assembly?

ThomasHaas commented 2 months ago

It does not only update the lowest bit but writes the whole value. So it should be wrong almost 100% of the time.

db7 commented 2 months ago

Indeed, it sounds fishy.

So I looked back in our chat history. The code I had looked like this:

AArch64 test.litc
{
  0:x0 = y;
  0:x1 = x;
  0:x2 = r0_P0;
  1:x0 = y;
  1:x1 = x;
  1:x2 = r0_P1;
}

P0                         | P1                     ;
                           |                        ;
      MOV     w3, #2       |       MOV     w3, #1   ;
                           |                        ;
LC78:                      |       LDR     w0, [x0] ;
      LDAXR   w5, [x1]     |                        ;
                           |       STLR    w3, [x1] ;
      STLXR   w4, w3, [x1] |                        ;
                           |       STR     w0, [x2] ;
      CBNZ    w4, LC78     |                        ;
                           |                        ;
      MOV     w3, #1       |                        ;
                           |                        ;
      STR     w3, [x0]     |                        ;
                           |                        ;
      STR     w5, [x2]     |                        ;

exists (x = 2 /\ (r0_P0 = 1 /\ r0_P1 = 1))

I think that something got lost in translation on the way.

db7 commented 2 months ago

Please confirm this is the problem we are discussing here @natgavrilenko

ThomasHaas commented 2 months ago

Definitely looks like a translation error. Nevertheless, the spinloop issue remains though I have a fix ready. I just was confused when the fix didn't work and then I looked into the code, finding this odd usage of registers.

db7 commented 2 months ago

Ok, so you confirm the issue, just the reported example was itself buggy :-D

Please let me know if you have a branch with the code pushed. I can try it out in several tests on my side.

ThomasHaas commented 2 months ago

Ok, there is another issue. If I add proper detection, then such loops are detected as liveness violations because the exclusive store can fail spuriously forever. I guess one needs to argue about fairness again and assume that the store will eventually succeed.

db7 commented 2 months ago

How do you handle progress of atomic_exchange at the C level?

ThomasHaas commented 2 months ago

We don't implement those with a store that can fail and thus do not even require a loop. I'm not sure if we correctly handle weak CAS though (we might always upgrade it to a strong one?).

hernanponcedeleon commented 2 months ago

How do you handle progress of atomic_exchange at the C level?

At the C level, the exchange cannot fail. The issue is that in arm, "An implementation might clear an exclusive monitor between the LDREX and the STREX, without any application-related cause.", making the store to "spuriously" fail.

db7 commented 2 months ago

Yes, I understand that. What should I do with the loop then? Can I configure dat3m with "I understand the risks, still I assume the stxr will eventually succeed"?

ThomasHaas commented 2 months ago

For now you can't. We could add an option to make all exclusive stores succeed (dangerous if used outside loops) or add the fairness assumption for the purpose of liveness detection.

EDIT: The fairness assumption is actually valid: the specification says that forward progress should be guaranteed.

db7 commented 2 months ago

Ok, so fairness assumption is default. I don't have to configure anything?

ThomasHaas commented 2 months ago

Once we implement it, it would be enabled by default (and likely without possibility of disabling it). So all you can do is wait until we implement it :). I'm technically still on vacation but maybe I can take a quick stab at it in the coming days.

hernanponcedeleon commented 2 months ago

EDIT: The fairness assumption is actually valid: the specification says that forward progress should be guaranteed.

It should be enough to make the spinloop instrumentation pass to add an assume(status_reg), right?

ThomasHaas commented 2 months ago

Yes, but that is kinda cheating. If we do an instrumentation like this, I would rather have a dedicated pass. Either way, the pass would likely be unsound in general or be specialized to only work on specific shapes of loops (i.e., be missing more exotic usages of exclusive stores).

ThomasHaas commented 2 months ago

Ok, so fairness assumption is default. I don't have to configure anything?

I opened a new PR with the necessary changes (#722). You can try if it solves your issues.

hernanponcedeleon commented 2 months ago

@db7 can you check if with these changes we now return PASS in all the tests from kittens?

db7 commented 2 months ago

The file I sent before to you was working with Datargnan before and the file was using lower case registers. I mean this one:

AArch64 test.litc
{
  0:x0 = y;
  0:x1 = x;
  0:x2 = r0_P0;
  1:x0 = y;
  1:x1 = x;
  1:x2 = r0_P1;
}

P0                         | P1                     ;
                           |                        ;
      MOV     w3, #2       |       MOV     w3, #1   ;
                           |                        ;
LC78:                      |       LDR     w0, [x0] ;
      LDAXR   w5, [x1]     |                        ;
                           |       STLR    w3, [x1] ;
      STLXR   w4, w3, [x1] |                        ;
                           |       STR     w0, [x2] ;
      CBNZ    w4, LC78     |                        ;
                           |                        ;
      MOV     w3, #1       |                        ;
                           |                        ;
      STR     w3, [x0]     |                        ;
                           |                        ;
      STR     w5, [x2]     |                        ;

exists (x = 2 /\ (r0_P0 = 1 /\ r0_P1 = 1))

Now, dartagnan does not parse it. And if I upcase the register names like this:

AArch64 test.litc
{
  0:x0 = y;
  0:x1 = x;
  0:x2 = r0_P0;
  1:x0 = y;
  1:x1 = x;
  1:x2 = r0_P1;
}

P0                         | P1                     ;
                           |                        ;
      MOV     w3, #2       |       MOV     w3, #1   ;
                           |                        ;
LC78:                      |       LDR     w0, [x0] ;
      LDAXR   w5, [x1]     |                        ;
                           |       STLR    w3, [x1] ;
      STLXR   w4, w3, [x1] |                        ;
                           |       STR     w0, [x2] ;
      CBNZ    w4, LC78     |                        ;
                           |                        ;
      MOV     w3, #1       |                        ;
                           |                        ;
      STR     w3, [x0]     |                        ;
                           |                        ;
      STR     w5, [x2]     |                        ;

exists (x = 2 /\ (r0_P0 = 1 /\ r0_P1 = 1))

Then I get the new conditional error.

[30.08.2024] 13:33:51 [INFO] Wmm.configureAll - encoding.wmm.reduceAcyclicityEncodeSets: true
[30.08.2024] 13:33:51 [INFO] Compilation.run - Program compiled to ARM8
[30.08.2024] 13:33:51 [ERROR] Dartagnan.main - Loop back jump if(bv64 X4 != bv64(0)); then goto LC78.loop is conditional.
com.dat3m.dartagnan.exception.MalformedProgramException: Loop back jump if(bv64 X4 != bv64(0)); then goto LC78.loop is conditional.
        at com.dat3m.dartagnan.program.processing.LoopFormVerification.checkAndCountLoops(LoopFormVerification.java:72) ~[dartagnan.jar:?]
        at com.dat3m.dartagnan.program.processing.LoopFormVerification.lambda$run$0(LoopFormVerification.java:44) ~[dartagnan.jar:?]
        at java.base/java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:214) ~[?:?]
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
        at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) ~[?:?]
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
        at java.base/java.util.stream.IntPipeline.reduce(IntPipeline.java:515) ~[?:?]
        at java.base/java.util.stream.IntPipeline.sum(IntPipeline.java:473) ~[?:?]
        at com.dat3m.dartagnan.program.processing.LoopFormVerification.run(LoopFormVerification.java:44) ~[dartagnan.jar:?]
        at com.dat3m.dartagnan.program.processing.ProcessingManager.lambda$run$0(ProcessingManager.java:161) ~[dartagnan.jar:?]
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
        at com.dat3m.dartagnan.program.processing.ProcessingManager.run(ProcessingManager.java:161) ~[dartagnan.jar:?]
        at com.dat3m.dartagnan.verification.solving.ModelChecker.preprocessProgram(ModelChecker.java:66) ~[dartagnan.jar:?]
        at com.dat3m.dartagnan.verification.solving.RefinementSolver.runInternal(RefinementSolver.java:189) ~[dartagnan.jar:?]
        at com.dat3m.dartagnan.verification.solving.RefinementSolver.run(RefinementSolver.java:170) ~[dartagnan.jar:?]
        at com.dat3m.dartagnan.Dartagnan.main(Dartagnan.java:178) [dartagnan.jar:?]

ThomasHaas commented 2 months ago

This error is not new to this PR. It happens because we have not enabled our loop normalization pass on Litmus code (we are somewhat more careful on transforming litmus code). I will add a PR that enables this pass.

hernanponcedeleon commented 2 months ago

This is already enabled I the arm-asm branch. @db7 can you apply the changes of this PR on top of that branch and confirm it works as expected?

EDIT: @db7 I think before you were always working with this branch which has changes that you definitely need (like supporting zero registers).

ThomasHaas commented 2 months ago

I guess we should revise/improve our handling of litmus code, seeing that it sees more usage nowadays.

db7 commented 2 months ago

@hernanponcedeleon, thank you! I was using arm-asm for a long time and couldnt even remember. There was one conflict while merging both branches (arm-asm and Thomas').

Now the test above (with lower- or upper-case register names) run and gives me this result:

 ======== Summary ========
Number of iterations: 3
Total native solving time: 0.3 secs
   -- Bound check time: 0.2 secs
Total CAAT solving time: 0.24 secs
   -- Model extraction time: 0.6 secs
   -- Population time: 0.9 secs
   -- Consistency check time: 0.0 secs
   -- Reason computation time: 0.5 secs
   -- Refining time: 0.0 secs
   -- #Computed core reasons: 1
   -- #Computed core reduced reasons: 1
   -- Min model size (#events): 39
   -- Average model size (#events): 39
   -- Max model size (#events): 39

[30.08.2024] 13:50:00 [INFO] RefinementSolver.runInternal - Verification finished with result FAIL
Condition exists (x == bv64(2)) && (r0_P0 == bv64(1)) && (r0_P1 == bv64(1))
No
Total verification time: 0.348 secs
diogo@dbdell:~/Workspaces/kittens$

And that is the expected output. So it looks good!

ThomasHaas commented 1 month ago

Can this be closed?

hernanponcedeleon commented 1 month ago

Fixed by 6326445a3923987549c4299678a6fc5cc3d4c9a8

hernanponcedeleon / Dat3M

Implement spinloop detection for ARM exclusive pairs #719