Closed natgavrilenko closed 1 month ago
Your code actually shows a problem in the parser for AARCH64. The registers W0 is parsed as X0, making the loop truly side-effectful as the address of the exclusive load (X0) gets (wrongly) updated by the status of the exclusive store (W0). That being said, we would still fail to detect the side-effectfree iteration where the store fails (but fixing this is easy, though insufficient due to the parser bug).
Register W0 is the lowest 32 bits of register X0 (which is 64 bits). The fact that the exclusive store partially updates the address of the load looks to me like a problem in the test per se.
I think @db7 generated this program from some real assembly. Diogo can you confirm if the assembly really uses W0/X0 for both the addresses and the execution status of the exclusive store, or this was a problem when translating from assembly to pasm with kittens?
Oh, I'm not familiar with the register names in assembly code, but that certainly sounds wrong.
@hernanponcedeleon Are you talking about this? This is perfectly fine and yes, that is the output of the compiler.
STLXR W0, X2, [X0]
According to the documentation of the STLXR instruction, the first argument is the status, not the loaded value. Address is in X0, it loads the value into X2 and W0 just keeps the success result.
Yes, but W0 is the lower part of X0, so the address gets updated if W0 is changed. This is surely wrong.
LC00:
LDAXR X3, [X0]
STLXR W0, X2, [X0]
CBZ W0, LC01
B LC00
In the first iteration, the load accesses address X0
(which contains the address of some variable x
). If the exclusive store fails, W0
gets value 1, meaning that the lowest 32 bits of X0
were updated. Now in the second iteration the address being accessed by the load was changed (unless the lowest 32 bits were already 1, but this does not sound like a reasonable assumption).
FYI, herd7 complains about this test with Warning: File "arm.litmus": Non-symbolic memory access found on '[1]' (User error)
but it works if I replace the register used for the status (W0
) to something else like W8
.
@db7 is this code generated by a compiler or using inline assembly?
It does not only update the lowest bit but writes the whole value. So it should be wrong almost 100% of the time.
Indeed, it sounds fishy.
So I looked back in our chat history. The code I had looked like this:
AArch64 test.litc
{
0:x0 = y;
0:x1 = x;
0:x2 = r0_P0;
1:x0 = y;
1:x1 = x;
1:x2 = r0_P1;
}
P0 | P1 ;
| ;
MOV w3, #2 | MOV w3, #1 ;
| ;
LC78: | LDR w0, [x0] ;
LDAXR w5, [x1] | ;
| STLR w3, [x1] ;
STLXR w4, w3, [x1] | ;
| STR w0, [x2] ;
CBNZ w4, LC78 | ;
| ;
MOV w3, #1 | ;
| ;
STR w3, [x0] | ;
| ;
STR w5, [x2] | ;
exists (x = 2 /\ (r0_P0 = 1 /\ r0_P1 = 1))
I think that something got lost in translation on the way.
Please confirm this is the problem we are discussing here @natgavrilenko
Definitely looks like a translation error. Nevertheless, the spinloop issue remains though I have a fix ready. I just was confused when the fix didn't work and then I looked into the code, finding this odd usage of registers.
Ok, so you confirm the issue, just the reported example was itself buggy :-D
Please let me know if you have a branch with the code pushed. I can try it out in several tests on my side.
Ok, there is another issue. If I add proper detection, then such loops are detected as liveness violations because the exclusive store can fail spuriously forever. I guess one needs to argue about fairness again and assume that the store will eventually succeed.
How do you handle progress of atomic_exchange
at the C level?
We don't implement those with a store that can fail and thus do not even require a loop. I'm not sure if we correctly handle weak CAS though (we might always upgrade it to a strong one?).
How do you handle progress of
atomic_exchange
at the C level?
At the C level, the exchange cannot fail. The issue is that in arm, "An implementation might clear an exclusive monitor between the LDREX and the STREX, without any application-related cause.", making the store to "spuriously" fail.
Yes, I understand that. What should I do with the loop then? Can I configure dat3m with "I understand the risks, still I assume the stxr will eventually succeed"?
For now you can't. We could add an option to make all exclusive stores succeed (dangerous if used outside loops) or add the fairness assumption for the purpose of liveness detection.
EDIT: The fairness assumption is actually valid: the specification says that forward progress should be guaranteed.
Ok, so fairness assumption is default. I don't have to configure anything?
Once we implement it, it would be enabled by default (and likely without possibility of disabling it). So all you can do is wait until we implement it :). I'm technically still on vacation but maybe I can take a quick stab at it in the coming days.
EDIT: The fairness assumption is actually valid: the specification says that forward progress should be guaranteed.
It should be enough to make the spinloop instrumentation pass to add an assume(status_reg)
, right?
Yes, but that is kinda cheating. If we do an instrumentation like this, I would rather have a dedicated pass. Either way, the pass would likely be unsound in general or be specialized to only work on specific shapes of loops (i.e., be missing more exotic usages of exclusive stores).
Ok, so fairness assumption is default. I don't have to configure anything?
I opened a new PR with the necessary changes (#722). You can try if it solves your issues.
@db7 can you check if with these changes we now return PASS in all the tests from kittens?
The file I sent before to you was working with Datargnan before and the file was using lower case registers. I mean this one:
AArch64 test.litc
{
0:x0 = y;
0:x1 = x;
0:x2 = r0_P0;
1:x0 = y;
1:x1 = x;
1:x2 = r0_P1;
}
P0 | P1 ;
| ;
MOV w3, #2 | MOV w3, #1 ;
| ;
LC78: | LDR w0, [x0] ;
LDAXR w5, [x1] | ;
| STLR w3, [x1] ;
STLXR w4, w3, [x1] | ;
| STR w0, [x2] ;
CBNZ w4, LC78 | ;
| ;
MOV w3, #1 | ;
| ;
STR w3, [x0] | ;
| ;
STR w5, [x2] | ;
exists (x = 2 /\ (r0_P0 = 1 /\ r0_P1 = 1))
Now, dartagnan does not parse it. And if I upcase the register names like this:
AArch64 test.litc
{
0:x0 = y;
0:x1 = x;
0:x2 = r0_P0;
1:x0 = y;
1:x1 = x;
1:x2 = r0_P1;
}
P0 | P1 ;
| ;
MOV w3, #2 | MOV w3, #1 ;
| ;
LC78: | LDR w0, [x0] ;
LDAXR w5, [x1] | ;
| STLR w3, [x1] ;
STLXR w4, w3, [x1] | ;
| STR w0, [x2] ;
CBNZ w4, LC78 | ;
| ;
MOV w3, #1 | ;
| ;
STR w3, [x0] | ;
| ;
STR w5, [x2] | ;
exists (x = 2 /\ (r0_P0 = 1 /\ r0_P1 = 1))
Then I get the new conditional error.
[30.08.2024] 13:33:51 [INFO] Wmm.configureAll - encoding.wmm.reduceAcyclicityEncodeSets: true
[30.08.2024] 13:33:51 [INFO] Compilation.run - Program compiled to ARM8
[30.08.2024] 13:33:51 [ERROR] Dartagnan.main - Loop back jump if(bv64 X4 != bv64(0)); then goto LC78.loop is conditional.
com.dat3m.dartagnan.exception.MalformedProgramException: Loop back jump if(bv64 X4 != bv64(0)); then goto LC78.loop is conditional.
at com.dat3m.dartagnan.program.processing.LoopFormVerification.checkAndCountLoops(LoopFormVerification.java:72) ~[dartagnan.jar:?]
at com.dat3m.dartagnan.program.processing.LoopFormVerification.lambda$run$0(LoopFormVerification.java:44) ~[dartagnan.jar:?]
at java.base/java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:214) ~[?:?]
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
at java.base/java.util.stream.IntPipeline.reduce(IntPipeline.java:515) ~[?:?]
at java.base/java.util.stream.IntPipeline.sum(IntPipeline.java:473) ~[?:?]
at com.dat3m.dartagnan.program.processing.LoopFormVerification.run(LoopFormVerification.java:44) ~[dartagnan.jar:?]
at com.dat3m.dartagnan.program.processing.ProcessingManager.lambda$run$0(ProcessingManager.java:161) ~[dartagnan.jar:?]
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
at com.dat3m.dartagnan.program.processing.ProcessingManager.run(ProcessingManager.java:161) ~[dartagnan.jar:?]
at com.dat3m.dartagnan.verification.solving.ModelChecker.preprocessProgram(ModelChecker.java:66) ~[dartagnan.jar:?]
at com.dat3m.dartagnan.verification.solving.RefinementSolver.runInternal(RefinementSolver.java:189) ~[dartagnan.jar:?]
at com.dat3m.dartagnan.verification.solving.RefinementSolver.run(RefinementSolver.java:170) ~[dartagnan.jar:?]
at com.dat3m.dartagnan.Dartagnan.main(Dartagnan.java:178) [dartagnan.jar:?]
This error is not new to this PR. It happens because we have not enabled our loop normalization pass on Litmus code (we are somewhat more careful on transforming litmus code). I will add a PR that enables this pass.
This is already enabled I the arm-asm
branch. @db7 can you apply the changes of this PR on top of that branch and confirm it works as expected?
EDIT: @db7 I think before you were always working with this branch which has changes that you definitely need (like supporting zero registers).
I guess we should revise/improve our handling of litmus code, seeing that it sees more usage nowadays.
@hernanponcedeleon, thank you! I was using arm-asm
for a long time and couldnt even remember. There was one conflict while merging both branches (arm-asm and Thomas').
Now the test above (with lower- or upper-case register names) run and gives me this result:
======== Summary ========
Number of iterations: 3
Total native solving time: 0.3 secs
-- Bound check time: 0.2 secs
Total CAAT solving time: 0.24 secs
-- Model extraction time: 0.6 secs
-- Population time: 0.9 secs
-- Consistency check time: 0.0 secs
-- Reason computation time: 0.5 secs
-- Refining time: 0.0 secs
-- #Computed core reasons: 1
-- #Computed core reduced reasons: 1
-- Min model size (#events): 39
-- Average model size (#events): 39
-- Max model size (#events): 39
[30.08.2024] 13:50:00 [INFO] RefinementSolver.runInternal - Verification finished with result FAIL
Condition exists (x == bv64(2)) && (r0_P0 == bv64(1)) && (r0_P1 == bv64(1))
No
Total verification time: 0.348 secs
diogo@dbdell:~/Workspaces/kittens$
And that is the expected output. So it looks good!
Can this be closed?
Fixed by 6326445a3923987549c4299678a6fc5cc3d4c9a8
Consider the attached 'spinloop.litmus' test. The loop in thread P0 has no side effects, because this loop terminates whenever STLXR is executed. The value of W0 is 0 if the store was successful and 1 if the store failed. However, the test terminates with UNKNOWN result for any unrolling bound. The manually unrolled test 'spinloop-unrolled.litmus' works as expected.
The current implementation of spinloop detection fails to detect this spinloop. The reason is that dartagnan internally represent STLXR as a pair of instructions in the same branch, while spinloop detection essentially analyses the branches withing a loop.
Note: I would avoid changing the internal representation of ARM exclusive pair to contain branches. Representing such event pairs using branches would generate artificial control dependencies, which are used in the ARM consistency model. I would rather implement a special handling of RmwStoreExclusive events in the spinloop detection method.
spinloop.txt spinloop-unrolled.txt