mc-imperial / spirv-control-flow

Formal modelling of SPIR-V control flow using Alloy
Apache License 2.0

MESA: SPIR-V offset 0: SPIR-V parsing FAILED: Invalid back or cross-edge in the CFG #7

Open Jack-Clark opened 2 years ago

Jack-Clark commented 2 years ago

Running the following command using branch 22.0.1 and master (commit cde1be0b) of Mesa:

amber -t spv1.3 -t 1.1 reduced-cross-edge-bug.amber

gives:

[ERROR] validation layer (../src/vulkan/runtime/vk_nir.c:59):
SPIR-V offset 0: SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/vtn_cfg.c:662
    Invalid back or cross-edge in the CFG
    0 bytes into the SPIR-V binary
SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/vtn_cfg.c:662
    Invalid back or cross-edge in the CFG
    0 bytes into the SPIR-V binary
[ERROR] validation layer (../src/vulkan/runtime/vk_shader_module.c:128):
spirv_to_nir failed (VK_ERROR_UNKNOWN)
[ERROR] validation layer (../src/intel/vulkan/anv_pipeline.c:1987):
VK_ERROR_UNKNOWN
reduced-cross-edge-bug.amber: Vulkan::Calling vkCreateComputePipelines Fail

Summary of Failures: 
reduced-cross-edge-bug.amber

Summary: 0 pass, 1 fail
[ERROR] validation layer (Validation):
Validation Error: [ VUID-vkDestroyDevice-device-00378 ] Object 0: handle = 0x5613ab15de10, type = VK_OBJECT_TYPE_DEVICE; Object 1: handle = 0xe7f79a0000000005, type = VK_OBJECT_TYPE_PIPELINE_LAYOUT; | MessageID = 0x71500fba | OBJ ERROR : For VkDevice 0x5613ab15de10[], VkPipelineLayout 0xe7f79a0000000005[] has not been destroyed. The Vulkan spec states: All child objects created on device must have been destroyed prior to destroying device (https://vulkan.lunarg.com/doc/view/1.3.204.1/linux/1.3-extensions/vkspec.html#VUID-vkDestroyDevice-device-00378)

This is potentially related to this bug. A patch was eventually merged for that bug, yet the failure above still occurs on master, so this report is still valid. I think the discussion on that bug highlights the value of formally modelling the control flow rules.

I've compiled and validated the reduced asm from the following amber code:

#!amber

SHADER compute compute_shader SPIRV-ASM

; Follow the path:
; 8 -> <9> -> <12> -> 10
;
; 2 CFG nodes have OpBranchConditional or OpSwitch as their terminators (denoted <n>): 9 and 12.
;
; To follow this path, we need to make these decisions each time we reach 9 or 12.
; This path was generated with the seed 2641702989343433340 and has length 4.
;
; We equip the shader with 2+1 storage buffers:
; - An input storage buffer with the directions for each node 9 or 12
; - An output storage buffer that records the blocks that are executed

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 8
; Bound: 15
; Schema: 0

               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %7 "main"
               OpExecutionMode %7 LocalSize 1 1 1

          %1 = OpTypeVoid
          %2 = OpTypeFunction %1
          %3 = OpTypeBool
       %true = OpConstantTrue %3

          %7 = OpFunction %1 None %2

          %8 = OpLabel ; validCFG/StructurallyReachableBlock$6
               OpBranch %9

          %9 = OpLabel ; validCFG/LoopHeader$0
               OpLoopMerge %10 %11 None
               OpBranchConditional %true %12 %15

         %12 = OpLabel ; validCFG/StructurallyReachableBlock$4
               OpBranchConditional %true %15 %10

         %10 = OpLabel ; validCFG/StructurallyReachableBlock$3
               OpReturn

         %15 = OpLabel ; validCFG/StructurallyReachableBlock$1
               OpBranch %11

         %11 = OpLabel ; validCFG/StructurallyReachableBlock$0
               OpBranch %17

         %17 = OpLabel ; validCFG/StructurallyReachableBlock$5
               OpBranch %9

               OpFunctionEnd

 END

 PIPELINE compute pipeline
   ATTACH compute_shader

 END

 RUN pipeline 1 1 1

I've removed many blocks from the original CFG, so it would be good if someone (@afd @vili-1 @johnwickerson) could double check this before I submit it as a bug.
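
As an aid for double checking the reduced CFG against the "back or cross-edge" wording in the Mesa message, here is a minimal Python sketch of the textbook depth-first edge classification applied to the blocks of the reduced shader. This is not Mesa's algorithm from vtn_cfg.c, which is not reproduced here; the names `REDUCED_CFG` and `classify_edges` are made up for illustration.

```python
from typing import Dict, List, Tuple

# Successors of each block in the reduced shader, transcribed by hand in the
# order the branch instructions list them (illustrative name, not from the repo).
REDUCED_CFG: Dict[int, List[int]] = {
    8: [9],
    9: [12, 15],   # loop header: OpLoopMerge %10 %11
    12: [15, 10],
    10: [],        # merge block, ends in OpReturn
    15: [11],
    11: [17],      # continue target
    17: [9],       # branches back to the loop header
}


def classify_edges(cfg: Dict[int, List[int]], entry: int) -> Dict[Tuple[int, int], str]:
    """Label each edge reachable from entry as tree, back, forward or cross."""
    discovered: Dict[int, int] = {}   # block -> DFS discovery time
    finished = set()                  # blocks whose successors are all explored
    kinds: Dict[Tuple[int, int], str] = {}

    def visit(block: int) -> None:
        discovered[block] = len(discovered)
        for succ in cfg[block]:
            if succ not in discovered:
                kinds[(block, succ)] = "tree"
                visit(succ)
            elif succ not in finished:
                kinds[(block, succ)] = "back"      # target is still on the DFS stack
            elif discovered[block] < discovered[succ]:
                kinds[(block, succ)] = "forward"   # target is a finished descendant
            else:
                kinds[(block, succ)] = "cross"     # target finished in another subtree
        finished.add(block)

    visit(entry)
    return kinds


kinds = classify_edges(REDUCED_CFG, 8)
back_edges = [edge for edge, kind in kinds.items() if kind == "back"]
```

With the successor order above, the only back edge is 17 -> 9, which targets the loop header, and 9 -> 15 comes out as a forward edge. The classification depends on visit order: if block 9's successors are visited as 15 then 12, the edge 12 -> 15 is classified as a cross edge instead. That order dependence may be worth keeping in mind when reading the Mesa error, although whether it is what trips the parser is only a guess.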

Something in the original CFG file caught my eye. It doesn't affect the bug, but it could indicate an error in some part of our code. There is a block B0 that is labelled as wholly unreachable (i.e. it uses Block$X naming), but it looks structurally reachable to me because of the path SRB6->LH0->B0. Am I missing something? I've drawn out the CFG below. Note SRBX = StructurallyReachableBlock$X and BX = Block$X.

[Image: Cross-edge-bug-cfg]

Here is the skeleton asm:

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 8
; Bound: 15
; Schema: 0

               OpCapability Shader
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %7 "main"
               OpExecutionMode %7 LocalSize 1 1 1

               ; Below, we declare various types and variables for storage buffers.
               ; These decorations tell SPIR-V that the types and variables relate to storage buffers

          %1 = OpTypeVoid
          %2 = OpTypeFunction %1
          %3 = OpTypeBool
          %4 = OpTypeInt 32 0
       %true = OpConstantTrue %3
          %6 = OpConstant %4 0

          %7 = OpFunction %1 None %2

          %8 = OpLabel ; validCFG/StructurallyReachableBlock$6
               OpBranch %9

          %9 = OpLabel ; validCFG/LoopHeader$0
               OpLoopMerge %10 %11 None
               OpBranchConditional %true %12 %13

         %12 = OpLabel ; validCFG/StructurallyReachableBlock$4
               OpBranchConditional %true %14 %10

         %13 = OpLabel ; validCFG/Block$0
               OpBranch %14

         %10 = OpLabel ; validCFG/StructurallyReachableBlock$3
               OpReturn

         %14 = OpLabel ; validCFG/SelectionHeader$0
               OpSelectionMerge %15 None
               OpSwitch %6 %15 1 %16

         %16 = OpLabel ; validCFG/StructurallyReachableBlock$2
               OpBranchConditional %true %11 %15

         %15 = OpLabel ; validCFG/StructurallyReachableBlock$1
               OpBranch %11

         %11 = OpLabel ; validCFG/StructurallyReachableBlock$0
               OpBranch %17

         %17 = OpLabel ; validCFG/StructurallyReachableBlock$5
               OpBranch %9

               OpFunctionEnd
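
The path SRB6->LH0->B0 mentioned above can be checked mechanically. The following is a minimal Python sketch that reads the block labels and terminators of the skeleton asm (transcribed by hand into `SKELETON_ASM`) and searches for a shortest path from the entry block to %13. It only follows ordinary branch edges; as I read the SPIR-V spec, a block reachable this way is also structurally reachable, but treat that as an assumption. `parse_cfg` and `shortest_path` are illustrative names, not part of the repository.

```python
from collections import deque
from typing import Dict, List, Optional

# Labels, merge instructions and terminators of the skeleton asm, copied by hand.
SKELETON_ASM = """
%8 = OpLabel
OpBranch %9
%9 = OpLabel
OpLoopMerge %10 %11 None
OpBranchConditional %true %12 %13
%12 = OpLabel
OpBranchConditional %true %14 %10
%13 = OpLabel
OpBranch %14
%10 = OpLabel
OpReturn
%14 = OpLabel
OpSelectionMerge %15 None
OpSwitch %6 %15 1 %16
%16 = OpLabel
OpBranchConditional %true %11 %15
%15 = OpLabel
OpBranch %11
%11 = OpLabel
OpBranch %17
%17 = OpLabel
OpBranch %9
"""


def parse_cfg(asm: str) -> Dict[str, List[str]]:
    """Map each block label to the labels its terminator can branch to."""
    cfg: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in asm.splitlines():
        tokens = raw.split(";")[0].split()   # drop trailing asm comments
        if not tokens:
            continue
        if len(tokens) == 3 and tokens[1:] == ["=", "OpLabel"]:
            current = tokens[0]
            cfg[current] = []
        elif current is None:
            continue
        elif tokens[0] == "OpBranch":
            cfg[current] = [tokens[1]]
        elif tokens[0] == "OpBranchConditional":
            cfg[current] = tokens[2:4]       # skip the condition operand
        elif tokens[0] == "OpSwitch":
            # Operands: selector, default label, then (literal, label) pairs.
            cfg[current] = [tokens[2]] + tokens[4::2]
    return cfg


def shortest_path(cfg: Dict[str, List[str]], entry: str, target: str) -> Optional[List[str]]:
    """Breadth-first search; returns the blocks on a shortest path, or None."""
    parent: Dict[str, Optional[str]] = {entry: None}
    queue = deque([entry])
    while queue:
        block: Optional[str] = queue.popleft()
        if block == target:
            path = []
            while block is not None:
                path.append(block)
                block = parent[block]
            return path[::-1]
        for succ in cfg[block]:
            if succ not in parent:
                parent[succ] = block
                queue.append(succ)
    return None


cfg = parse_cfg(SKELETON_ASM)
path_to_13 = shortest_path(cfg, "%8", "%13")
```

Here `path_to_13` is `['%8', '%9', '%13']`, which matches the SRB6->LH0->B0 path, and every one of the ten blocks has some path from %8.
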

afd commented 2 years ago

From a quick look, the CFG you have sketched looks valid. I don't have time to dig properly before going on holiday. @vili-1 it would be great if you could review the Amber before @Jack-Clark files an issue with Mesa. The issue of a block that's structurally reachable but is named as if it were wholly unreachable would also be worth getting to the bottom of. Thanks!

Jack-Clark commented 2 years ago

It seems this bug could be the same as the one filed here. I've added a comment there with the reduced example, as the original example provided is ~180 lines of SPIR-V whereas the reduced example is ~30 lines (not counting blank lines).

vili-1 commented 2 years ago

@Jack-Clark, in our model we have the set StructurallyReachableBlock, which is a sub-type of Block, where Block is the set of all blocks (totally-unreachable inclusive). So using Block$X naming doesn't necessarily mean that it is a totally unreachable block. I guess you are referring to block 13 in the graph below as "totally-reachable". As you can see, all blocks here are reachable, i.e., StructurallyReachableBlock == Block. What is it that differentiates block 13 from the others in your understanding?

The graph is valid, by the way.

Now, about the error you get running amber, I do get an error, too. I have to dig into this a bit.

[mvk-error] VK_ERROR_DEVICE_LOST: Command buffer 0x7fe1ddb04fe0 "" execution failed (code 12): GPU Command Buffer execution stopped due to Stack Overflow Exception. Please check the [MTLComputePipelineDescriptor maxCallStackDepth] setting. (00000011:kIOAccelCommandBufferCallbackErrorStackOverflow)
jack1.amber: Vulkan::Calling vkWaitForFences Fail

Summary of Failures:
  jack1.amber

Summary: 0 pass, 1 fail

[Image: jajck1]

vili-1 commented 2 years ago

I'm attaching the xml so let's refer to the naming used in it for further discussion on this issue.

jajck1.xml.zip

Jack-Clark commented 2 years ago

@vili-1 I am familiar with the terms structurally reachable and wholly unreachable as defined in the structured control flow doc, but I'm not sure what totally reachable and totally unreachable mean - are they equivalent? Also bear in mind I'm not familiar with the Alloy model or Alloy itself.

My expectation was that all structurally reachable blocks would be of type StructurallyReachableBlock or one of its subtypes (if it has any) and that any other blocks that are not a subtype of StructurallyReachableBlock would be wholly unreachable. From your answer it sounds like this is not the case. This is what caused my confusion regarding block 13 as it had type Block rather than StructurallyReachableBlock, yet it was structurally reachable.
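
To make that expectation concrete, here is a minimal Python sketch that flags any block in the skeleton asm that is reachable from the entry block but whose label does not use one of the names I took to mean "structurally reachable". The labels and edges are copied by hand from the skeleton asm. Treating LoopHeader and SelectionHeader as subtypes of StructurallyReachableBlock is my assumption, not something confirmed from the Alloy model, and `naming_mismatches` is an illustrative helper, not code from the repository.

```python
from typing import Dict, List, Set

# Model labels copied from the comments on each OpLabel in the skeleton asm.
LABELS: Dict[int, str] = {
    8: "StructurallyReachableBlock$6",
    9: "LoopHeader$0",
    12: "StructurallyReachableBlock$4",
    13: "Block$0",
    10: "StructurallyReachableBlock$3",
    14: "SelectionHeader$0",
    16: "StructurallyReachableBlock$2",
    15: "StructurallyReachableBlock$1",
    11: "StructurallyReachableBlock$0",
    17: "StructurallyReachableBlock$5",
}

# Branch targets of each block's terminator in the skeleton asm.
EDGES: Dict[int, List[int]] = {
    8: [9], 9: [12, 13], 12: [14, 10], 13: [14], 10: [],
    14: [15, 16], 16: [11, 15], 15: [11], 11: [17], 17: [9],
}

# ASSUMPTION: these label prefixes all denote structurally reachable blocks.
REACHABLE_SIGS: Set[str] = {"StructurallyReachableBlock", "LoopHeader", "SelectionHeader"}


def reachable_from(edges: Dict[int, List[int]], entry: int) -> Set[int]:
    """Collect every block reachable from entry by following branch edges."""
    seen = {entry}
    stack = [entry]
    while stack:
        for succ in edges[stack.pop()]:
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def naming_mismatches(labels: Dict[int, str], edges: Dict[int, List[int]], entry: int) -> List[int]:
    """Blocks that are reachable but not labelled with a reachable-looking name."""
    return sorted(
        block
        for block in reachable_from(edges, entry)
        if labels[block].split("$")[0] not in REACHABLE_SIGS
    )


mismatches = naming_mismatches(LABELS, EDGES, 8)
```

On this data `mismatches` is `[13]`: block 13 is the only block that is reachable yet labelled with the plain Block$X naming, which is exactly the case that confused me.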

vili-1 commented 2 years ago

Ah, yes - you are right - "wholly (un)reachable" is the correct term.

In the xml that I attached today, all blocks are of type StructurallyReachableBlock. Maybe the confusion is because we changed the names in the model: what is now StructurallyReachableBlock/Block was before Block/BLOCK. If there's still confusion please don't hesitate to ask.

vili-1 commented 2 years ago

@Jack-Clark can you please send me the amber file that you are getting the error from? I just realised that the one you posted above is not a properly fleshed one (hence the Stack Overflow Exception I got).

vili-1 commented 2 years ago

I used my original fleshing program to flesh this example. Executing the amber is always successful on my Mac.

Jack-Clark commented 2 years ago

@vili-1 I can try to dig out the original amber file if you still want to take a look? I think what may have happened is that I had regenerated all of the XML files, but not removed the old amber files from the directory, so there will be a mix of old and new amber files. This won't affect correctness, but is a little confusing on the naming front. I'll be sure to only have amber files with the new naming in future folders.

vili-1 commented 2 years ago

No worries @Jack-Clark, I fleshed it using my original program, so no need to send it.

vili-1 commented 2 years ago

I get no error with SwiftShader either.

Jack-Clark commented 2 years ago

@vili-1 thanks for confirming. I get an error with MESA only. Both the SwiftShader and Nvidia drivers successfully executed this example when I tested.