Closed: mikevin920 closed this issue 6 months ago
@caixunshiren
@TT-BrianLiu, is this an LLK thing with the fp32 accum?
@cglagovich, are you working on this? (Your name is assigned.) If not, please unassign yourself so we can track down the proper owner.
If accumulating with fp32 or in L1, the max subblock size is 4 tiles. Issues like this really shouldn't be filed: the error message is not so much a bug as it is a description of the constraint. At best this should be a feature request.
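Concretely, the constraint is on tiles per output subblock. A minimal illustration, assuming the "4" in the error means `out_subblock_h * out_subblock_w <= 4`:

```python
# Hedged illustration: with fp32 dest accumulation (or L1 accumulation),
# a subblock may hold at most 4 tiles, i.e. out_subblock_h * out_subblock_w <= 4.
valid = [(1, 1), (1, 2), (2, 1), (1, 4), (4, 1), (2, 2)]
too_big = [(2, 4), (4, 2), (3, 3), (4, 4)]  # these trip the error in this issue

assert all(h * w <= 4 for h, w in valid)
assert all(h * w > 4 for h, w in too_big)
```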
@TT-BrianLiu, we did not specify the subblock size for this fallback matmul, so the automatically inferred one is wrong. Shouldn't this be a bug?
Ooh, I see. I think we haven't had use cases where the program config is left unspecified while a compute config is specified. The logic that automatically infers matmul params assumes the default compute config.
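To make that failure mode concrete, here is a hypothetical sketch; this is not the actual tt-metal inference code, and the function name and dest-capacity numbers are assumptions. The reported behavior matches an inference path that sizes subblocks against the default dest capacity and never consults `fp32_dest_acc_en`:

```python
# Hypothetical sketch of subblock inference; not the real implementation.
def infer_out_subblock(per_core_m, per_core_n, fp32_dest_acc_en=False):
    # The needed check: cap tiles-per-subblock at 4 when accumulating in fp32.
    # The bug behaves as if max_tiles were always the default (assumed 8 here).
    max_tiles = 4 if fp32_dest_acc_en else 8
    for h in range(min(per_core_m, max_tiles), 0, -1):
        if per_core_m % h:
            continue
        for w in range(min(per_core_n, max_tiles // h), 0, -1):
            if per_core_n % w == 0:
                return h, w  # largest (h, w) pair dividing the per-core block
    return 1, 1
```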
It should be fairly straightforward to uplift the inference to support this, but I would recommend just passing in your own program config if you want to configure other things. I am hesitant to keep patching our more automatic matmuls, because we should first review whether there are other settings that could affect program configs.
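As an illustration of that recommendation, an explicit program config sidesteps the inference entirely. The field names below follow tt_lib's `MatmulMultiCoreReuseMultiCastProgramConfig`, but the grid size and blocking values are illustrative assumptions that would need tuning for the shapes in this issue:

```python
program_config = ttl.operations.primary.MatmulMultiCoreReuseMultiCastProgramConfig(
    compute_with_storage_grid_size=(8, 8),
    in0_block_w=8,       # K tiles per block (illustrative)
    out_subblock_h=1,    # 1 * 4 = 4 tiles per subblock: within the fp32 limit
    out_subblock_w=4,
    per_core_M=16,       # M tiles per core (illustrative)
    per_core_N=16,       # N tiles per core (illustrative)
    transpose_mcast=False,
    fused_activation=None,
)
out = ttl.operations.primary.matmul(
    x, w,
    program_config=program_config,
    compute_kernel_config=compute_kernel_config,
)
```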
Describe the bug
With in0 and in1 both interleaved, using tt_lib.operations.primary.matmul with fp32_dest_acc enabled gives this error:

Total number of tiles in a subblock must be less than 4 when in fp32_dest_acc mode
To Reproduce
Steps to reproduce the behavior: run the matmul with input shapes ttnn.Shape([1, 32, 128, 8192]) and ttnn.Shape([1, 1, 8192, 4096]).
Config: "COMPUTE_KERNEL_CONFIG": ttl.tensor.WormholeComputeKernelConfig(
math_fidelity=ttl.tensor.MathFidelity.LoFi,
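Putting it together, a hedged minimal repro sketch. The tensor-creation calls are assumptions that may differ across tt-metal versions; torch is used only to create host data:

```python
import torch
import tt_lib as ttl

device = ttl.device.CreateDevice(0)

# Interleaved DRAM tensors matching the shapes above (assumed creation path).
a = ttl.tensor.Tensor(
    torch.randn(1, 32, 128, 8192), ttl.tensor.DataType.BFLOAT16
).to(ttl.tensor.Layout.TILE).to(device)
b = ttl.tensor.Tensor(
    torch.randn(1, 1, 8192, 4096), ttl.tensor.DataType.BFLOAT16
).to(ttl.tensor.Layout.TILE).to(device)

compute_kernel_config = ttl.tensor.WormholeComputeKernelConfig(
    math_fidelity=ttl.tensor.MathFidelity.LoFi,
    fp32_dest_acc_en=True,
)

# Expected failure:
# "Total number of tiles in a subblock must be less than 4 when in fp32_dest_acc mode"
out = ttl.operations.primary.matmul(a, b, compute_kernel_config=compute_kernel_config)
```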
```python
def prefill_forward(self, x: tt_lib.tensor.Tensor) -> tt_lib.tensor.Tensor:
    hidden_states = []
    w1_outs = []
    w3_outs = []
    for i in range(len(x)):
        """
        (Pdb) x[i].memory_config()
        tt::tt_metal::MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED,buffer_type=BufferType::DRAM,shard_spec=std::nullopt)
        (Pdb) self.w1_list[i].memory_config()
        tt::tt_metal::MemoryConfig(memory_layout=TensorMemoryLayout::INTERLEAVED,buffer_type=BufferType::DRAM,shard_spec=std::nullopt)
        (Pdb) x[i].shape
        ttnn.Shape([1, 32, 128, 8192])
        (Pdb) self.w1_list[i].shape
        ttnn.Shape([1, 1, 8192, 4096])
        """
        w1_outs.append(
            # The original snippet was truncated here; the call below is
            # reconstructed from the bug description (matmul with the
            # fp32_dest_acc compute kernel config). `self.model_config`
            # is an assumed holder for the "COMPUTE_KERNEL_CONFIG" key above.
            tt_lib.operations.primary.matmul(
                x[i],
                self.w1_list[i],
                compute_kernel_config=self.model_config["COMPUTE_KERNEL_CONFIG"],
            )
        )
```