tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.

More concise TTNN APIs #6960

Open davorchap opened 3 months ago

davorchap commented 3 months ago

Current API:

input_tensor_a = ttnn.from_torch(
    torch_input_tensor_a, layout=ttnn.TILE_LAYOUT, device=device, memory_config=ttnn.L1_MEMORY_CONFIG
)

input_tensor_b = ttnn.from_torch(
    torch_input_tensor_b, layout=ttnn.TILE_LAYOUT, device=device, memory_config=ttnn.L1_MEMORY_CONFIG
)
output_tensor = ttnn.add(input_tensor_a, input_tensor_b, memory_config=ttnn.L1_MEMORY_CONFIG)
ttnn.deallocate(input_tensor_b)
ttnn.to_torch(output_tensor)

Proposed API:

TILE_LAYOUT --> TILE
L1_MEMORY_CONFIG --> L1
memory_config --> mem_config (or just "memory" ?)
# and similar for all other APIs
input_tensor_a = ttnn.from_torch(torch_input_tensor_a, layout=ttnn.TILE, device=device, mem_config=ttnn.L1)
input_tensor_b = ttnn.from_torch(torch_input_tensor_b, layout=ttnn.TILE, device=device, mem_config=ttnn.L1)
output_tensor = ttnn.add(input_tensor_a, input_tensor_b, mem_config=ttnn.L1)
ttnn.deallocate(input_tensor_b)
ttnn.to_torch(output_tensor)

# or
input_tensor_a = ttnn.from_torch(torch_input_tensor_a, layout=ttnn.TILE, device=device, memory=ttnn.L1)
input_tensor_b = ttnn.from_torch(torch_input_tensor_b, layout=ttnn.TILE, device=device, memory=ttnn.L1)
output_tensor = ttnn.add(input_tensor_a, input_tensor_b, memory=ttnn.L1)
ttnn.deallocate(input_tensor_b)
ttnn.to_torch(output_tensor)

Written in Andrej Karpathy's abbreviated coding style (proposal + AK coding style):

a = ttnn.from_torch(torch_a, layout=ttnn.TILE, device=device, memory=ttnn.L1)
b = ttnn.from_torch(torch_b, layout=ttnn.TILE, device=device, memory=ttnn.L1)
out = ttnn.add(a, b, memory=ttnn.L1)
ttnn.deallocate(b)
ttnn.to_torch(out)
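
For concreteness, a minimal sketch (not the actual ttnn API) of how the shorter names could be layered on top of the existing ones without breaking current code; the alias assignments and the wrapper name below are assumptions for illustration:

import functools
import ttnn

# Hypothetical short aliases for the existing constants.
ttnn.TILE = ttnn.TILE_LAYOUT
ttnn.L1 = ttnn.L1_MEMORY_CONFIG

def _accept_memory_alias(op):
    """Wrap an op so callers can pass memory= or mem_config= instead of memory_config=."""
    @functools.wraps(op)
    def wrapper(*args, memory=None, mem_config=None, **kwargs):
        chosen = memory if memory is not None else mem_config
        if chosen is not None:
            kwargs.setdefault("memory_config", chosen)
        return op(*args, **kwargs)
    return wrapper

# e.g. ttnn.add = _accept_memory_alias(ttnn.add)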
davorchap commented 3 months ago

@cglagovich @apalaguha @yieldthought @nsmithtt @AleksKnezevic, your input on other examples where the APIs could be more concise to make models more readable would be welcome (character / word count can be intimidating)

AleksKnezevic commented 3 months ago

Two ideas. First, the default memory config for most ops should be the input memory config, not DRAM. Since changing the default would probably break a bunch of things, you could expose something like ttnn.set_default_memory_config() and give the user control. Second, it would be nice to have torch-style tensor functions and be able to call:

b = a.reshape(new_shape)
a.deallocate()

instead of:

b = ttnn.reshape(a, new_shape)
ttnn.deallocate(a)
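
For illustration, a minimal sketch of what those two ideas could look like on the Python side; set_default_memory_config and the tensor methods shown here are hypothetical, not the current ttnn API:

# A user-controlled default instead of hard-coding DRAM.
_default_memory_config = None

def set_default_memory_config(memory_config):
    """Pick the memory config that ops fall back to when none is passed."""
    global _default_memory_config
    _default_memory_config = memory_config

def get_default_memory_config():
    return _default_memory_config

# Torch-style methods could be thin forwards to the existing free functions,
# e.g. on the tensor class (assuming these methods don't already exist):
#     def reshape(self, new_shape):
#         return ttnn.reshape(self, new_shape)
#     def deallocate(self):
#         ttnn.deallocate(self)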
davorchap commented 3 months ago

Two ideas. First, the default memory config for most ops should be the input memory config, not DRAM. Since changing the default would probably break a bunch of things, you could expose something like ttnn.set_default_memory_config() and give the user control. Second, it would be nice to have torch-style tensor functions and be able to call:

b = a.reshape(new_shape)
a.deallocate()

instead of:

b = ttnn.reshape(a, new_shape)
ttnn.deallocate(a)

Yes!!

Based on @AleksKnezevic's suggestions, the code would look like 🔥

a = ttnn.from_torch(torch_a, layout=ttnn.TILE, device=device, memory=ttnn.L1)
b = ttnn.from_torch(torch_b, layout=ttnn.TILE, device=device, memory=ttnn.L1)

out = ttnn.add(a, b) # default memory is the input memory 
b.deallocate()
out_torch = out.to_torch()
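
A possible way to get the "default memory is the input memory" behavior is a thin decorator over the ops; this is only a sketch, and it assumes the tensor exposes a memory_config() accessor (the real attribute name may differ):

import functools

def default_memory_from_input(op):
    """If the caller passes no memory_config, reuse the first input tensor's config."""
    @functools.wraps(op)
    def wrapper(*tensors, memory_config=None, **kwargs):
        if memory_config is None and tensors:
            memory_config = tensors[0].memory_config()  # assumed accessor
        return op(*tensors, memory_config=memory_config, **kwargs)
    return wrapper

# e.g. add = default_memory_from_input(ttnn.add); out = add(a, b)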
arakhmati commented 3 months ago

I like the idea of shortening the names

And I like the idea of simplifying how memory configs are set.

But I do think @yieldthought's approach of using context managers is better than getting the memory config from input tensors.

Basically, it would look something like this:

with ttnn.memory(ttnn.L1):
    input_tensor_a = ttnn.from_torch(torch_input_tensor_a, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
    input_tensor_b = ttnn.from_torch(torch_input_tensor_b, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
    output_tensor = input_tensor_a + input_tensor_b

    with ttnn.memory(ttnn.DRAM):
        output_tensor = ttnn.exp(output_tensor)

This is better because the user still has full control over which memory config the operation will use, but without the annoyance of having to pass it in everywhere.

It's also better because it's not ambiguous. With the other approach, I have a lot of questions right away: What does default memory config from the input mean? Is it the memory config of input tensor 0? Or is it the L1 memory config as long as one of the tensors is in L1? And what does it mean for sharded operations? I would rather not introduce this heuristic.
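
A minimal sketch of how such a context manager could work, using a plain stack of defaults that ops consult; the names below are illustrative rather than the actual ttnn implementation:

import contextlib

_memory_config_stack = []

@contextlib.contextmanager
def memory(memory_config):
    """Make memory_config the default for ops called inside the with-block."""
    _memory_config_stack.append(memory_config)
    try:
        yield
    finally:
        _memory_config_stack.pop()

def current_memory_config(fallback=None):
    """Return the innermost active default, or the fallback if no block is active."""
    return _memory_config_stack[-1] if _memory_config_stack else fallback

Because the stack lives entirely on the host Python side, ops can pick their default without inspecting any attributes of the input tensors.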

nsmithtt commented 3 months ago

@arakhmati, I like the context manager and I think that'd be a nice API to have, but I also think inferring the memory config from the input tensors is really handy. The context manager also doesn't solve sharded situations.

What does default memory config from the input mean? Is it the memory config of input tensor 0? Or is it the L1 memory config as long as one of the tensors is in L1? And what does it mean for sharded operations? I would rather not introduce this heuristic.

I agree, having a set of rules could get complicated and could end up doing a lot behind the user's back, but what about assertions? Assert that all inputs have the same memory config and layout.

a = ttnn.from_torch(torch_a, layout=ttnn.TILE, device=device, memory=ttnn.L1)
b = ttnn.from_torch(torch_b, layout=ttnn.ROW_MAJOR, device=device, memory=ttnn.DRAM)
c = ttnn.add(a, b) # FAILS

Here, memory and layout could be union types that also accept a tensor, from which ttnn would know how to pull the relevant data.
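
For illustration, the assertion could be a small helper that ops run over their tensor inputs; the memory_config() and layout accessors are assumptions about the tensor interface:

def assert_consistent_inputs(*tensors):
    """Fail loudly when inputs disagree, instead of silently picking one config."""
    first = tensors[0]
    for t in tensors[1:]:
        assert t.memory_config() == first.memory_config(), "inputs have different memory configs"
        assert t.layout == first.layout, "inputs have different layouts"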

arakhmati commented 3 months ago

Btw, this shouldn't be a Python-side ttnn feature if implemented this way, because of the upcoming async mode and the need to synchronize the main thread to check the input memory configs. We already need to move some logic to C++ because of that. So, if we want this, in ttnn we would just need to set the defaults to None, and then the C++ ops would need to be updated to infer memory configs from input tensors.

The context manager doesn't have this problem because it doesn't need to look at the attributes of the input tensors.

nsmithtt commented 3 months ago

@arakhmati, maybe I misunderstand what async mode means, but aren't the memory configs + tensor layouts tracked on the tensor host side? The state should be known by ttnn regardless of when the work is eventually dispatched.

arakhmati commented 3 months ago

All of the attributes will be set asynchronously, including shape, layout, and dtype.