tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Mistral CPU Demo w/ multiple prompts #3265

Closed boris-drazic closed 10 months ago

boris-drazic commented 11 months ago

The CPU demo errors out if more than one user input is passed in.

To replicate, on branch saichand/mistral_cpu_demo edit file models/experimental/mistral/demo/mistral_cpu_demo.py and add some more inputs to sample_context, e.g.,

    sample_context = [
        "This is a sample text for single layer execution ",
        "This is a sample text for single layer execution ",
        "This is a sample text for single layer execution ",
    ] 

then run the test with pytest models/experimental/mistral/demo/mistral_cpu_demo.py::test_demo_mistral_single_layer

This is the error:

>       self.cache_k[:bsz].scatter_(dim=1, index=scatter_pos, src=xk[:, -self.sliding_window :])
E       RuntimeError: Expected index [3, 11, 8, 128] to be smaller than self [1, 4096, 8, 128] apart from dimension 1 and to be smaller size than src [3, 11, 8, 128]

models/experimental/mistral/reference/model.py:126: RuntimeError
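
The shape rule behind this message can be sketched in plain Python. This is only an illustration of the constraint documented for `torch.Tensor.scatter_` (index must be no larger than src in every dimension, and no larger than self in every dimension except `dim`); the helper name `scatter_shape_ok` is hypothetical and not part of the repository. The shapes are copied from the error above.

```python
def scatter_shape_ok(self_shape, index_shape, src_shape, dim):
    """Return True if the shapes satisfy the documented scatter_ constraints.

    Hypothetical helper for illustration; it checks shapes only and
    does not touch any tensors.
    """
    if not (len(self_shape) == len(index_shape) == len(src_shape)):
        return False
    for d, (s, i, r) in enumerate(zip(self_shape, index_shape, src_shape)):
        if i > r:                 # index must fit inside src in every dimension
            return False
        if d != dim and i > s:    # index must fit inside self except along dim
            return False
    return True


index_shape = (3, 11, 8, 128)     # from the error message
src_shape = (3, 11, 8, 128)       # from the error message

# self has a batch dimension of 1, but index has 3: the check fails on dim 0.
print(scatter_shape_ok((1, 4096, 8, 128), index_shape, src_shape, dim=1))

# With a batch dimension of 3 on self, the same index and src are accepted.
print(scatter_shape_ok((3, 4096, 8, 128), index_shape, src_shape, dim=1))
```

The first call reports a violation because dimension 0 of the index (3) exceeds dimension 0 of the target (1), which matches the "apart from dimension 1" wording of the RuntimeError.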

vigneshkeerthivasanx commented 11 months ago

The error occurs because max_batch_size was fixed at 1. When the number of prompts is increased to 3, max_batch_size must also be changed to 3. We have now updated the PR (#3140): max_batch_size is a fixture, and the number of prompts is generated to match the batch size passed in through that fixture.
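
A minimal sketch of that coupling, assuming the prompt list and the cache are both derived from a single batch-size value. The helper names `build_prompts` and `kv_cache_shape` are hypothetical and are not taken from PR #3140; the trailing cache dimensions are copied from the error message in this issue.

```python
SAMPLE_PROMPT = "This is a sample text for single layer execution "


def build_prompts(batch_size):
    """Generate one prompt per batch entry (hypothetical helper)."""
    return [SAMPLE_PROMPT] * batch_size


def kv_cache_shape(max_batch_size, cache_len=4096, n_kv_heads=8, head_dim=128):
    """Shape of a cache whose leading dimension is max_batch_size.

    The default sizes mirror the [1, 4096, 8, 128] shape in the error message.
    """
    return (max_batch_size, cache_len, n_kv_heads, head_dim)


# In a test, this value would come from a fixture; a plain loop stands in here.
for max_batch_size in (1, 3):
    prompts = build_prompts(max_batch_size)
    cache = kv_cache_shape(max_batch_size)
    # The number of prompts never exceeds the cache's batch dimension,
    # so slicing the cache by the batch size yields enough rows.
    assert len(prompts) <= cache[0]
    print(len(prompts), cache)
```

Deriving both values from one input keeps the number of prompts and the cache's batch dimension from drifting apart, which is the mismatch reported above.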

tt-rkim commented 11 months ago

Is this ready to be closed? @boris-drazic

boris-drazic commented 11 months ago

Once the PR is merged to main, it will be.

saichandax commented 10 months ago

Closing this ticket, as PR #3140 has been merged to main. Thank you.