tenstorrent / tt-umd

User-Mode Driver for Tenstorrent hardware
Apache License 2.0

Read_from_device issue when using REG_TLB #208

Closed lgojic-tt closed 2 weeks ago

lgojic-tt commented 2 weeks ago

On Wormhole (not yet tested on Grayskull), read_from_device returns incorrect data when it is called with fallback_tlb = REG_TLB and asked to read more than 16 MB.

In this example, I wrote to every address its own value (the address itself).


Past the first 16 MB, the data read back simply wraps around to the beginning. The same happens when a read only crosses the 16 MB boundary: even a read starting at address 0x00ffff00 shows the problem.
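A minimal sketch of how a read-back buffer can be checked against this pattern, assuming the pattern is stored as 32-bit words where the word at byte address A holds the value A. The helper names `expected_pattern` and `first_mismatch` are hypothetical and not part of tt-umd; the buffer would come from whatever read_from_device call is under test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumption: every 32-bit word holds its own byte address on the device.
constexpr uint64_t kWordBytes = sizeof(uint32_t);

// Hypothetical helper: builds the pattern expected for num_words words
// starting at base_addr (the same data that was written before the read).
std::vector<uint32_t> expected_pattern(uint64_t base_addr, std::size_t num_words) {
    std::vector<uint32_t> words(num_words);
    for (std::size_t i = 0; i < num_words; ++i) {
        words[i] = static_cast<uint32_t>(base_addr + i * kWordBytes);
    }
    return words;
}

// Hypothetical helper: returns the device address of the first word that does
// not hold its own address, or std::nullopt if the whole buffer matches.
std::optional<uint64_t> first_mismatch(const std::vector<uint32_t>& read_back,
                                       uint64_t base_addr) {
    assert(base_addr % kWordBytes == 0 && "reads are assumed to be word-aligned");
    for (std::size_t i = 0; i < read_back.size(); ++i) {
        const uint64_t addr = base_addr + i * kWordBytes;
        if (read_back[i] != static_cast<uint32_t>(addr)) {
            return addr;
        }
    }
    return std::nullopt;
}
```

With the wraparound described above, a read starting at 0x00ffff00 would be expected to report its first mismatch at 0x01000000, since the data there comes from the beginning instead of holding the value 0x01000000.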


However, a read that starts at address 0x01000000 returns the correct values.


If we read more than 32 MB, then past address 0x02000000 the first 32 bytes are repeated up to address 0x03000000.


If we read more than 48 MB, the whole machine crashes. The issue goes away if fallback_tlb = LARGE_READ_TLB is used instead.

This issue is reproduced by

pjanevskiTT commented 2 weeks ago

As discussed offline, and as @lgojic-tt found, using LARGE_READ_TLB resolves the issue. REG_TLB was probably causing problems because of a WC/UC (write-combining vs. uncached) clash, since REG_TLB is meant only for register reads and writes. It would be nice to have a safeguard against this kind of misuse, but since passing fallback TLBs as strings will go away with the API redesign, this is not a concern. As I said, I'm closing the issue since the problem has been resolved.
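As a hedged illustration of the kind of safeguard mentioned above, here is a sketch that replaces the string name with an enum and rejects bulk reads through REG_TLB. Everything in it is hypothetical and not the tt-umd API: the enum `FallbackTlb`, the functions `parse_fallback_tlb` and `check_read_tlb`, and the 4-byte limit (an assumed size of one 32-bit register access) are illustrative only.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical enum standing in for the string-based fallback TLB names.
enum class FallbackTlb { RegTlb, LargeReadTlb };

// Hypothetical limit: REG_TLB is for register accesses, assumed here to be
// at most one 32-bit register.
constexpr uint32_t kMaxRegTlbBytes = 4;

// Maps a name to the enum; misspelled or unknown names yield std::nullopt
// instead of being silently accepted.
std::optional<FallbackTlb> parse_fallback_tlb(const std::string& name) {
    if (name == "REG_TLB") return FallbackTlb::RegTlb;
    if (name == "LARGE_READ_TLB") return FallbackTlb::LargeReadTlb;
    return std::nullopt;
}

// Throws if a read larger than a single register would go through REG_TLB.
void check_read_tlb(FallbackTlb tlb, uint32_t size_bytes) {
    if (tlb == FallbackTlb::RegTlb && size_bytes > kMaxRegTlbBytes) {
        throw std::invalid_argument(
            "REG_TLB is for register accesses; use LARGE_READ_TLB for bulk reads");
    }
}
```

With a check like this, the 16 MB read from this issue would fail fast with an exception instead of returning wrapped data.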