tenstorrent / tt-umd

User-Mode Driver for Tenstorrent hardware
Apache License 2.0
6 stars 3 forks source link

Remove hardcoded 0 for buda dp #9

Closed jnie-TT closed 3 months ago

jnie-TT commented 3 months ago

Remove hardcoded 0 when translating to noc table coords

joelsmithTT commented 3 months ago

Looks reasonable in that it is consistent with other parts of the code.

My only comment is that get_target_mmio_device_ids does not look very thread safe.

xanderchin commented 3 months ago

@aliuTT Also looked through your changes that removed hardcoded 0's. Should we be using the mmio chip ids for the noc translation table lookup (this PR and @aliuTT) or should we be using the chip ids of the destination chip-x-y?

aliuTT commented 3 months ago

hmm I think either works, looks like the lookup has some redundant info:

for(const chip_id_t& chip : target_devices_in_cluster) {
        // Initialize identity mapping for Non-MMIO chips as well
        if(!ndesc -> is_chip_mmio_capable(chip)) {
            harvested_coord_translation.insert({chip, create_harvested_coord_translation(arch_name, true)});
        }
    }
jnie-TT commented 3 months ago

Hey @aliuTT I think that gets overwritten in the constructor if translation tables are enabled:

  if(translation_tables_en) {
      harvested_coord_translation.clear();
      for(const chip_id_t& chip : target_devices_in_cluster) {
          harvested_coord_translation.insert({chip, create_harvested_coord_translation(arch_name, false)});
      }
  }

Would it be ok then to update all calls to translate_to_noc_table_coords to use the actual chip id instead of the first mmio-mapped chip? My concern here is galaxy which has heterogeneous harvesting, don't know if that will get affected.

aliuTT commented 3 months ago

I'm not too familiar with how the translation table is getting used in BUDA. Not sure if Galaxy has any harvesting at all 🤔

jnie-TT commented 3 months ago

I can test it on buda, don't think it would break anything. Are we using these APIs on the metal side, and would it affect anything there?

aliuTT commented 3 months ago

Metal doesn't use translation tables. We also don't use write_to_non_mmio_device_send_epoch_cmd or rolled_write_to_non_mmio_device