openpower-cores / a2i


How to calculate the set number based on the EA (effective address) in the MMU module #13

Closed luffygood closed 3 years ago

luffygood commented 3 years ago

There are 512 entries in the TLB, organized as 4 ways × 128 sets. Meanwhile, the A2I core supports multiple page sizes such as 4KB, 64KB, 1MB, and so on. I am curious how the MMU calculates the set number when an effective/virtual address arrives. In mmq_tlb_ctl.vhdl, I found the logic related to the set-number calculation, shown below.

size_1G_hashed_addr(6) <=  tlb_tag0_q(33) xor tlb_tag0_q(tagpos_pid+pid_width-1);
size_1G_hashed_addr(5) <=  tlb_tag0_q(32) xor tlb_tag0_q(tagpos_pid+pid_width-2);
size_1G_hashed_addr(4) <=  tlb_tag0_q(31) xor tlb_tag0_q(tagpos_pid+pid_width-3);
size_1G_hashed_addr(3) <=  tlb_tag0_q(30) xor tlb_tag0_q(tagpos_pid+pid_width-4);
size_1G_hashed_addr(2) <=  tlb_tag0_q(29) xor tlb_tag0_q(tagpos_pid+pid_width-5);
size_1G_hashed_addr(1) <=  tlb_tag0_q(28) xor tlb_tag0_q(tagpos_pid+pid_width-6);
size_1G_hashed_addr(0) <=  tlb_tag0_q(27) xor tlb_tag0_q(tagpos_pid+pid_width-7);
size_1G_hashed_tid0_addr(6) <=  tlb_tag0_q(33);
size_1G_hashed_tid0_addr(5) <=  tlb_tag0_q(32);
size_1G_hashed_tid0_addr(4) <=  tlb_tag0_q(31);
size_1G_hashed_tid0_addr(3) <=  tlb_tag0_q(30);
size_1G_hashed_tid0_addr(2) <=  tlb_tag0_q(29);
size_1G_hashed_tid0_addr(1) <=  tlb_tag0_q(28);
size_1G_hashed_tid0_addr(0) <=  tlb_tag0_q(27);
size_256M_hashed_addr(6) <=  tlb_tag0_q(35)                     xor tlb_tag0_q(tagpos_pid+pid_width-1);
size_256M_hashed_addr(5) <=  tlb_tag0_q(34)                     xor tlb_tag0_q(tagpos_pid+pid_width-2);
size_256M_hashed_addr(4) <=  tlb_tag0_q(33)                     xor tlb_tag0_q(tagpos_pid+pid_width-3);
size_256M_hashed_addr(3) <=  tlb_tag0_q(32)                     xor tlb_tag0_q(tagpos_pid+pid_width-4);
size_256M_hashed_addr(2) <=  tlb_tag0_q(31)                     xor tlb_tag0_q(tagpos_pid+pid_width-5);
size_256M_hashed_addr(1) <=  tlb_tag0_q(30) xor tlb_tag0_q(28) xor tlb_tag0_q(tagpos_pid+pid_width-6);
size_256M_hashed_addr(0) <=  tlb_tag0_q(29) xor tlb_tag0_q(27) xor tlb_tag0_q(tagpos_pid+pid_width-7);
size_256M_hashed_tid0_addr(6) <=  tlb_tag0_q(35);
size_256M_hashed_tid0_addr(5) <=  tlb_tag0_q(34);
size_256M_hashed_tid0_addr(4) <=  tlb_tag0_q(33);
size_256M_hashed_tid0_addr(3) <=  tlb_tag0_q(32);
size_256M_hashed_tid0_addr(2) <=  tlb_tag0_q(31);
size_256M_hashed_tid0_addr(1) <=  tlb_tag0_q(30) xor tlb_tag0_q(28);
size_256M_hashed_tid0_addr(0) <=  tlb_tag0_q(29) xor tlb_tag0_q(27);
size_16M_hashed_addr(6) <=  tlb_tag0_q(39)                     xor tlb_tag0_q(tagpos_pid+pid_width-1);
size_16M_hashed_addr(5) <=  tlb_tag0_q(38)                     xor tlb_tag0_q(tagpos_pid+pid_width-2);
size_16M_hashed_addr(4) <=  tlb_tag0_q(37)                     xor tlb_tag0_q(tagpos_pid+pid_width-3);
size_16M_hashed_addr(3) <=  tlb_tag0_q(36) xor tlb_tag0_q(32) xor tlb_tag0_q(tagpos_pid+pid_width-4);
size_16M_hashed_addr(2) <=  tlb_tag0_q(35) xor tlb_tag0_q(31) xor tlb_tag0_q(tagpos_pid+pid_width-5);
size_16M_hashed_addr(1) <=  tlb_tag0_q(34) xor tlb_tag0_q(30) xor tlb_tag0_q(tagpos_pid+pid_width-6);
size_16M_hashed_addr(0) <=  tlb_tag0_q(33) xor tlb_tag0_q(29) xor tlb_tag0_q(tagpos_pid+pid_width-7);
size_16M_hashed_tid0_addr(6) <=  tlb_tag0_q(39);
size_16M_hashed_tid0_addr(5) <=  tlb_tag0_q(38);
size_16M_hashed_tid0_addr(4) <=  tlb_tag0_q(37);
size_16M_hashed_tid0_addr(3) <=  tlb_tag0_q(36) xor tlb_tag0_q(32);
size_16M_hashed_tid0_addr(2) <=  tlb_tag0_q(35) xor tlb_tag0_q(31);
size_16M_hashed_tid0_addr(1) <=  tlb_tag0_q(34) xor tlb_tag0_q(30);
size_16M_hashed_tid0_addr(0) <=  tlb_tag0_q(33) xor tlb_tag0_q(29);
size_1M_hashed_addr(6) <=  tlb_tag0_q(43) xor tlb_tag0_q(36) xor tlb_tag0_q(tagpos_pid+pid_width-1);
size_1M_hashed_addr(5) <=  tlb_tag0_q(42) xor tlb_tag0_q(35) xor tlb_tag0_q(tagpos_pid+pid_width-2);
size_1M_hashed_addr(4) <=  tlb_tag0_q(41) xor tlb_tag0_q(34) xor tlb_tag0_q(tagpos_pid+pid_width-3);
size_1M_hashed_addr(3) <=  tlb_tag0_q(40) xor tlb_tag0_q(33) xor tlb_tag0_q(tagpos_pid+pid_width-4);
size_1M_hashed_addr(2) <=  tlb_tag0_q(39) xor tlb_tag0_q(32) xor tlb_tag0_q(tagpos_pid+pid_width-5);
size_1M_hashed_addr(1) <=  tlb_tag0_q(38) xor tlb_tag0_q(31) xor tlb_tag0_q(tagpos_pid+pid_width-6);
size_1M_hashed_addr(0) <=  tlb_tag0_q(37) xor tlb_tag0_q(30) xor tlb_tag0_q(tagpos_pid+pid_width-7);
size_1M_hashed_tid0_addr(6) <=  tlb_tag0_q(43) xor tlb_tag0_q(36);
size_1M_hashed_tid0_addr(5) <=  tlb_tag0_q(42) xor tlb_tag0_q(35);
size_1M_hashed_tid0_addr(4) <=  tlb_tag0_q(41) xor tlb_tag0_q(34);
size_1M_hashed_tid0_addr(3) <=  tlb_tag0_q(40) xor tlb_tag0_q(33);
size_1M_hashed_tid0_addr(2) <=  tlb_tag0_q(39) xor tlb_tag0_q(32);
size_1M_hashed_tid0_addr(1) <=  tlb_tag0_q(38) xor tlb_tag0_q(31);
size_1M_hashed_tid0_addr(0) <=  tlb_tag0_q(37) xor tlb_tag0_q(30);
size_64K_hashed_addr(6) <=  tlb_tag0_q(47)                     xor tlb_tag0_q(37) xor tlb_tag0_q(tagpos_pid+pid_width-1);
size_64K_hashed_addr(5) <=  tlb_tag0_q(46)                     xor tlb_tag0_q(36) xor tlb_tag0_q(tagpos_pid+pid_width-2);
size_64K_hashed_addr(4) <=  tlb_tag0_q(45)                     xor tlb_tag0_q(35) xor tlb_tag0_q(tagpos_pid+pid_width-3);
size_64K_hashed_addr(3) <=  tlb_tag0_q(44)                     xor tlb_tag0_q(34) xor tlb_tag0_q(tagpos_pid+pid_width-4);
size_64K_hashed_addr(2) <=  tlb_tag0_q(43) xor tlb_tag0_q(40) xor tlb_tag0_q(33) xor tlb_tag0_q(tagpos_pid+pid_width-5);
size_64K_hashed_addr(1) <=  tlb_tag0_q(42) xor tlb_tag0_q(39) xor tlb_tag0_q(32) xor tlb_tag0_q(tagpos_pid+pid_width-6);
size_64K_hashed_addr(0) <=  tlb_tag0_q(41) xor tlb_tag0_q(38) xor tlb_tag0_q(31) xor tlb_tag0_q(tagpos_pid+pid_width-7);
size_64K_hashed_tid0_addr(6) <=  tlb_tag0_q(47)                     xor tlb_tag0_q(37);
size_64K_hashed_tid0_addr(5) <=  tlb_tag0_q(46)                     xor tlb_tag0_q(36);
size_64K_hashed_tid0_addr(4) <=  tlb_tag0_q(45)                     xor tlb_tag0_q(35);
size_64K_hashed_tid0_addr(3) <=  tlb_tag0_q(44)                     xor tlb_tag0_q(34);
size_64K_hashed_tid0_addr(2) <=  tlb_tag0_q(43) xor tlb_tag0_q(40) xor tlb_tag0_q(33);
size_64K_hashed_tid0_addr(1) <=  tlb_tag0_q(42) xor tlb_tag0_q(39) xor tlb_tag0_q(32);
size_64K_hashed_tid0_addr(0) <=  tlb_tag0_q(41) xor tlb_tag0_q(38) xor tlb_tag0_q(31);
size_4K_hashed_addr(6) <=  tlb_tag0_q(51) xor tlb_tag0_q(44) xor tlb_tag0_q(37) xor tlb_tag0_q(tagpos_pid+pid_width-1);
size_4K_hashed_addr(5) <=  tlb_tag0_q(50) xor tlb_tag0_q(43) xor tlb_tag0_q(36) xor tlb_tag0_q(tagpos_pid+pid_width-2);
size_4K_hashed_addr(4) <=  tlb_tag0_q(49) xor tlb_tag0_q(42) xor tlb_tag0_q(35) xor tlb_tag0_q(tagpos_pid+pid_width-3);
size_4K_hashed_addr(3) <=  tlb_tag0_q(48) xor tlb_tag0_q(41) xor tlb_tag0_q(34) xor tlb_tag0_q(tagpos_pid+pid_width-4);
size_4K_hashed_addr(2) <=  tlb_tag0_q(47) xor tlb_tag0_q(40) xor tlb_tag0_q(33) xor tlb_tag0_q(tagpos_pid+pid_width-5);
size_4K_hashed_addr(1) <=  tlb_tag0_q(46) xor tlb_tag0_q(39) xor tlb_tag0_q(32) xor tlb_tag0_q(tagpos_pid+pid_width-6);
size_4K_hashed_addr(0) <=  tlb_tag0_q(45) xor tlb_tag0_q(38) xor tlb_tag0_q(31) xor tlb_tag0_q(tagpos_pid+pid_width-7);
size_4K_hashed_tid0_addr(6) <=  tlb_tag0_q(51) xor tlb_tag0_q(44) xor tlb_tag0_q(37);
size_4K_hashed_tid0_addr(5) <=  tlb_tag0_q(50) xor tlb_tag0_q(43) xor tlb_tag0_q(36);
size_4K_hashed_tid0_addr(4) <=  tlb_tag0_q(49) xor tlb_tag0_q(42) xor tlb_tag0_q(35);
size_4K_hashed_tid0_addr(3) <=  tlb_tag0_q(48) xor tlb_tag0_q(41) xor tlb_tag0_q(34);
size_4K_hashed_tid0_addr(2) <=  tlb_tag0_q(47) xor tlb_tag0_q(40) xor tlb_tag0_q(33);
size_4K_hashed_tid0_addr(1) <=  tlb_tag0_q(46) xor tlb_tag0_q(39) xor tlb_tag0_q(32);
size_4K_hashed_tid0_addr(0) <=  tlb_tag0_q(45) xor tlb_tag0_q(38) xor tlb_tag0_q(31);
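Numerically, each set-index bit XOR-folds a few EPN bits of the tag with one PID bit. Below is an illustrative Python model of the 4 KB case only, assuming `tag` stands in for `tlb_tag0_q` and `pid` for its PID field, both as big-endian lists of 0/1 bits; `PID_WIDTH = 14` is an assumption, not a value taken from the source.

```python
# Illustrative model of the 7-bit TLB congruence-class hash for
# 4 KB pages (Table 6-8 of the UM). Bit numbering follows the
# VHDL's big-endian convention (bit 0 = most significant).

PID_WIDTH = 14  # assumption; check the A2I packages for the real width

def hash_4k(tag, pid):
    """size_4K_hashed_addr: set-index bit (6 - k) XOR-folds tag
    bits (51 - k), (44 - k), (37 - k) with PID bit
    (PID_WIDTH - 1 - k), as in the VHDL equations above."""
    folded = [tag[51 - k] ^ tag[44 - k] ^ tag[37 - k]
              ^ pid[PID_WIDTH - 1 - k]
              for k in range(7)]          # k = 0 -> index bit 6, ...
    return folded[::-1]                   # returned as index bits 0..6

def hash_4k_tid0(tag):
    """size_4K_hashed_tid0_addr: the same fold without the PID
    bits, used to also probe for TID = 0 (globally shared) entries."""
    return [tag[51 - k] ^ tag[44 - k] ^ tag[37 - k]
            for k in range(7)][::-1]
```

The other page sizes follow the same pattern with different tag-bit offsets, which is why the set for a given EA differs per assumed page size.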

It seems to me that the set number is calculated with a different hash function for each page size. As I understand it, the usual virtual-to-physical translation first calculates the set number, then compares the tags of the 4 ways in that set to find the matching way, and finally performs the translation.

If this is the translation process that the A2I core follows, my question is: when an effective address arrives, how does the MMU know which page size that address is using?

openpowerwtf commented 3 years ago

Isn't that the selection logic immediately following what you listed? Addresses are calculated for the sequencer based on indirect/direct (mmucr2) bits.

Chapter 6 of the UM is pretty detailed; over 100 pages of MMU 😴. The hash is here:

The TLB 7-bit congruence class hash function is shown in Table 6-8. The table describes how each individual TLB index bit (that is, congruence class address bit) is formed by XORing different sets of EPN bits (and possibly PID bits) based on the page size.

luffygood commented 3 years ago

If I understand correctly, the A2I core tries all potential page sizes listed in MMUCR2 in order to find the right entry in the TLB, right? Previously, I thought A2I used some clever way to directly identify the right page size from the virtual address.

openpowerwtf commented 3 years ago

Sounds right; you control the direct search order using mmucr2. If that fails, the indirect searches are tried.

6.16.1 Searching the TLB for Direct and Indirect Entries

A direct (IND = 0) or an indirect TLB entry (IND = 1) matches the virtual address if all fields match per Section 6.2.4 TLB Match Process. The TLB is searched for matching direct entries first according to the page size order dictated by MMUCR2. If no matching direct TLB entries are found, the TLB is then searched for 1 MB indirect entries, followed by 256 MB indirect entries. If a valid 1 MB indirect entry is found, the search process is discontinued.

If there is one and only one matching indirect entry in the associated TLB congruence class being searched, the indirect entry is used to access a page table entry (PTE). If the PTE is valid (V bit equals “1”), the PTE is installed in the TLB and used to translate the virtual address. The PTE entry format is described in Section 6.16.3 Hardware Page Table Entry Format. The abbreviated real page number (ARPN) from the PTE is treated as a logical page number (LPN), and this LPN is subsequently translated by the LRAT into an RPN before being installed into the TLB.

If there is more than one matching direct TLB entry or more than one matching indirect TLB entry for any calculated TLB congruence class, a machine check exception is generated.
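That search order can be sketched in a few lines of Python. All names here (`translate`, `lookup`, the size strings) are hypothetical stand-ins, not A2I signal names; `lookup` abstracts one set read plus the 4-way tag compare.

```python
# Hedged sketch of the 6.16.1 search order: direct entries first,
# in the page-size order programmed into MMUCR2, then 1 MB and
# 256 MB indirect entries.

def translate(ea, pid, mmucr2_sizes, lookup):
    # Direct (IND = 0) searches, one per enabled MMUCR2 page size.
    for size in mmucr2_sizes:               # e.g. ("4K", "64K", "1M")
        hit = lookup(ea, pid, size, ind=0)
        if hit is not None:
            return hit                      # direct hit: done
    # Indirect (IND = 1) searches: 1 MB first, then 256 MB.
    for size in ("1M", "256M"):
        hit = lookup(ea, pid, size, ind=1)
        if hit is not None:
            return hit                      # indirect hit: walk the page table
    return None                             # no match: TLB miss exception
```

Each iteration corresponds to one sequencer stage probing one candidate set; in hardware the per-size hashes are all computed up front and the sequencer just steps through them.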

luffygood commented 3 years ago

Thanks. Another question: how is the correct TLB set found when an effective/virtual address arrives in mmq_tlb_ctl.vhdl? It's hard to find the signal that indicates the TLB set. In mmq_tlb_ctl.vhdl, the signal tlb_addr_d has 7 bits (log2(128)). Does it express the set of the TLB?

tlb_addr_d  <=  (others => '0') when tlb_seq_addr_clr='1'
           else tlb_addr_p1 when tlb_seq_addr_incr='1'
           else tlb_seq_addr when tlb_seq_addr_update='1'  
           else tlb_addr_q;
openpowerwtf commented 3 years ago

Yes, I saw that equation too when I was looking at your original question.

In mmq_tlb_ctl, tlb_addr is an output driven by tlb_addr_q. In mmq.vhdl, tlb_addr connects from tlb_ctrl to tlb_array[0:3] (the tri_128x168_1w_0 arrays in the tri library).
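Structurally, that single 7-bit address reads the same row of all four ways at once. A minimal sketch of the 4-way × 128-set organization, assuming a list-of-lists stand-in for the four arrays:

```python
# Sketch of the 4-way x 128-set TLB organization: one 7-bit
# tlb_addr selects the same row in all four ways in parallel,
# mirroring the four tri_128x168_1w array instances.

NUM_WAYS, NUM_SETS = 4, 128

tlb = [[None] * NUM_SETS for _ in range(NUM_WAYS)]

def read_set(tlb_addr):
    """Return the 4 candidate entries (one per way) at tlb_addr;
    the tag compare then accepts at most one of them."""
    assert 0 <= tlb_addr < NUM_SETS
    return [tlb[way][tlb_addr] for way in range(NUM_WAYS)]
```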

luffygood commented 3 years ago

Thanks. Now I have trouble understanding how the correct page size is determined when an effective/virtual address arrives. In mmq_tlb_ctl.vhdl, the page size the effective/virtual address uses cannot be obtained directly. Under this condition, how can we choose the right TLB set? Below is my understanding; I want to check whether it is right. When an effective/virtual address arrives, we cannot know its page size. But we do know it must be one of the five page sizes (4K, 64K, 1M, 16M, 1G), plus 256M for an indirect search. So mmq_tlb_ctl computes these mutually independent signal pairs using the hash functions in Table 6-8 (with and without the TID):

size_1G_hashed_addr,size_1G_hashed_tid0_addr
size_256M_hashed_addr,size_256M_hashed_tid0_addr(when the tlb_seq_ind = 1)
size_16M_hashed_addr,size_16M_hashed_tid0_addr
size_1M_hashed_addr,size_1M_hashed_tid0_addr
size_64K_hashed_addr,size_64K_hashed_tid0_addr
size_4K_hashed_addr,size_4K_hashed_tid0_addr

Using the signals above together with the MMUCR2 register, we get another 12 signals that give the TLB set:

tlb_tag0_hashed_addr  <=  size_1G_hashed_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_1GB
               else size_256M_hashed_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_256MB
               else size_16M_hashed_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_16MB
               else size_1M_hashed_addr  when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_1MB
               else size_64K_hashed_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_64KB
               else size_4K_hashed_addr;
tlb_tag0_hashed_tid0_addr  <=  size_1G_hashed_tid0_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_1GB
                   else size_256M_hashed_tid0_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_256MB
                   else size_16M_hashed_tid0_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_16MB
                   else size_1M_hashed_tid0_addr  when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_1MB
                   else size_64K_hashed_tid0_addr when tlb_tag0_q(tagpos_size to tagpos_size+3)=TLB_PgSize_64KB
                   else size_4K_hashed_tid0_addr;
tlb_hashed_addr1  <=  size_1G_hashed_addr when mmucr2(28 to 31)=TLB_PgSize_1GB
               else size_16M_hashed_addr when mmucr2(28 to 31)=TLB_PgSize_16MB
               else size_1M_hashed_addr  when mmucr2(28 to 31)=TLB_PgSize_1MB
               else size_64K_hashed_addr when mmucr2(28 to 31)=TLB_PgSize_64KB
               else size_4K_hashed_addr;
tlb_hashed_tid0_addr1  <=  size_1G_hashed_tid0_addr when mmucr2(28 to 31)=TLB_PgSize_1GB
                   else size_16M_hashed_tid0_addr when mmucr2(28 to 31)=TLB_PgSize_16MB
                   else size_1M_hashed_tid0_addr  when mmucr2(28 to 31)=TLB_PgSize_1MB
                   else size_64K_hashed_tid0_addr when mmucr2(28 to 31)=TLB_PgSize_64KB
                   else size_4K_hashed_tid0_addr;
tlb_hashed_addr2  <=  size_1G_hashed_addr when mmucr2(24 to 27)=TLB_PgSize_1GB
               else size_16M_hashed_addr when mmucr2(24 to 27)=TLB_PgSize_16MB
               else size_1M_hashed_addr  when mmucr2(24 to 27)=TLB_PgSize_1MB
               else size_64K_hashed_addr when mmucr2(24 to 27)=TLB_PgSize_64KB
               else size_4K_hashed_addr;
tlb_hashed_tid0_addr2  <=  size_1G_hashed_tid0_addr when mmucr2(24 to 27)=TLB_PgSize_1GB
                   else size_16M_hashed_tid0_addr when mmucr2(24 to 27)=TLB_PgSize_16MB
                   else size_1M_hashed_tid0_addr  when mmucr2(24 to 27)=TLB_PgSize_1MB
                   else size_64K_hashed_tid0_addr when mmucr2(24 to 27)=TLB_PgSize_64KB
                   else size_4K_hashed_tid0_addr;
tlb_hashed_addr3  <=  size_1G_hashed_addr when mmucr2(20 to 23)=TLB_PgSize_1GB
               else size_16M_hashed_addr when mmucr2(20 to 23)=TLB_PgSize_16MB
               else size_1M_hashed_addr  when mmucr2(20 to 23)=TLB_PgSize_1MB
               else size_64K_hashed_addr when mmucr2(20 to 23)=TLB_PgSize_64KB
               else size_4K_hashed_addr;
tlb_hashed_tid0_addr3  <=  size_1G_hashed_tid0_addr when mmucr2(20 to 23)=TLB_PgSize_1GB
                   else size_16M_hashed_tid0_addr when mmucr2(20 to 23)=TLB_PgSize_16MB
                   else size_1M_hashed_tid0_addr  when mmucr2(20 to 23)=TLB_PgSize_1MB
                   else size_64K_hashed_tid0_addr when mmucr2(20 to 23)=TLB_PgSize_64KB
                   else size_4K_hashed_tid0_addr;
tlb_hashed_addr4  <=  size_1G_hashed_addr when mmucr2(16 to 19)=TLB_PgSize_1GB
               else size_16M_hashed_addr when mmucr2(16 to 19)=TLB_PgSize_16MB
               else size_1M_hashed_addr  when mmucr2(16 to 19)=TLB_PgSize_1MB
               else size_64K_hashed_addr when mmucr2(16 to 19)=TLB_PgSize_64KB
               else size_4K_hashed_addr;
tlb_hashed_tid0_addr4  <=  size_1G_hashed_tid0_addr when mmucr2(16 to 19)=TLB_PgSize_1GB
                   else size_16M_hashed_tid0_addr when mmucr2(16 to 19)=TLB_PgSize_16MB
                   else size_1M_hashed_tid0_addr  when mmucr2(16 to 19)=TLB_PgSize_1MB
                   else size_64K_hashed_tid0_addr when mmucr2(16 to 19)=TLB_PgSize_64KB
                   else size_4K_hashed_tid0_addr;
tlb_hashed_addr5  <=  size_1G_hashed_addr when mmucr2(12 to 15)=TLB_PgSize_1GB
               else size_16M_hashed_addr when mmucr2(12 to 15)=TLB_PgSize_16MB
               else size_1M_hashed_addr  when mmucr2(12 to 15)=TLB_PgSize_1MB
               else size_64K_hashed_addr when mmucr2(12 to 15)=TLB_PgSize_64KB
               else size_4K_hashed_addr;
tlb_hashed_tid0_addr5  <=  size_1G_hashed_tid0_addr when mmucr2(12 to 15)=TLB_PgSize_1GB
                   else size_16M_hashed_tid0_addr when mmucr2(12 to 15)=TLB_PgSize_16MB
                   else size_1M_hashed_tid0_addr  when mmucr2(12 to 15)=TLB_PgSize_1MB
                   else size_64K_hashed_tid0_addr when mmucr2(12 to 15)=TLB_PgSize_64KB
                   else size_4K_hashed_tid0_addr;

Then, by stepping through the state machine, we get the correct tlb_seq_addr for the effective/virtual address under the different conditions.

        WHEN TlbSeq_Stg1 =>
          tlb_seq_tag0_addr_cap <= '1';  
          tlb_seq_addr_update <= '1';  
          tlb_seq_addr <= tlb_hashed_addr1;  
          tlb_seq_pgsize <= mmucr2(28 to 31); 
          tlb_seq_is <= "00";  
          tlb_seq_esel <= "001";  
          if pgsize2_valid='1' then
           tlb_seq_next <=  TlbSeq_Stg2;
          else 
           tlb_seq_next <=  TlbSeq_Stg6;
          end if;
          ......

That means that, although I don't know which page size the effective/virtual address uses, I can compute all the possible TLB sets, one per page size. After that, the state machine finds one and only one correct tlb_seq_addr giving the TLB set. Is my understanding right?

openpowerwtf commented 3 years ago

Yes, that sounds right. By definition, there must be a single result (exactly one hit, or a miss), or a programmer messed up yet again 🙄 :

If there is more than one matching direct TLB entry or more than one matching indirect TLB entry for any calculated TLB congruence class, a machine check exception is generated.

FYI - this is the 2.07 ISA; A2I implemented 2.06, but I think the relevant sections are the same: Power 2.07B Book III-e 6.7 is the translation definition.

Also, for speed/size/flexibility,

  1. As the hardware designer, you can change to allow fewer/more page sizes.
  2. As the OS designer, you can choose from any available scheme provided by hardware.

And don't forget @luffygood, this translation mechanism will need to be replaced by the radix scheme for compliance. So you have something new to learn AND implement. 👍

luffygood commented 3 years ago

Thanks, sincerely. Now I clearly understand the main steps to find the TLB set. But a new question arises. We can compute mutually independent signals using the hash functions in Table 6-8 (accounting for the TID):

size_1G_hashed_addr,size_1G_hashed_tid0_addr
size_256M_hashed_addr,size_256M_hashed_tid0_addr(when the tlb_seq_ind = 1)
size_16M_hashed_addr,size_16M_hashed_tid0_addr
size_1M_hashed_addr,size_1M_hashed_tid0_addr
size_64K_hashed_addr,size_64K_hashed_tid0_addr
size_4K_hashed_addr,size_4K_hashed_tid0_addr

tlb_tag0_hashed_addr  tlb_tag0_hashed_tid0_addr
tlb_hashed_addr1  tlb_hashed_tid0_addr1
tlb_hashed_addr2  tlb_hashed_tid0_addr2
tlb_hashed_addr3  tlb_hashed_tid0_addr3
tlb_hashed_addr4  tlb_hashed_tid0_addr4 
tlb_hashed_addr5  tlb_hashed_tid0_addr5

But how do I know whether the TID is equal to 0 or not? The hash function differs depending on the TID. These are my two conjectures:

  1. The TID comes in as an input signal. Before running the state machine, we could already discard some of the signals above. But I cannot find an input signal that directly expresses the TID, so the question is how to get the TID before the state machine runs.
  2. In mmq_tlb_ctl, we cannot get the TID from the inputs. So we compute all possible signals for both TID cases, and then the state machine finds the correct TLB set using all of them.

Which conjecture is correct? If the first is true, how can I get the TID?

openpowerwtf commented 3 years ago

I think (2); the lookup (sequencer + compare + TLB contents) resolves the match unambiguously.

Thread id is supplied with the request, and is part of the match. But 'tid0' in the hash addr signal names refers to TLB[tid]=0, not thread id=0. The match logic checks the selected set and accounts for TLB[tid] as in Figure 6-1, and supplies the compare result to the sequencer. And there can be at most one match.

Two requirements in 6.2.4 TLB Match Process:

The thread ID (ThdID) field of the TLB entry has the bit corresponding to the issuing hardware thread set to 1

Either the value of the process identifier is equal to the value of the TID field of the TLB entry (private page) or the value of the TID field is 0 (globally shared page)
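Those two requirements can be written as a small predicate. `entry` here is a dict standing in for one TLB way, with field names chosen for readability (not actual A2I record names):

```python
# Hypothetical predicate for the two 6.2.4 match requirements
# quoted above: the ThdID bit for the issuing hardware thread,
# and the private-vs-global TID check.

def tlb_entry_matches(entry, hw_thread, pid):
    # 1. ThdID: the bit for the issuing hardware thread must be 1.
    thread_ok = bool((entry["thdid"] >> hw_thread) & 1)
    # 2. TID == PID (private page) or TID == 0 (globally shared);
    #    the TID == 0 case is why the *_tid0_* hashes also exist.
    tid_ok = entry["tid"] == pid or entry["tid"] == 0
    return thread_ok and tid_ok
```

A TID = 0 entry matches any PID, which is why the sequencer probes the set given by the tid0 hash as well as the PID-folded one.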

luffygood commented 3 years ago

Thanks. As we know, when an effective/virtual address arrives, the state machine finds the correct TLB set. However, there may be a TLB miss or other errors, in which case no TLB set is found. In that situation, which state of the state machine handles it? In simple terms, what does mmq_tlb_ctl do when it can't get the TLB set from the effective/virtual address?

openpowerwtf commented 3 years ago

The ending states are the ones that return to Idle and set the xx_done signal: 23, 29, 30, 31, (32)

A TLB miss exception occurs if there is no matching direct or indirect entry in the TLB for the page specified by the virtual address (except for the tlbsx[.] instruction, which simply returns certain default or undefined values to the MMU Assist Registers (MAS), and for tlbsx., which sets CR[CR0]2 to 0). See TLB Search Instruction (tlbsx[.]) on page 215.