hoglet67 / PiTubeDirect

Bare-metal Raspberry Pi project that attaches to the Acorn TUBE interface and emulates many BBC Micro Co Processors
GNU General Public License v3.0
187 stars 23 forks source link

Co Pro Life issue with the Pi 4 #150

Closed hoglet67 closed 2 years ago

hoglet67 commented 2 years ago

The "large memory" version of Co Pro Life (*MLIFE) that uses the bank select register doesn't work on the Pi 4.

The failure is that pattern (even a small one) just stays fixed.

Here's a quick test matrix:

Pi Zero 2/3A+:

Pi 4:

The failure on all cases on Co Pro 16 is expected (?) because that Co Pro uses tube_parasite_write() rather than tube_parasite_write_banksel(), so the bank select registers are not present. It's possible this is deliberate; I don't recall.

The same failure happens on the Pi 4 on all of the 6502 Co Pro types; this suggests mapping of the banksel registers is somehow broken on the Pi 4.

This might relate in some way to #149 - but it affects all 6502 Co Pro types.

dp111 commented 2 years ago

I wonder if after map_4k_page we need a data cache flush.

On Wed, 5 Jan 2022 at 17:43, David Banks @.***> wrote:

The "large memory" version of Co Pro Life (*MLIFE) that uses the bank select register doesn't work on the Pi 4.

The failure is that pattern (even a small one) just stays fixed.

Here's a quick test matrix:

Pi Zero 2:

  • Co Pro 0 (Fast 6502) - Works
  • Co Pro 16 (Lib 6502) - Doesn't work (pattern stays fixed)
  • Co Pro 24 (JIT 6502) - Works

Pi 4:

  • Co Pro 0 (Fast 6502) - Doesn't work (pattern stays fixed)
  • Co Pro 16 (Lib 6502) - Doesn't work (pattern stays fixed)
  • Co Pro 24 (JIT 6502) - Doesn't work (pattern stays fixed)

The failure on all cases on Co Pro 16 is expected (?) because that Co Pro uses tube_parasite_write() rather than tube_parasite_write_banksel(), so the bank select registers are not present. It's possible this is deliberate; I don't recall.

The same failure happens on the Pi 4 on all of the 6502 Co Pro types; this suggests mapping of the banksel registers is somehow broken on the Pi 4.

— Reply to this email directly, view it on GitHub https://github.com/hoglet67/PiTubeDirect/issues/150, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEVVFIVODQNNOYU4DXP5TK3UUR7NHANCNFSM5LKO42HQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

hoglet67 commented 2 years ago

Hmmm, this is proving tricky to track down....

I've tried everything I can think of:

void map_4k_page(unsigned int logical, unsigned int physical) {
  // Clean/Invalidate the existing 4K page
  _clean_cache_area((void *)(logical * 4096), 4096);
  // Invalidate it again to be really sure (really should not be needed)
  _invalidate_cache_area((void *)(logical * 4096), 4096);

  // Invalidate the data TLB before changing mapping
  //_invalidate_dtlb_mva((void *)(logical << 12));

  // Invalidate the whole data TLB !!! 
  _invalidate_dtlb();

  // Setup the 4K page table entry
  // Second level descriptors use extended small page format so inner/outer caching can be controlled
  // Pi 0/1:
  //   XP (bit 23) in SCTRL is 0 so descriptors use ARMv4/5 backwards compatible format
  // Pi 2/3:
  //   XP (bit 23) in SCTRL no longer exists, and we see to be using ARMv6 table formats
  //   this means bit 0 of the page table is actually XN and must be clear to allow native ARM code to execute
  //   (this was the cause of issue #27)
#if defined(RPI2) || defined (RPI3) || defined(RPI4)
  PageTable2[logical] = (physical<<12) | 0x132u | (bb << 6) | (aa << 2);
#else
  PageTable2[logical] = (physical<<12) | 0x133u | (bb << 6) | (aa << 2);
#endif

  // Clean/invalidate the new page table entry

  _clean_cache_area((void *)(PageTable2 + logical), 4);

  // Invalidate the 4K page yet again
  _invalidate_cache_area((void *)(logical * 4096), 4096);
}

I've verified the map_4k_page calls are being made; they just don't seem to be having much effect on the memory that is seen by a Basic program.

dp111 commented 2 years ago

I don't know if page 13 is relevant here https://developer.arm.com/documentation/epm012079/latest ?

On Thu, 6 Jan 2022 at 13:01, David Banks @.***> wrote:

Hmmm, this is proving tricky to track down....

I've tried everything I can think of:

void map_4k_page(unsigned int logical, unsigned int physical) { // Clean/Invalidate the existing 4K page _clean_cache_area((void )(logical 4096), 4096); // Invalidate it again to be really sure (really should not be needed) _invalidate_cache_area((void )(logical 4096), 4096);

// Invalidate the data TLB before changing mapping //_invalidate_dtlb_mva((void *)(logical << 12));

// Invalidate the whole data TLB !!! _invalidate_dtlb();

// Setup the 4K page table entry // Second level descriptors use extended small page format so inner/outer caching can be controlled // Pi 0/1: // XP (bit 23) in SCTRL is 0 so descriptors use ARMv4/5 backwards compatible format // Pi 2/3: // XP (bit 23) in SCTRL no longer exists, and we see to be using ARMv6 table formats // this means bit 0 of the page table is actually XN and must be clear to allow native ARM code to execute // (this was the cause of issue #27)

if defined(RPI2) || defined (RPI3) || defined(RPI4)

PageTable2[logical] = (physical<<12) | 0x132u | (bb << 6) | (aa << 2);

else

PageTable2[logical] = (physical<<12) | 0x133u | (bb << 6) | (aa << 2);

endif

// Clean/invalidate the new page table entry

_clean_cache_area((void *)(PageTable2 + logical), 4);

// Invalidate the 4K page yet again _invalidate_cache_area((void )(logical 4096), 4096); }

I've verified the map_4k_page calls are being made; they just don't seem to be having much effect on the memory that is seen by a Basic program.

— Reply to this email directly, view it on GitHub https://github.com/hoglet67/PiTubeDirect/issues/150#issuecomment-1006571702, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEVVFISM6NPOAIX4Q3TXEGLUUWHDLANCNFSM5LKO42HQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

hoglet67 commented 2 years ago

I don't know if page 13 is relevant here

Possibly, but I tried adding dmb AND isb after we invalidate the TLB (in map_4k_page) and it make no difference.

I've written a small test program in BASIC: capture40

Here's the expected result of two runs after power up. capture34 capture35

The 55555555 values are just what happens to be in memory at 256K on power up/

The 11223344 value is written into the default bank so it's easily identifieable.

This is indeed what you see on a Pi Zero 2 in both Co Pro 0 and Co Pro 24.

On the Pi 4, on Co Pro 0, the results look like: capture36 capture37

So no evidence of banks switching occurring.

On the Pi 4, on Co Pro 24, the results look like: capture38 capture39

There is some evidence of bank switching, but it's very delayed/eratic.

This is very likely to be data cache related, and is either: 1) The first bank that is read is cached, and remains in the data cache when the page table is changed 2) The page table is in the data cache, and when a page fault happens stale data from memory is read

Domimic. a couple of questions for you:

Concerning (1), it does seem like we need to clean/invalidate the 8K bank in memory, then wait fot this to be done (with a barrier) before we update the page table. We don't currently do that.

Concerning (2), is this possible? Surely the page table walk is happening "through" the data cache? It did occur to me that this might be configurable, and be different on the Pi 4. This does indeed seem to be configurable: https://github.com/hoglet67/PiTubeDirect/blob/hognose-dev/src/cache.c#L321 We need to find the documentation for TTBR0 in the Cortex A72 Technical Manual: https://documentation-service.arm.com/static/5e7b6c287158f500bd5bfb61?token=

hoglet67 commented 2 years ago

According to the Cortex A72 technical manual (6-285 in the above link), the data cache is PIPT (physically indexed, physically tagged), so I think that means that we don't have to worry about invalidating the 8K page in the data cache.

The Cortex A53 data cache is also PIPT

The ARM1176 data cache is VIPT, so possible (1) is an issue there?

hoglet67 commented 2 years ago

DATA SYNCHRONIZATION BARRIER !!!

I suspect this waits for the TLB invalidate to complete before continuining.

Not quite why this is so critical...

hoglet67 commented 2 years ago

So one further comment on this....

It was the errata that Dominic linked to that highlighted the problem.

Specifically, it seems it is a ARM architectural requirement to perform a DSB after a TLB invalidate, otherwise the TLB invalidate might never be seen.

The Arm architecture states that a separate observer might observe a write to the translation tables at any time after the execution of the instruction that performed that write. However, it also states that the write is only guaranteed to be observable after the execution of a DSB instruction by the PE that executed the instruction that performed the write to the translation tables.

(The errata further suggests an ISB might be necessary as well; that doesn't seem to be the case in our usage of this, at least not currently)

But on the Pi 4 a DSB is needed, a DMB and/or ISB is not sufficient.

I wonder if this affects anything else we are doing?

Also, I suspect we should be invalidating the TLB after updating the page table, not before????

hoglet67 commented 2 years ago

The minimum change to make the TLB change visible seems to be:

DSB ISH

where ISH stands for "Inner Shared Domain"

dp111 commented 2 years ago

I think this is a more generic version which matches the ARMv8 documentation and should support executing code in the future. commit id : 5fd3e1105a5949a3c425b12471aa77f33196c5a5 and 64ad96c7bc97c2cc66e956b1d4e28832674361b5