jart / blink

tiniest x86-64-linux emulator
ISC License
6.96k stars 222 forks source link

Allow alternate BIOS via `-B` option; make BIOS ROM area read-only #128

Closed tkchia closed 1 year ago

ghaerr commented 1 year ago

@tkchia: LGTM. Is the reason that memalign and free are replaced with mmap and munmap so that mprotect can be used on page-aligned real mode memory for the BIOS ROM region? (I am guessing your test programs showed that earlier, mprotect was failing but we didn't know it)?

tkchia commented 1 year ago

Hello @ghaerr,

Is the reason that memalign and free are replaced with mmap and munmap so that mprotect can be used on page-aligned real mode memory for the BIOS ROM region?

That is my intention. Admittedly though, the code probably needs more testing on platforms where the page size may be larger or even smaller than 4,096.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

When running on macOS after this commit, blink gets a segmentation fault when booting up and running the FreeDOS 1.3 installer (the one that you uploaded to Discord). I'm not sure if this has to do with the macOS pagesize or not.

Here's the attached screenshot showing when it happens:

blink segfault

Thank you!

ghaerr commented 1 year ago

@tkchia:

If I comment out the Mprotect call in blink/rombios.c, the problem does not occur:

diff --git a/blink/biosrom.c b/blink/biosrom.c
index 2c768df..a4912d4 100644
--- a/blink/biosrom.c
+++ b/blink/biosrom.c
@@ -172,8 +172,8 @@ void LoadBios(struct Machine *m, const char *biosprog) {
   protstart = ROUNDUP(kBiosOptBase, FLAG_pagesize);
   protend = ROUNDDOWN(kBiosEnd, FLAG_pagesize);
   if (protstart < protend) {
-    Mprotect(m->system->real + protstart, protend - protstart, PROT_READ,
-             "bios");
+    //Mprotect(m->system->real + protstart, protend - protstart, PROT_READ,
+             //"bios");
tkchia commented 1 year ago

Hello @ghaerr,

Interesting. It seems that something inside the FreeDOS installer was trying to scribble bytes into the BIOS ROM area (but why)? I should look into this further to figure out the correct way to fix this.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

It seems that something inside the FreeDOS installer was trying to scribble bytes into the BIOS ROM area (but why)?

I see. Do you know what the 00000050 is referencing in the modal dialog?

As far as scribbling bytes I guess I'm not surprised at what I've already seen, but perhaps it's trying its own way of determining the top of usable memory? If so, we'll probably be forced to your previous recommendation of ignoring writes, rather than prohibiting them (somehow).

I'm not sure how easy it might be to add a "Watch range" to blink rather than a single Watch address, but that could help track this down.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

The instructions in the disassembly seem to come from an earlier version of the vinfo.com tool used by the FreeDOS 1.3 installer. The relevant logic is now disabled in the Git main branch:

NotVBOX:
    mov         si, VMString
    mov         bx, 0xf000
    mov         es, bx
    mov         cx, 0xf000
    call        SearchString
    jnc         NotVMware
    mov         ax, 104
    jmp         %%DetectDone

NotVMware:

;OtherEmulation:
;    cld
;    mov         al, 0x90
;    mov         cx, 0x0008
;    mov         di, $
;    rep         stosb
;    jmp         VirtualMachine
;    nop
;    nop

DetectCPU:
    ; Test Pre-186, CL is AND with 0x0f prior to/during shr
    mov         cl, 0x20
    mov         ax, 0x0001
    shr         ax, cl
    cmp         ax, 0x0000
    jne         .186OrBetter

So it looks like it should be OK to ignore and discard the writes to the ROM area in this case. The problem is figuring out how. :neutral_face: (Perhaps one simplistic way for now is to check for the special case of a SIGSEGV or SIGBUS or similar that is triggered by a rep stosb or stosb in the guest.)

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

I think the original idea was to actually scribble the nop bytes (0x90) into the vinfo code segment, and not the ROM segment. The intent would be to distinguish between an actual physical x86 system, and an as-yet-unknown type of emulator. And somehow the whole code got garbled and cargo-culted.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

I think the original idea was to actually scribble the nop bytes (0x90) into the vinfo code segment, and not the ROM segment.

I think you're right, the mov di, $ would seem to indicate that, had the programmer not forgotten to load ES with CS. (Actually, I have been in the V8 Power Tools code before - when chasing down a previous problem of VFRAME.COM not working properly on blink. Frankly, I have found that entire codebase unreasonably overcomplicated and hard to follow).

However, even if the code had been written without error and the jmp VirtualMachine path not taken, that would imply that an emulator would have the ability to write-protect-but-without-error a likely non-page-aligned byte range corresponding to the vinfo.com file's load address. blink16 does implement a mechanism that allows for any bytes of the 1M real mode address space to be checked for read,write,both or neither allowed, but produces an error and stops the emulator. This is used and useful when emulating .text, .data and stack segments in .com/.exe/a.out file execution only, and it was found that it was problematic to turn on memory checking when emulating an OS. It would be simple to add a write-ignore flag.

So it looks like it should be OK to ignore and discard the writes to the ROM area in this case.

I would agree that having the capability to check read/write access of VM memory would be a feature that could be used to better debug programs being emulated. However, blink doesn't currently allow arbitrary byte ranges to be checked, which would be most useful. In the current case, the mechanism relies on a page fault and signal handler callback for every byte failed, lots of work and slow emulation for something that we might want to consider another implementation, and it's not very scalable. For OS emulation only, the feature is probably best left turned off or just removed.

Perhaps one simplistic way for now is to check for the special case of a SIGSEGV

The current feature is of quite limited capability, given that its only use is on page-aligned memory with callback handling byte ranges or flags not supported by mmap. Thus, my recommendation would be to keep this "feature" turned off for now, and not write special signal handling code to allow continued execution. If such a feature is truly important, perhaps implementing something like was done in blink16, where a set of 8 bits of ram shadow flags can be defined for each byte of RAM, allowing read, write, ignore-write, etc to be accommodated for precise debugging capabilities or emulation.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

I have a patch in the works that modifies Blink's AccessRam(...) routine so that it can redirect writes to the ROM area elsewhere. Part of AccessRam(...)'s role is to handle access of memory operands which may straddle page (more precisely, 4,096-byte chunk) boundaries in a virtual memory setting. I hope to add a test to ensure that the patch is working.

Then again, it is rare for programs to actually want to write to ROM — such programs are probably either buggy (e.g. vinfo.com), or are up to some obfuscated shenanigans. Perhaps I should try to push my patch as an optional feature that can be disabled by ./configure.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

I have a patch in the works that modifies Blink's AccessRam(...) routine

That sounds promising. I hadn't realized that all VM ram access went through a single routine. blink16 is able to do the same thing through physicalAddress, where the shadow ram flags are inspected.

so that it can redirect writes to the ROM area elsewhere

I see. For blink then, AccessRam can't actually ignore the write in < 1M, but would return some address where the write would be ignored?

I think your proposed AccessRam patch is a better solution than mprotect and signal handler code. Hopefully such a feature doesn't slow down overall blink execution speed by introducing a bottleneck which is seldom useful.

Then again, it is rare for programs to actually want to write to ROM — such programs are probably either buggy (e.g. vinfo.com), or are up to some obfuscated shenanigans.

Definitely agree there. Adding the ability to restrict ROM write access is of quite limited usefulness, unless one just wants to become informed of where the buggy programs are out there (but still likely requires them to run as written).

Perhaps I should try to push my patch as an optional feature that can be disabled by ./configure.

Perhaps a command line option to enable it might be more useful. However, who would really use this feature when enabled? Anyone emulating FreeDOS with blink will just disable it, so is there really a point in implementing such a small feature now? I'm fine with with whichever way you decide to go, but think the feature should default off. At some later point, should blink ever get upstreamed with the blink16 capabilities of running DOS .com/.exe and/or ELKS a.out files, this kind of feature would be much more useful, as it could point out stack or heap overflow and much more common bad scenarios.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

OK... here is the patch; I think I will just leave it here for now, and perhaps submit it if and when it turns out to be more useful:

diff --git a/blink/biosrom.c b/blink/biosrom.c
index 2c768df..70a9178 100644
--- a/blink/biosrom.c
+++ b/blink/biosrom.c
@@ -32,6 +32,7 @@
 #include <string.h>

 #include "blink/bda.h"
+#include "blink/builtin.h"
 #include "blink/endian.h"
 #include "blink/macros.h"
 #include "blink/map.h"
@@ -167,14 +168,17 @@ void LoadBios(struct Machine *m, const char *biosprog) {
     size = sizeof(kDefBios);
     memcpy(m->system->real + kBiosEnd - size, kDefBios, size);
   }
+#ifndef DISABLE_ROM
   // try to protect the BIOS ROM area
-  // TODO: ideally writes will seem to succeed but have no effect
+  // BeginStore() & EndStore() will avoid scribbling in this area of
+  // memory: so writes to the guest ROM area are effectively ignored
   protstart = ROUNDUP(kBiosOptBase, FLAG_pagesize);
   protend = ROUNDDOWN(kBiosEnd, FLAG_pagesize);
   if (protstart < protend) {
     Mprotect(m->system->real + protstart, protend - protstart, PROT_READ,
              "bios");
   }
+#endif
   // load %cs:%rip
   m->cs.sel = kBiosSeg;
   m->cs.base = kBiosBase;
diff --git a/blink/machine.h b/blink/machine.h
index 1883202..d0d8f09 100644
--- a/blink/machine.h
+++ b/blink/machine.h
@@ -500,6 +500,9 @@ int RegisterMemory(struct Machine *, i64, void *, size_t);
 i64 ConvertHostToGuestAddress(struct System *, void *, u64 *);
 u8 *GetPageAddress(struct System *, u64, bool);
 u8 *GetHostAddress(struct Machine *, u64, long);
+#ifndef DISABLE_ROM
+bool IsRomAddress(struct Machine *, u8 *);
+#endif
 u8 *AccessRam(struct Machine *, i64, size_t, void *[2], u8 *, bool);
 u8 *BeginLoadStore(struct Machine *, i64, size_t, void *[2], u8 *);
 u8 *BeginStore(struct Machine *, i64, size_t, void *[2], u8 *);
@@ -807,6 +810,12 @@ MICRO_OP_SAFE u8 Cpl(struct Machine *m) {
   return m->mode.genmode != XED_GEN_MODE_REAL ? (m->cs.sel & 3u) : 0u;
 }

+#ifdef DISABLE_ROM
+MICRO_OP_SAFE bool IsRomAddress(struct Machine *m, u8 *r) {
+  return false;
+}
+#endif
+
 #define BEGIN_NO_PAGE_FAULTS \
   {                          \
     bool nofault_;           \
diff --git a/blink/memory.c b/blink/memory.c
index 7929a8c..75f14fc 100644
--- a/blink/memory.c
+++ b/blink/memory.c
@@ -23,6 +23,7 @@
 #include <sys/mman.h>

 #include "blink/assert.h"
+#include "blink/biosrom.h"
 #include "blink/bus.h"
 #include "blink/checked.h"
 #include "blink/debug.h"
@@ -445,13 +446,31 @@ u8 *ReserveAddress(struct Machine *m, i64 v, size_t n, bool writable) {
   }
 }

-u8 *AccessRam(struct Machine *m, i64 v, size_t n, void *p[2], u8 *tmp,
-              bool copy) {
+#ifndef DISABLE_ROM
+bool IsRomAddress(struct Machine *m, u8 *r) {
+  if (m->metal) {
+    u8 *real = m->system->real;
+#if CAN_ABUSE_POINTERS
+    ptrdiff_t d = r - real;
+    if (kBiosOptBase <= d && d < kBiosEnd) return true;
+#else
+    if (real[kBiosOptBase] <= r && r < &real[kBiosEnd]) return true;
+#endif
+  }
+  return false;
+}
+#endif
+
+static u8 *AccessRam2(struct Machine *m, i64 v, size_t n, void *p[2], u8 *tmp,
+                      bool copy, bool protect_rom) {
   u8 *a, *b;
   unsigned k;
   unassert(n <= 4096);
   if ((v & 4095) + n <= 4096) {
-    return ResolveAddress(m, v);
+    a = ResolveAddress(m, v);
+    if (!protect_rom || !IsRomAddress(m, a)) return a;
+    if (copy) memcpy(tmp, a, n);
+    return tmp;
   }
   STATISTIC(++page_overlaps);
   k = 4096;
@@ -463,11 +482,20 @@ u8 *AccessRam(struct Machine *m, i64 v, size_t n, void *p[2], u8 *tmp,
     memcpy(tmp, a, k);
     memcpy(tmp + k, b, n - k);
   }
+  if (protect_rom) {
+    if (IsRomAddress(m, a)) a = NULL;
+    if (IsRomAddress(m, b)) b = NULL;
+  }
   p[0] = a;
   p[1] = b;
   return tmp;
 }

+u8 *AccessRam(struct Machine *m, i64 v, size_t n, void *p[2], u8 *tmp,
+              bool d) {
+  return AccessRam2(m, v, n, p, tmp, d, !d);
+}
+
 u8 *Load(struct Machine *m, i64 v, size_t n, u8 *b) {
   void *p[2];
   SetReadAddr(m, v, n);
@@ -484,10 +512,12 @@ u8 *BeginStoreNp(struct Machine *m, i64 v, size_t n, void *p[2], u8 *b) {
   return BeginStore(m, v, n, p, b);
 }

+#if 0
 u8 *BeginLoadStore(struct Machine *m, i64 v, size_t n, void *p[2], u8 *b) {
   SetWriteAddr(m, v, n);
-  return AccessRam(m, v, n, p, b, true);
+  return AccessRam2(m, v, n, p, b, true, true);
 }
+#endif

 void EndStore(struct Machine *m, i64 v, size_t n, void *p[2], u8 *b) {
   unsigned k;
@@ -496,10 +526,15 @@ void EndStore(struct Machine *m, i64 v, size_t n, void *p[2], u8 *b) {
   k = 4096;
   k -= v & 4095;
   unassert(n > k);
+#ifdef DISABLE_ROM
   unassert(p[0]);
   unassert(p[1]);
   memcpy(p[0], b, k);
   memcpy(p[1], b + k, n - k);
+#else
+  if (p[0]) memcpy(p[0], b, k);
+  if (p[1]) memcpy(p[1], b + k, n - k);
+#endif
 }

 void EndStoreNp(struct Machine *m, i64 v, size_t n, void *p[2], u8 *b) {
diff --git a/blink/memorymalloc.c b/blink/memorymalloc.c
index 6c6938a..f1f3ff3 100644
--- a/blink/memorymalloc.c
+++ b/blink/memorymalloc.c
@@ -1392,6 +1392,9 @@ i64 ConvertHostToGuestAddress(struct System *s, void *ha, u64 *out_pte) {
   i64 g48;
   uintptr_t base;
   if (out_pte) *out_pte = 0;
+  if (s->mode.genmode == XED_GEN_MODE_REAL || !(s->cr0 & CR0_PG)) {
+    return (uintptr_t)ha;
+  }
   if ((uintptr_t)ha < kNullSize) return (uintptr_t)ha;
   if (HasLinearMapping() && !out_pte) return ToGuest(ha);
   base = (uintptr_t)ha & -4096;
diff --git a/blink/string.c b/blink/string.c
index 7f8a069..a20d79c 100644
--- a/blink/string.c
+++ b/blink/string.c
@@ -274,7 +274,7 @@ static void RepStosbEnhanced(P) {
       diremain = 4096 - (diactual & 4095);
       diremain = MIN(diremain, 65536 - dilow);
       n = MIN(cx, diremain);
-      memset(direal, m->al, n);
+      if (!IsRomAddress(m, direal)) memset(direal, m->al, n);
       AddDi(A, n);
       cx = SubtractCx(A, n);
       if (cx) diactual = AddressDi(A);
diff --git a/config.h.in b/config.h.in
index c7a0ea6..5abf114 100644
--- a/config.h.in
+++ b/config.h.in
@@ -15,6 +15,7 @@
 // #define DISABLE_METAL
 // #define DISABLE_MMX
 // #define DISABLE_BCD
+// #define DISABLE_ROM
 // #define DISABLE_BMI2

 // #define HAVE_FORK
diff --git a/configure b/configure
index 8d91b77..0f3d365 100755
--- a/configure
+++ b/configure
@@ -104,6 +104,9 @@ if [ "$1" = "--help" ]; then
   echo "  --disable-bcd"
   echo "    disables i8086 binary coded decimal support (shaves ~1kb off MODE=tiny)"
   echo
+  echo "  --disable-rom"
+  echo "    disables write protection of metal bios area (shaves ~200 bytes off MODE=tiny)"
+  echo
   echo "  --disable-all"
   echo "    disable all optional features (shaves ~81kb off MODE=tiny)"
   echo "    you may use --enable-FOO to turn features back on"
@@ -412,6 +415,11 @@ for x; do
   elif [ x"$x" = x"--disable-bcd" ]; then
     uncomment "#define DISABLE_BCD"

+  elif [ x"$x" = x"--enable-rom" ]; then
+    comment "#define DISABLE_ROM"
+  elif [ x"$x" = x"--disable-rom" ]; then
+    uncomment "#define DISABLE_ROM"
+
   elif [ x"$x" = x"--posix" ]; then
     CPPFLAGS="-D_POSIX_C_SOURCE=200809L -D_XOPEN_SOURCE=700 -std=c11"
     POSIX=1

Anyone emulating FreeDOS with blink will just disable it, so is there really a point in implementing such a small feature now?

It turns out I was slightly mistaken. The buggy vinfo.com was only present in FreeDOS 1.3-rc1. The final version of FreeDOS 1.3 seems to have this vinfo.com issue already fixed.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

OK... here is the patch; I think I will just leave it here for now

Wow, that seems quite complicated for such a small feature. If such a feature were to be implemented, given it's complexity, it would be nice to add to it to allow for arbitrary byte ranges to be compared against read/write privileges, for instance to detect UMB and or XMS access for debugging, which could be more useful. I guess ultimately this comes down to what is the purpose of the emulator, to help with debugging, or to exactly emulate hardware? blink is quite a ways off from actually emulating hardware, as we don't even plan on emulating all the PC hardware devices.

The buggy vinfo.com was only present in FreeDOS 1.3-rc1. The final version of FreeDOS 1.3 seems to have this vinfo.com issue already fixed.

Well that fine, except that the FreeDOS 1.3-rc1 image you uploaded for me to debug GWBASIC for you still has it, and it appears there are many versions of FreeDOS out there, given that I've got three already and they all have different installs and CPU checks. Thus, I think we should comment out the Mprotect call in the current code so that it and other OSes can run rather than crash: this is more user-friendly for an emulator, as the modal dialog stops all execution and can't be overridden.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

Actually, I think it will in fact be better and more "user-friendly" for Blinkenlights to fail noisily, rather than silently try to do something bogus, if it cannot emulate a particular CPU feature. Such failures will be much easier for developers — e.g. us — to diagnose, when users do encounter and report them.

(But perhaps the failure message can be improved.)

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

Actually, I think it will in fact be better and more "user-friendly" for Blinkenlights to fail noisily, rather than silently try to do something bogus

Well, that means I can't boot your GWBASIC image without changing the blink source code, even though it works on a real PC, so it's not fully bogus. That was the whole reason I brought this up after this last merge. There should at least be an option for it, so that I don't have to comment it out in order to run that particular version of FreeDOS installer. (And/or please upload something that I can use for GWBASIC, as I would like to continue using GWBASIC to help you get it running well).

if it cannot emulate a particular CPU feature.

This isn't a CPU feature, its an IBM PC architectural feature, which in the real hardware case is ignored, and in our case, execution is stopped.

blink isn't limited to just IBM PC architecture - it should be able to (and can, I think) emulate embedded systems where there may or may not be a ROM BIOS hard-coded as there is now assumed to be. As such, I think the entire "feature" of both assuming a ROM at C0000-FFFFF as well as trapping writes to it (which are ignored anyways on real hardware) should not be turned on without either a command line option or ./configure option, and default to OFF (so that --disable-romcheck) does not have to be specified by default, and the command line option enables it).

Such failures will be much easier for developers — e.g. us — to diagnose

Agreed it could be a feature, when wanted. However, once found out, there needs to be a way to turn it off, rather than saying that blink won't run something that was distributed in the wild and runs.

(But perhaps the failure message can be improved.)

That's a definite for sure! A major problem continuing with the current design is that all segmentation faults go to the same handler, and no one (neither the user nor the developers) know whether this is a blink internal bug or an emulator "feature". This duality definitely needs to be sorted out, and IMO should not probably be used for trapping real mode non-paged memory access "errors"/issues.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

I think I get what you are getting at now — you would like some sort of run-time (rather than build-time) toggle to get Blinkenlights to flag attempts to write to the ROM area, as this may be useful for catching certain kinds of memory corruption bugs.

I (or someone else?) might try to implement this later. :slightly_smiling_face: I probably also need to decide though, if this should be settable from the command line, or from the TUI...

it would be nice to add to it to allow for arbitrary byte ranges to be compared against read/write privileges, for instance to detect UMB and or XMS access for debugging, which could be more useful.

I expect things such as UMB and EMS will be handled by the virtual memory paging mechanism (modern EMS is done using V86 mode). So — unless we are to emulate EMS in a different way — I expect the specific range checks will be orthogonal to checks on the physical memory addresses, the BIOS ROM range check being one of these. In any case, it might be interesting to think a bit about how to emulate EMS.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

I probably also need to decide though, if this should be settable from the command line, or from the TUI...

The ability to continue an emulation that has halted for a bad "check" result is very desirable - so that one can continue forward instead of being dead in the water, especially for checks that might differ based on an assumed platform. In general, once the problem "check" is understood, its nice for a way to continue through it without having to halt and have to manually continue, for those deeper debugging sessions. (As an example, merely testing GWBASIC requires a booting and running FreeDOS along with its install program, only to be aborted, to get a DOS prompt).

Along these debug issues, we can add to the list the need to emulate a ^C input to the emulated program. I already tried replacing, say ^E with ^C in the OnKeyboardServiceReadKeypress, but that doesn't work - I think the BIOS ends up calling the INT 23h Ctrl-Break routine for that case, which we don't emulate. I haven't tried this on ELKS yet; it doesn't use INT 23h. We should consider making ^E work as ^C in the emulated program.

The ability to set or remove a breakpoint (by inputted address) from the TUI rather than the command line would be nice.

Also, you may have noticed, that the disassembly pane will always show the current IP as the first address displayed when out has to recalculate... this ends up such that the user can't see the instruction stream behind the current instruction (which happens on every OnHalt/INT handler), which is quite inconvenient. I fixed this in blink16, to find when there is a symbol table present, the beginning of the current function, and to start from there if it's within 20-30 bytes of the current IP. This helps alot.

So, not sure the answer on TUI vs command line, but I say we need to make it as easy as possible for actual debugging or reverse engineering.

it might be interesting to think a bit about how to emulate EMS.

I thought that if blink just executes each instruction faithfully (including real, protected, long and unreal modes), along with any new BIOS interrupt handling functions, this will "just" work. (🙂).

Thank you!