mirage / mirage-tcpip

TCP/IP networking stack in pure OCaml, using the Mirage platform libraries. Includes IPv4/6, ICMP, and UDP/TCP support.
https://mirage.io
ISC License

DHCP page fault #80

Closed MagnusS closed 9 years ago

MagnusS commented 9 years ago

When I run the static-website or stackv4 examples from mirage-skeleton under Xen, they page fault with DHCP. Static IP seems to work fine. I use Xen 4.4 in VirtualBox (with Ubuntu 14.10 Server) and Mirage 2.0 from the main opam repo.

$ sudo xl create www.xl -c
Parsing config from www.xl
Xen Minimal OS!
  start_info: 0000000000322000(VA)
    nr_pages: 0x10000
  shared_inf: 0x40a67000(MA)
     pt_base: 0000000000325000(VA)
nr_pt_frames: 0x5
    mfn_list: 00000000002a2000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line:
       stack: 0000000000260800-0000000000280800
Mirage: start_kernel
MM: Init
      _text: 0000000000000000(VA)
     _etext: 0000000000151fde(VA)
   _erodata: 0000000000190000(VA)
     _edata: 0000000000247a10(VA)
stack start: 0000000000260800(VA)
       _end: 00000000002a12dc(VA)
  start_pfn: 32d
    max_pfn: 10000
Mapping memory range 0x400000 - 0x10000000
setting 0000000000000000-0000000000190000 readonly
skipped 1000
MM: Initialise page allocator for 3ab000(3ab000)-10000000(10000000)
MM: done
Demand map pfns at 10001000-0000002010001000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0000000010001000.
xencaml: app_main_thread
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
Unsupported function lseek called in Mini-OS kernel
getenv(OCAMLRUNPARAM) -> null
getenv(CAMLRUNPARAM) -> null
getenv(TMPDIR) -> null
getenv(TEMP) -> null
Netif: add resume hook
Netif.connect 0
Netfront.create: id=0 domid=0
MAC: c0:ff:ee:c0:ff:ee
Manager: connect
Attempt to open(/dev/urandom)!
Manager: configuring
DHCP: start discovery

Sending DHCP broadcast len 552
Page fault at linear address 28, rip 151b17, regs 000000000027fc48, sp 27fcf0, our_sp 000000000027fc10, code 0
Page fault in pagetable walk (access to invalid memory?).
avsm commented 9 years ago

What's the output of opam list -i ?

MagnusS commented 9 years ago
$ opam list -i
# Installed packages for system:
base-bigarray             base  Bigarray library distributed with the OCaml compiler
base-bytes              legacy  Bytes compatibility library distributed with ocamlfind
base-no-ppx               base  A pseudo-library to indicate lack of extension points support
base-threads              base  Threads library distributed with the OCaml compiler
base-unix                 base  Unix library distributed with the OCaml compiler
base64                   1.0.0  Base64 encoding and decoding library
camlp4                  4.01.0  Camlp4 is a system for writing extensible parsers for programming languages
cmdliner                 0.9.5  Declarative definition of command line interfaces for OCaml
cohttp                  0.12.0  HTTP library for Lwt, Async and Mirage
conduit                  0.6.1  Network connection library for TCP and SSL
conf-pkg-config            1.0  Virtual package relying on pkg-config installation.
crunch                   1.3.0  Convert a filesystem into a static OCaml module
cstruct                  1.4.0  access C structures via a camlp4 extension
dns                     0.11.0  DNS client and server implementation
fieldslib            109.20.03  Syntax extension to define first class values representing record fields, to get and set record fields, iterate and fold over
io-page                  1.1.1  Allocate memory pages suitable for aligned I/O
ipaddr                   2.5.0  IP (and MAC) address representation library
lwt                      2.4.6  A cooperative threads library for OCaml
mirage                   2.0.0  The Mirage library operating system
mirage-clock-unix        1.0.0  A Mirage-compatible Clock library for Unix
mirage-clock-xen         1.0.0  A Mirage-compatible Clock library for Xen
mirage-conduit           2.0.0  Virtual package for the Mirage Conduit transports
mirage-console           2.0.0  A Mirage-compatible Console library for Xen and Unix
mirage-dns               2.0.0  Virtual package for the Mirage DNS transports
mirage-http              2.0.0  Mirage HTTP client and server driver for Unix
mirage-net-unix          1.1.1  Ethernet network driver for Mirage, using tuntap
mirage-net-xen           1.1.3  Ethernet network device driver for Mirage/Xen
mirage-types             2.0.0  Module type definitions for Mirage-compatible applications
mirage-types-lwt         2.0.0  Lwt module type definitions for Mirage-compatible applications
mirage-unix              2.0.0  Mirage OS library for Unix compilation
mirage-xen               2.0.0  Mirage OS library for Xen compilation
mirage-xen-minios        0.4.1  Xen MiniOS guest operating system library
oasis                    0.4.5  Architecture for building OCaml libraries and applications
ocaml-data-notation     0.0.11  Store data using OCaml notation
ocamlfind                1.5.5  A library manager for OCaml
ocamlify                 0.0.1  Include files in OCaml code
ocamlmod                 0.0.7  Generate OCaml modules from source files
ocplib-endian              0.7  Optimised functions to read and write int16/32/64 from strings and bigarrays, based on new primitives added in version 4.01.
optcomp                    1.6  Optional compilation with cpp-like directives
ounit                    2.0.0  Unit testing framework loosely based on HUnit. It is similar to JUnit, and other XUnit testing frameworks
re                       1.2.2  RE is a regular expression library for OCaml
sexplib              111.13.00  Library for serializing OCaml values to and from S-expressions
shared-memory-ring       1.1.0  Shared memory rings for RPC and bytestream communications.
ssl                      0.4.7  Bindings for OpenSSL
stringext                1.0.0  Extra string functions for OCaml
tcpip                    2.0.1  Userlevel TCP/IP stack
tuntap                   1.0.0  TUN/TAP bindings
type_conv            111.13.00  Library for building type-driven syntax extensions
uri                      1.7.2  RFC3986 URI/URL parsing library
vchan                    2.0.0  Xen Vchan implementation
xen-evtchn               1.0.5  Xen event channel bindings.
xen-gnt                  2.0.0  Xen grant table bindings
xenstore                 1.2.5  Xenstore protocol clients and server
xenstore_transport       0.9.4  Low-level libraries for connecting to a xenstore service on a xen host.
avsm commented 9 years ago

Could you run 'gdb ' and 'disas 0x151b17' to find out where it faulted? (That's the RIP, the instruction pointer.)

MagnusS commented 9 years ago

disas says 0x151b17 is in memmove

avsm commented 9 years ago

Time for some printf debugging to narrow down where the fault is occurring... probably in the DHCP code in mirage-tcpip.

MagnusS commented 9 years ago

After doing some more testing it turns out that static IP doesn't work either. Interestingly, the static IP kernel only seems to crash after it has received (or tried to reply to) two IP packets. It crashes with TCP SYNs on closed and open ports and with ICMP packets. ARP seems to work fine.

gdb disas reports that the page faults are in caml_tcpip_ones_complement (ICMP) and caml_tcpip_ones_complement_list (TCP SYN).

If I edit lib/tcpip_checksums.ml to use caml_ones_complement and caml_ones_complement_list (not caml_tcpip_*) that fixes the problem.

If I replace mirage-tcpip/lib/checksums_stubs.c with mirage-platform/xen/runtime/xencaml/checksum_stubs.c and rename the C functions to caml_tcpip_* the kernel still crashes.
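
For readers unfamiliar with what these stubs compute, here is a minimal sketch of a ones' complement (Internet) checksum in the style of RFC 1071. It is not the code from either checksum_stubs.c file; the function name `ones_complement_checksum` is hypothetical, and the sketch assumes a single contiguous buffer of ordinary packet size.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a plain ones' complement checksum over a byte buffer.
 * Hypothetical name; not taken from mirage-tcpip or mirage-platform.
 * Assumes packet-sized input, so the 32-bit accumulator cannot overflow. */
static uint16_t ones_complement_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    /* Add up the buffer as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as the high byte of a final word. */
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Nothing in a loop like this is unusual, which fits the later finding that the crash depended on how the stub was compiled rather than on its logic.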

avsm commented 9 years ago

It would be good to take this binary image and run it on a real Xen box to determine if it's a vbox specific problem or not.

talex5 commented 9 years ago

@MagnusS what's the difference in the disassembly of the two versions of ones_complement?

MagnusS commented 9 years ago

I don't have access to a real Xen server at the moment, but I installed the older Ubuntu 14.04 w/Xen in vbox and ran the same examples. The gcc in 14.04 is older - 4.8 vs 4.9 in 14.10. The unikernels compiled in Ubuntu 14.04 work without page fault in both 14.04 and 14.10.

As caml_ones_complement_checksum (which works) is from libxencaml.a and caml_tcpip_ones_complement_checksum (which doesn't work) is from libtcpip_stubs.a, I checked if there were differences in how the libraries were compiled. The only flag used to compile checksum_stubs.c in libtcpip_stubs.a is -O2. The flags used for libxencaml.a are (without -D/U/I/W etc) -O3 -mno-red-zone -fno-tree-loop-distribute-patterns -fno-stack-protector -fno-reorder-blocks -fstrict-aliasing -m64 -fno-asynchronous-unwind-tables -momit-leaf-frame-pointer -mfancy-math-387.

I compiled libtcpip_stubs.a with the flags above in Ubuntu 14.10 and the DHCP and static IP versions of static_website now seem to work without page fault.

talex5 commented 9 years ago

I guess the -mno-red-zone is the most likely cause (I'm not sure how Mini-OS on x86 handles the stack).

talex5 commented 9 years ago

@avsm what prevents normal OCaml code from assuming a red zone? Do we just hope that ocamlopt doesn't do that?

avsm commented 9 years ago

Yes, we absolutely must compile with no red zone on MiniOS/x86_64, since it doesn't work when the whole application is running in a privileged ring.
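
As background, the x86-64 System V ABI lets a leaf function use the 128 bytes below the stack pointer (the red zone) without adjusting %rsp. That is only safe if nothing else writes there. When code runs in kernel mode, an interrupt or exception frame can be pushed onto the same stack and overwrite that area. The sketch below is a hypothetical leaf function (the name `mix` is an assumption, not from the thread) whose locals a compiler may keep in the red zone; whether it actually does depends on the compiler version and flags.

```c
#include <stdint.h>

/* Hypothetical leaf function: it calls nothing else, so under the x86-64
 * System V ABI the compiler is allowed to keep `scratch` in the red zone
 * below %rsp instead of reserving stack space. `volatile` is used here only
 * to encourage the compiler to keep the values in memory. */
static uint64_t mix(uint64_t a, uint64_t b)
{
    volatile uint64_t scratch[4];

    scratch[0] = a;
    scratch[1] = b;
    scratch[2] = a ^ b;
    scratch[3] = a + b;

    return scratch[2] + scratch[3];
}
```

One way to check is to compare the assembly from `gcc -O2 -S` with and without `-mno-red-zone`: accesses at negative offsets from %rsp with no preceding stack adjustment usually indicate red-zone use.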

MagnusS commented 9 years ago

I can confirm that the page fault is fixed in 14.10 with -mno-red-zone and -fno-stack-protector. Ubuntu patches gcc to enable stack protector by default: https://wiki.ubuntu.com/Security/Features
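
A possible reading of the stack protector's role, offered as an assumption rather than something confirmed in this thread: on x86-64 Linux targets, GCC's stack protector typically loads its canary from %fs:0x28. If the FS base is zero inside the unikernel, that load would touch linear address 0x28, which would be consistent with the "Page fault at linear address 28" line in the log if that value is hexadecimal. The sketch below is a hypothetical function (`bounded_copy_len` is not from the thread) of the kind that usually gets instrumented, because it has a local character array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: copies at most 15 bytes of `src` into a local buffer
 * and returns the number of bytes copied. The local char array is what
 * usually makes -fstack-protector insert a canary load on entry and a
 * canary check before return. */
static size_t bounded_copy_len(const char *src)
{
    char buf[16];
    size_t n = strlen(src);

    /* Truncate so the copy and the terminator always fit in buf. */
    if (n >= sizeof buf)
        n = sizeof buf - 1;

    memcpy(buf, src, n);
    buf[n] = '\0';

    return strlen(buf);
}
```

Comparing the assembly from `gcc -O2 -fstack-protector -S` with `-fno-stack-protector` should show whether a `%fs:` access appears in the function's prologue and epilogue.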