Fatal error: exception Failure("Marshal.data_size: bad object")

tcoopman commented 5 years ago

Sometimes, at the end of a run (I assume after an `exit`), this message appears on the console: Fatal error: exception Failure("Marshal.data_size: bad object").

This doesn't happen every time.

OS: Linux (Arch Linux).

Schniz commented 5 years ago

It happens to me too, without any `exit`. Just making one process send a message to another is enough:

open Reactor.System;

Reactor.Node.(Policy.default() |> setup);

let printer = spawn((ctx, _) => `Become(0), 0);

let _ =
  spawn(
    (ctx, state) => {
      printer <- "hello";
      `Become(state);
    },
    0,
  );

Reactor.Node.run();

Schniz commented 5 years ago

It also looks like there is some kind of race condition here. I'm just running the following code:

Fmt_tty.setup_std_outputs();
Logs.set_level(Some(Logs.Debug));
Logs.set_reporter(Logs_fmt.reporter());

open Reactor.System;

Reactor.Node.(Policy.default() |> setup);

let _ = spawn((ctx, state) => exit(), 0);

Reactor.Node.run();

Sometimes it exits successfully, and the tail of the log looks like:

TestFrameworkApp.exe: [DEBUG] [26298] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26299] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26297] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26295] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26298] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26299] Handling tasks...
TestFrameworkApp.exe: [DEBUG] [26295] Handling tasks...
TestFrameworkApp.exe: [DEBUG] [26297] Handling tasks...
TestFrameworkApp.exe: [DEBUG] [26298] Handling tasks...
TestFrameworkApp.exe: [INFO] [26291] Node shutting down...

However, when it fails, it looks something like:

TestFrameworkApp.exe: [DEBUG] [26198] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26199] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26198] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26199] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26198] Handling tasks...
TestFrameworkApp.exe: [DEBUG] [26192] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26199] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26192] Receiving tasks...
TestFrameworkApp.exe: [DEBUG] [26199] Handling tasks...
TestFrameworkApp.exe: [DEBUG] [26192] Handling tasks...
TestFrameworkApp.exe: [INFO] [26199] Scheduler shutting down...
TestFrameworkApp.exe: [INFO] [26189] Node shutting down...
TestFrameworkApp.exe: [INFO] [26201] Beginning scheduler loop...
TestFrameworkApp.exe: [DEBUG] [26201] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26201] Receiving tasks...
TestFrameworkApp.exe: [ERROR] [26201] Uncaught exception in scheduler: Failure("Marshal.data_size: bad object")
TestFrameworkApp.exe: [INFO] [26200] Beginning scheduler loop...
TestFrameworkApp.exe: [DEBUG] [26200] Tasks queue has 0 tasks
TestFrameworkApp.exe: [DEBUG] [26200] Receiving tasks...
TestFrameworkApp.exe: [ERROR] [26200] Uncaught exception in scheduler: Failure("Marshal.data_size: bad object")

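So it looks like, even after the node logs "Node shutting down...", the schedulers keep listening for messages: schedulers 26200 and 26201 only begin their loop after that line, and both fail on their first receive. Maybe this is the problem?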
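If that is the cause, one option is to let a scheduler treat a closed pipe as a shutdown signal instead of an uncaught exception. Below is a minimal sketch in plain OCaml syntax (Reason shares OCaml's standard library), assuming each scheduler reads marshaled commands from an `in_channel`. The names `received`, `receive` and `scheduler_loop` are hypothetical and not Reactor's API.

```ocaml
(* Hypothetical types/functions for illustration; not Reactor's actual API. *)
type 'a received =
  | Message of 'a      (* a complete marshaled command *)
  | Closed             (* writer closed the pipe before a new command started *)
  | Corrupt of string  (* partial or garbage frame, e.g. the peer died mid-write *)

(* Read one marshaled value. Marshal.from_channel raises End_of_file when the
   channel is already at end of file, and Failure on a truncated or malformed
   frame. *)
let receive (ic : in_channel) : 'a received =
  match (Marshal.from_channel ic : 'a) with
  | v -> Message v
  | exception End_of_file -> Closed
  | exception Failure msg -> Corrupt msg

(* Handle commands until the pipe closes; a dead peer ends the loop with a
   value instead of an uncaught exception. *)
let rec scheduler_loop ic handle =
  match receive ic with
  | Message m -> handle m; scheduler_loop ic handle
  | Closed -> `Peer_closed
  | Corrupt msg -> `Corrupt msg
```

Whether a scheduler should stop or only log on a corrupt frame is a design choice; this sketch simply stops in both cases.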

Schniz commented 5 years ago

I suspect that sending a SIGKILL doesn't wait for its result (the process actually dying). Maybe we should wait for that?
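Waiting for the kill could look roughly like this in plain OCaml with the `unix` library: `Unix.kill` only delivers the signal, while `Unix.waitpid` blocks until the process has really exited. `kill_and_reap` is a hypothetical helper, not something Reactor provides, and the sketch assumes the node is the parent of the scheduler process and has not reaped it yet.

```ocaml
(* Hypothetical helper, not part of Reactor. Assumes [pid] is a child of the
   current process that has not been reaped yet. Needs the unix library. *)
let kill_and_reap (pid : int) : Unix.process_status =
  (* Unix.kill only sends the signal; it can return before the child is gone. *)
  Unix.kill pid Sys.sigkill;
  (* waitpid blocks until the child has actually terminated; retry if the
     wait is interrupted by a signal. *)
  let rec wait () =
    match Unix.waitpid [] pid with
    | _, status -> status
    | exception Unix.Unix_error (Unix.EINTR, _, _) -> wait ()
  in
  wait ()

let () =
  match Unix.fork () with
  | 0 ->
    (* Child: stands in for a scheduler process. *)
    Unix.sleep 60;
    exit 0
  | pid ->
    match kill_and_reap pid with
    | Unix.WSIGNALED s when s = Sys.sigkill ->
      print_endline "scheduler killed and reaped"
    | _ -> print_endline "unexpected exit status"
```

Only after `kill_and_reap` returns is it safe to assume the child can no longer touch the shared pipe.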

leostera commented 5 years ago

@Schniz good idea. I believe the bug may be related to the pipe shared between processes being removed after one of them dies; the other process then can't read a full command from it and blows up.
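That would match the error text: OCaml's `Marshal.data_size` raises `Failure` with this message when the bytes it is given do not start with a Marshal magic number, for example a header buffer that a read from a dead pipe never filled. A minimal sketch, assuming the reader first reads `Marshal.header_size` bytes and knows how many it actually got; `header` and `check_header` are hypothetical names, not Reactor's code.

```ocaml
(* Hypothetical framing check; not Reactor's code. [n] is the number of bytes
   the last read from the pipe actually placed at the start of [buf]. *)
type header =
  | Body_size of int   (* header is complete: this many data bytes follow *)
  | Peer_closed        (* read returned 0: the writer is gone *)
  | Short_read of int  (* only part of a header arrived *)

let check_header (buf : bytes) (n : int) : header =
  if n = 0 then Peer_closed
  else if n < Marshal.header_size then Short_read n
  else
    (* Still raises Failure if a full-length header holds bytes that are not
       a Marshal header. *)
    Body_size (Marshal.data_size buf 0)

let () =
  (* Calling data_size on a header buffer that was never filled (all zeros)
     raises the kind of Failure seen in this issue. *)
  let empty = Bytes.make Marshal.header_size '\000' in
  (match Marshal.data_size empty 0 with
   | _ -> print_endline "unexpectedly parsed"
   | exception Failure msg -> print_endline msg);
  (* Checking the byte count first turns the same situation into a value. *)
  match check_header empty 0 with
  | Peer_closed -> print_endline "peer closed: stop reading"
  | Short_read _ | Body_size _ -> print_endline "keep reading"
```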

I'm looking into fixes in this PR: https://github.com/ostera/reactor/pull/17

leostera commented 5 years ago

@tcoopman @Schniz could you help me test the current master? I merged #17 and it's looking good on my end, but I want to verify whether this issue persists.

leostera / reactor

Fatal error: exception Failure("Marshal.data_size: bad object") #16