whitfin / local-cluster

Easy local cluster creation for Elixir to aid in unit testing
MIT License
223 stars 30 forks source link

Allow `Node.spawn/2` to work out of the box. #8

Closed derekkraan closed 5 years ago

derekkraan commented 5 years ago

Hi,

I was playing with your library this morning, but I can't seem to get it to work. I've added the smallest test case that I could think of that didn't do what I was expecting.

Could you take a look? Am I doing something wrong?

One possibility is that I'm testing this in a library that is not an "application". (this library itself is also not an application).

Thanks Derek

keathley commented 5 years ago

Can you try this with the MFA version of Node.spawn? I think the issue is that when you evaluate the function on another node it can’t reference the pid value in the function. I’ve seen issues like this in the past but I can’t remember if it was this exact situation.

derekkraan commented 5 years ago

It works with the MFA version :thinking:

keathley commented 5 years ago

I think node spawn with the anonymous function doesn’t carry the function context from the node that “created” the function. I never dug enough to know if it’s intentional or not but I could kinda understand why you’d have it work that way on purpose.

derekkraan commented 5 years ago

It does work with a regular cluster though.

image

keathley commented 5 years ago

Oh! Well that's very interesting.

keathley commented 5 years ago

Ok, I think I've figured out a few things. I think it has to do with the fact that we're defining the anonymous function inside the test module. The test module isn't on the other boxes because the test module is just part of an elixir script that gets run. If you create the function inside of another module then everything just works the way you would expect.

defmodule LocalCluster.Stuff do
  def spawn(n, pid) do
    f = fn -> send(pid, :hello_cluster) end
    Node.spawn(n, f)
  end
end

  test "spawns on a child node" do
    pid = self()

    [node] = LocalCluster.start_nodes(:spawn_child, 1)

    # Doing this doesn't work
    # f = fn -> send(pid, :hello_cluster) end
    # Node.spawn(node, f)

  # This does work
    LocalCluster.Stuff.spawn(node, pid)

    assert_receive :hello_cluster
  end

This would also explain why the MFA version works correctly.

keathley commented 5 years ago

This also works which I think better explains the issue:

defmodule LocalCluster.Stuff do
  def f(pid, msg) do
    fn -> send(pid, msg) end
  end
end

  test "spawns on a child node" do
    pid = self()

    [node] = LocalCluster.start_nodes(:spawn_child, 1)

    f = LocalCluster.Stuff.f(pid, :hello_cluster)
    Node.spawn(node, f)

    assert_receive :hello_cluster
  end
derekkraan commented 5 years ago

wow, mind = blown. I always thought an anonymous function was just a function body + some bindings. Didn't realize that the module it is defined in is also relevant in any way.

Ok, this makes a lot of sense. Thanks for looking into it! I'm getting closer to being able to test something real now :)

derekkraan commented 5 years ago

I'm back! I was really missing this functionality, so I looked into what would make it possible.

There are some potential drawbacks here:

  1. We are compiling the test module each time we run LocalCluster.start_nodes/2.
  2. It requires the use of a (small) macro, so now you have to require LocalCluster in your test files.

I'm open to suggestions on how to improve this PR so that it can be merged.

Cheers Derek

derekkraan commented 5 years ago

I have noticed tests failing frequently with :not_alive. I don't know what this is but I don't think it's specific to this branch.

derekkraan commented 5 years ago

Hey guys, any feedback on how I can push this PR forward?

Thanks, Derek

derekkraan commented 5 years ago

Hey @whitfin,

Just checking in again. I would really love to use this library to make some better tests for Horde and DeltaCrdt, and this PR would greatly smooth the path towards doing that.

Any chance I could get an extra set of eyes on this here PR and potentially get it merged?

Thanks, Derek

whitfin commented 5 years ago

@derekkraan so I took a look and I guess I don't understand why this needs to be built-in, since you're just including your file on the spawned nodes (which to be fair, I thought was already happening).

What's the reason for inclusion? I'm not totally against it, I guess I just want to make sure it's necessary. Typically I'd prefer a macro-less approach, so that start_nodes stays the same.

Edit: feel free to reach out to me on the Elixir Slack too. I don't know why I missed the comments on this PR up until this morning!

whitfin commented 5 years ago

This will be released in v1.1.0 via the load/2 function, as discussed in Slack. Changes are in https://github.com/whitfin/local-cluster/commit/2f1853daec447ddf9c0480eb2b56f3e5cc673a97. Once I figure out what is wrong with CI, I'll publish this morning!

Thanks for the suggestion!

pmenhart commented 4 years ago

Solution in 2f1853daec447ddf9c0480eb2b56f3e5cc673a97 is failing tests in our umbrella app. Instead of Code.compile_string/2 on the master node, I am suggesting an alternative implementation based on Code.require_file/2 running on each slave node - see PR #16. @derekkraan I'd appreciate your feedback: do you see any obvious drawback for your use case?

derekkraan commented 4 years ago

@pmenhart I tested your branch with my test code, seems to work fine.