rgeo / rgeo-proj4

Proj4 extension for rgeo.
MIT License
13 stars 14 forks source link

fix(ext): segv on fork #40

Closed BuonOmo closed 1 year ago

BuonOmo commented 1 year ago

We would segv when using proj and then fork a process. This is fixed by using proj's thread-safe context.

BuonOmo commented 1 year ago

@keithdoggett no the init function is only called on require. TBH I don't know why it works now since the context stays the same after fork.

We could wrap Process.fork to change the context (as done by ko1/nakayoshi_fork) but I don't see the need as it works as is.

Yet, if you can make more tests (and different than the ones I did I guess) I think it would be great. As I'm not really an expert on that topic.

Also see if the test I added reproduces the segv on your machine as well would be nice

Reproduction attempt without ruby

I naively tried something like this:

#include <proj.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int _argc, char const *_argv[])
{
    PJ_CONTEXT *ctx;
    //*
    ctx=proj_context_create();
    /*/
    ctx=PJ_DEFAULT_CTX;
    //*/
    proj_create(ctx, "EPSG:4055");
    for (int i = 0; i < 10; ++i) {
        int child_pid = fork();
        if (child_pid) {
            proj_create(ctx, "EPSG:4055");
            waitpid(child_pid, NULL, 0);
        } else {
            proj_create(ctx, "EPSG:4055");
        }
    }
    return 0;
}

Not failing.

keithdoggett commented 1 year ago

Sorry about the delay getting back on this. I can confirm that this fixed a crash happening on my side. I can try to look into it more and figure out why this might be working. @Pe-co can you let us know if this fixes the issues on your end?

Pe-co commented 1 year ago

I updated my local error reproduction project (the one I linked in the issue) to use the fix/fork-segv branch and i still get a seg fault. Let me know if I need to to anything else to verify the fix. I do get a different callstack, but if I read it correctly the error is still from the same line.

.../gems/ruby-3.1.2/bundler/gems/rgeo-proj4-356624a8166f/lib/rgeo/coord_sys/proj4.rb:249: [BUG] Segmentation fault at 0x0000000108d68ade

BuonOmo commented 1 year ago

@Pe-co this means either I fixed another mistake and didn't find yours, or it is os dependent. Would you have time to also test that my branch is fixing something ?

t=$(mktemp -d)
git clone git@github.com:rgeo/rgeo-proj4 $t
cd $t
git checkout fix/fork-segv

for _ in a b c d e; do
  rake test
done
# verify that none failed by looking at the tests
# (look for dreadful long backtrace, or segv message)

git checkout master
git checkout fix/fork-segv -- test

for _ in a b c d e; do
  rake test
done
# verify that none failed by looking at the tests

cd - || cd
rm -rf $f
Pe-co commented 1 year ago

When i run the tests for fix/fork-segv or a clean master i get no seg fault. When i run them for 'master with git checkout fix/fork-segv -- test ' they hang. Running one or several tests suites makes no difference

BuonOmo commented 1 year ago

@Pe-co so it seems like we are fixing something, but not your thing! I say that we already merge that one and then we'll have one less bug to look at when trying to debug your issue :slightly-smiling-face:

Pe-co commented 1 year ago

Ok, so we expected an error on the master/test merge, but you got a seg fault an I just got a hang, makes sense.

BuonOmo commented 1 year ago

@Pe-co I get a hang sometimes as well FYI (c.f. test description).

@keithdoggett ready to merge when you are (last push is just an update to the commit message)

keithdoggett commented 1 year ago

@BuonOmo as we discussed we're going to merge this since it's a step in the direction resolve this and was able to fix the issue in some instances. We talked about next steps for debugging the still pending issues and should have updates soon.

@Pe-co we will loop you in with any relevant updates as well.