igraph / rigraph

igraph R package
https://r.igraph.org
543 stars 200 forks source link

Interrupting calculations crashes R #505

Closed szhorvat closed 2 years ago

szhorvat commented 2 years ago

Describe the bug

Interrupting any igraph calculation crashes R.

To reproduce

Run betweenness(make_ring(40000)), or any other calculation that takes a long time, then in a few seconds interrupt it.

> betweenness(make_ring(40000))
^C
R(83467,0x1077f05c0) malloc: *** error for object 0x7ffeea238170: pointer being freed was not allocated
R(83467,0x1077f05c0) malloc: *** set a breakpoint in malloc_error_break to debug
fish: Job 1, 'R' terminated by signal SIGABRT (Abort)

Version information

Current 1.3 dev on macOS, specifically 1.2.10.9118

ntamas commented 2 years ago

This seems to fix the issue for me, at least when invoking R from the command line:

diff --git a/tools/stimulus/rinterface.c.in b/tools/stimulus/rinterface.c.in
index c17777be..aad93479 100644
--- a/tools/stimulus/rinterface.c.in
+++ b/tools/stimulus/rinterface.c.in
@@ -2324,7 +2324,7 @@ void R_igraph_warning_handler(const char *reason, const char *file,
 extern int R_interrupts_pending;

 int R_igraph_interrupt_handler(void *data) {
-#if  ( defined(HAVE_AQUA) || defined(Win32) )
+#if  ( defined(Win32) )
   R_CheckUserInterrupt();
 #else
   if (R_interrupts_pending) {

I don't have an R GUI so I cannot test whether it also works for a GUI or not. Note that Windows still uses a different code path.

clpippel commented 2 years ago

In windows (8-Core x64-Processor, ram16.0 GB), R does not interrupt:

---> system.time(make_full_citation_graph(n=40000)) Error in make_full_citation_graph(n = 40000) : At type_indexededgelist.c:306 : cannot add edges, Out of memory Timing stopped at: 74.46 90.02 296.7

---> system.time(make_full_citation_graph(n=50000)) crashes R in a few seconds.

szhorvat commented 2 years ago

@vtraag Do you have the possibility to test this on Windows? That's the missing piece now.

There is no problem with the macOS GUI.

ntamas commented 2 years ago

Note to self: R_CheckUserInterrupt() checks whether the user has pressed ^C recently and if so, it longjmps back to the top level in the usual evil longjmpish way :) So, by looking at the code above, it seems like there is no proper resource cleanup on Windows (i.e. when #if (defined(Win32)) is true) because we never call IGRAPH_FINALLY_FREE().

Related reading: https://www.tidyverse.org/blog/2019/05/resource-cleanup-in-c-and-the-r-api/

I don't know why Win32 and macOS were special-cased in the interrupt handler; maybe this was needed only in older R versions but not any more. @vtraag if you are willing to give it a go on a Windows machine (when you are back from holidays, not now :) ), let me know and we can then test it both with and without the #if (defined(Win32)) branch.

szhorvat commented 2 years ago

@vtraag Would it be difficult for you to test this on Windows? We should not ship a version with broken interruptions.

szhorvat commented 2 years ago

OK, I managed to compile this on Windows, and I can confirm that it does crash, both with the GUI and the command line versions.

Testing this is not easy for me, as I need to access the Windows computer remotely and compiling the package takes a very long time. BTW, if this is useful: I needed to install both libxml2 and glpk, for both i686 and x86_64 architectures, before compilation worked out of the box. I used remotes::install_github("igraph/rigraph@master").

szhorvat commented 2 years ago

When interruptions are suspended, we get a crash on macOS as well, due to an inappropriate IGRAPH_FINALLY_FREE() call.

> suspendInterrupts(betweenness(make_ring(10000)))
Error in betweenness(make_ring(10000)) : 
  At core/core/dqueue.pmt:285 : Assertion failed: q->stor_begin != 0. This is an unexpected igraph error; please report this as a bug, along with the steps to reproduce it.

I tried to fix this in 7d08c9364f9fb2bddab56e0bd9ebba28f206058d

szhorvat commented 2 years ago

This may be useful, but may have a significant performance cost: https://stackoverflow.com/a/40565675/695132

szhorvat commented 2 years ago

Looking at the R_CheckUserInterrupt() sources, Windows is special cased in version 3.4, but not in 3.5 and later.

https://github.com/wch/r-source/blob/tags/R-3-4-0/src/main/errors.c#L114 https://github.com/wch/r-source/blob/tags/R-3-5-0/src/main/errors.c#L114

What is the oldest version we need to support?

szhorvat commented 2 years ago

With the current changes I made, interruption does not work at all on Windows (GUI or command line). Also, any computation locks up the GUI. This is likely due to R_ProcessEvents(); not being run. I will try to add it now. See da2fe3ea1fa88d9bf138620776271c191616ce4e

ntamas commented 2 years ago

What is the oldest version we need to support?

CRAN provides binaries for the current R release (4.1.3), the R development version (4.2.0) and the previous R minor release (4.0.x). I think these are the ones we need to support. CI also tests on 3.5 and 3.6; they seem to pass now but I think it's okay to let them go if needed.

vtraag commented 2 years ago

FYI, with commit https://github.com/igraph/rigraph/commit/97b8b7443762aa94ee18cb00e273f7ac582a5992, I can safely interrupt the calculations in R on Windows. That is, it both works and it doesn't crash.

szhorvat commented 2 years ago

For me, it does not work. It still crashes.

vtraag commented 2 years ago

On Windows?

szhorvat commented 2 years ago

Yes. Also, I don't see what the commit changed in the logic. It's basically the same as before. It does not crash every time, but it crashes often.

szhorvat commented 2 years ago

When requesting debugging, the error I get is this:

"Unhandled exception at 0x000000007748F262 (ntdll.dll) in Rterm.exe: 0xC0000374: A heap has been corrupted (parameters: 0x00000000774F7C70)."

The code now looks like this:

  // 1
  if (!R_interrupts_suspended && R_interrupts_pending) {
    // 2
    IGRAPH_FINALLY_FREE();
    // 3
  }
  R_CheckUserInterrupt();

I put some print statements (fprintf(stderr, "something\n"); fflush(stdderr);) at locations 1, 2 and 3. I see the output from 1 but never from 2 or 3. I don't understand how this is possible.

szhorvat commented 2 years ago

Tried this, still crashes:

void checkInterruptFn(void *dummy) {
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (! R_ToplevelExec(checkInterruptFn, NULL))
    return IGRAPH_INTERRUPTED;
  return IGRAPH_SUCCESS;
}
ntamas commented 2 years ago

@szhorvat Are you compiling the module in debug mode with optimizations off? (I.e. somerthing like -O0 or its equivalent in MSVC). Otherwise the stack trace could be misleading as the compiler can randomly rearrange stuff.

vtraag commented 2 years ago

I now ran betweenness(make_ring(40000)) a few dozen times or so and interruption it never crashed. I installed igraph using remotes::install_github("igraph/rigraph@master") and I am working on R 4.0.3 GUI (64-bit).

szhorvat commented 2 years ago

@vtraag Can you try 4.1.3? Are you on Windows 10? I was using Windows 7.

vtraag commented 2 years ago

Running it from the terminal also doesn't crash.

vtraag commented 2 years ago

@vtraag Can you try 4.1.3? Are you on Windows 10? I was using Windows 7.

Yes, I'll give it a go. I am on Windows 10.

vtraag commented 2 years ago

I can confirm the problem indeed on R 4.1.3.

szhorvat commented 2 years ago

If I try closeness(make_ring(10000)), it does not crash upon interruption. Also, I can't see relevant changes between 4.0 and 4.1. The problem was likely already there in 4.0, it just did not have symptoms.

vtraag commented 2 years ago

It seems rather difficult to get any debug symbols, both using gdb and MSVC it doesn't seem to be able to find any debug symbols. I added the -g flag, for some reason -O2 is still being used though, but I'm not sure where that flag is coming from.

vtraag commented 2 years ago

Following the instructions from the R for Windows FAQ (section 8.4), I was finally able to get some debug symbols for igraph.

I have now installed the code from the master branch using Rcmd.exe INSTALL --debug ., after which I attached the gdb debugger from the RTools installation (installed using pacman -S mingw-w64-x86_64-gdb) to a running R process. When then running betweenness(make_ring(10000)) and hitting Ctrl-C, it jumps to gdb. Then sending the signal to R again signal SIGINT, I finally get the following traceback:

(gdb) bt
#0  0x00007fffc6ecf1d3 in ntdll!RtlIsZeroMemory ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007fffc6ed7f92 in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007fffc6ed827a in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007fffc6eddf01 in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#4  0x00007fffc6df5bf0 in ntdll!RtlGetCurrentServiceSessionId ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#5  0x00007fffc6df47b1 in ntdll!RtlFreeHeap ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#6  0x00007fffc6509c9c in msvcrt!free () from C:\WINDOWS\System32\msvcrt.dll
#7  0x00000000655ac88a in igraph_vector_destroy (v=0x441d2a0)
    at core/core/vector.pmt:378
#8  0x0000000065591d37 in IGRAPH_FINALLY_FREE () at core/core/error.c:261
#9  0x00000000656e586d in R_igraph_finalizer () at rinterface.c:2475
#10 0x000000006c7a7bc3 in Rf_NewFrameConfirm ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#11 0x000000006c7a884d in Rf_NewFrameConfirm ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#12 0x000000006c7fd218 in Rf_eval () from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#13 0x000000006c795296 in Rf_endcontext ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#14 0x000000006c7fe91a in R_cmpfun1 ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#15 0x000000006c7ffbca in Rf_applyClosure ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#16 0x000000006c7fcdfc in Rf_eval () from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#17 0x000000006c8272f1 in Rf_ReplIteration ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#18 0x000000006c827661 in Rf_ReplIteration ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#19 0x000000006c8276fd in run_Rmainloop ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#20 0x000000006c82779e in Rf_mainloop ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#21 0x000000000040171a in ?? ()
#22 0x000000000040158a in ?? ()
#23 0x00000000004013c5 in ?? ()
#24 0x000000000040152b in ?? ()
#25 0x00007fffc4f67034 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\System32\kernel32.dll
#26 0x00007fffc6e22651 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#27 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
vtraag commented 2 years ago

Could it be that R_interrupts_pending is still 0 but turns 1 when R_CheckUserInterrupt is called? I'm not sure if there is some separate thread that is setting R_interrupts_pending to 1 or something?

szhorvat commented 2 years ago

I think that is indeed possible. See this comment:

https://github.com/wch/r-source/blob/trunk/src/main/errors.c#L145

szhorvat commented 2 years ago

Do we actually need IGRAPH_FINALLY_FREE() in the finalizer? If I understand this right, normally the finally stack should already be empty at that point. If the goal is to keep it empty, perhaps it can be replaced with IGRAPH_FINALLY_CLEAN(IGRAPH_FINALLY_STACK_SIZE()). Then if things so wrong, we have a memory leak instead of a crash.

Once error handling is cleaned up, the R error handler is made to return, and errors are handled predictably in the stimulus-generated wrapper functions, this should go away?

ntamas commented 2 years ago

Indeed -- I've found some more info here (search for R_ProcessEvents()). So what probably happens is that on Linux or on macOS, from the CLI, pressing Ctrl-C calls the standard SIGINT signal handler, which in turn sets R_interrupts_pending to 1 and we notice it in the next iteration. However, in the GUI, R_interrupts_pending is set by R_ProcessEvents(), which means that we won't notice R_interrupts_pending turning 1 unless we call R_ProcessEvents() ourselves. Depending on how R_ProcessEvents() is organized (i.e. are GUI events handled on the same thread as the thread that is running R code?), this might or might not mean that we won't call IGRAPH_FINALLY_FREE(), thinking that an interrupt has not happened, while in reality it did happen.

szhorvat commented 2 years ago

I think that is indeed possible. See this comment:

https://github.com/wch/r-source/blob/trunk/src/main/errors.c#L145

What I was thinking here was that R_interrupts_pending does not get set until R_ProcessEvents() is run. But we did in fact try several variations where it is run. See this comment as well as the previous commit. Both of them run R_ProcessEvents() before checking for an interrupt, although the first one lets the error handler take care of the interrupt.

szhorvat commented 2 years ago

OK, my last experiment just finished. I had this code:

void checkInterruptFn(void *x) {
  (void)x;
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (! R_ToplevelExec(checkInterruptFn, NULL)) {
    IGRAPH_FINALLY_CLEAN(IGRAPH_FINALLY_STACK_SIZE());
    return IGRAPH_INTERRUPTED;
  }
  return IGRAPH_SUCCESS;
}

In addition, I commented out the IGRAPH_FINALLY_FREE() in the finalizer.

Yet what happens is this:

> betweenness(make_ring(10000))

Error in betweenness(make_ring(10000)) :
  At core/core/error.c:252 : Corrupt finally stack: trying to pop 1 element(s) when only 0 left. This is an unexpected i
graph error; please report this as a bug, along with the steps to reproduce it.

I don't understand how this is possible. Nothing should want to try popping elements from the finally stack after interruption has been initiated.

ntamas commented 2 years ago

IGRAPH_FINALLY_FREE() is called multiple times, from different places, usually due to on_exit(.Call(R_igraph_finalizer)), which is executed in the R layer every time we call down to C. Sometimes multiple times. If you put a printf("hey!\n") in R_igraph_finalizer and just let betweenness(make_ring(10000)) run to completion, you will see that it is printed three times.

szhorvat commented 2 years ago

Yes, but IGRAPH_FINALLY_FREE() cannot cause this "trying to pop 1 element(s)" error. Only an IGRAPH_FINALLY_CLEAN(1) can cause it. And how does that get called after interruption?

szhorvat commented 2 years ago

@vtraag If you have time, this might be interesting to try out and record a stack trace. I am not able to record stack traces yet.

void checkInterruptFn(void *dummy) {
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (! R_ToplevelExec(checkInterruptFn, NULL)) {
    IGRAPH_FINALLY_FREE();
    R_CheckUserInterrupt();    
    IGRAPH_FATAL("Must never reach here during interrupt handling.");
  }
  return IGRAPH_SUCCESS;
}
vtraag commented 2 years ago

I'll try to have a look tomorrow. My guess is that it won't work calling R_CheckUserInterrupt twice in a row, because R_interrupts_pending is reset to ​0​ during its call. I think we should hence manually set R_interrupts_pending = ​1 before calling R_CheckUserInterrupt again.

szhorvat commented 2 years ago

You are correct.

But then the following would still be an option, and I suspect that it will work well enough (because the fatal error version, compiled overnight, seems to work).

void checkInterruptFn(void *dummy) {
  IGRAPH_UNUSED(dummy);
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (R_ToplevelExec(checkInterruptFn, NULL) == FALSE) {
    IGRAPH_FINALLY_FREE();
    error("igraph was interrupted.");
  }
  return IGRAPH_SUCCESS;
}

In the previous attempt, I simply did return IGRAPH_INTERRUPTED. But the R interface's wrapper functions do not check the return values properly, and IGRAPH_INTERRUPTED is frequently returned directly (without triggering an igraph_error() call). Therefore we trigger the error immediately.

vtraag commented 2 years ago

Yes, this is what I was thinking of as well. Why not use IGRAPH_ERROR instead by the way?

szhorvat commented 2 years ago

Yes,

    IGRAPH_FINALLY_FREE();
    error("igraph was interrupted.");

could be replaced by

IGRAPH_ERROR("igraph was interrupted.", IGRAPH_INTERRUPTED);

BTW I confirm that it does work.


The remaining questions are:

szhorvat commented 2 years ago

@vtraag Per your suggestion, this works:

extern int R_interrupts_pending;

void checkInterruptFn(void *dummy) {
  IGRAPH_UNUSED(dummy);
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (R_ToplevelExec(checkInterruptFn, NULL) == FALSE) {
    R_interrupts_pending = 1;
    R_CheckUserInterrupt();
  }
  return IGRAPH_SUCCESS;
}

But then why did this one not work?

https://github.com/igraph/rigraph/commit/da2fe3ea1fa88d9bf138620776271c191616ce4e

Does the declaration of R_interrupts_pending perhaps need volatile in case it is set from other threads?

vtraag commented 2 years ago

I can confirm that the following is working correctly indeed on Windows as well:

void checkInterruptFn(void *dummy) {
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (! R_ToplevelExec(checkInterruptFn, NULL)) {
    IGRAPH_ERROR("igraph was interrupted.", IGRAPH_INTERRUPTED);
  }
  return IGRAPH_SUCCESS;
}
  • Does R_ToplevelExec has a significant performance impact?

Good question, I'm not sure. I think that IGRAPH_ALLOW_INTERRUPTION calls should preferably be limited anyway (once every second or so would be sufficient). In some places, we probably call IGRAPH_ALLOW_INTERRUPTION too often. At any rate, that is not something we will solve now. Probably we should just try a benchmark with an empty interrupt handler vs this interrupt handler using R_ToplevelExec.

  • Is it a problem to replace interrupts with errors?

Not as far as I'm concerned. It is slightly ugly, because it seems to raise an error from a particular source code line, while it just was interrupted. Since interruption will be a manual intervention by users, I think the risk of misunderstanding is limited.

  • Why does using a plain interrupt instead of an error crash?

Because the error handling properly frees the FINALLY stack before doing a longjmp while the interrupt did a longjmp before freeing the FINALLY stack?

@vtraag Per your suggestion, this works:

extern int R_interrupts_pending;

void checkInterruptFn(void *dummy) {
  IGRAPH_UNUSED(dummy);
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (R_ToplevelExec(checkInterruptFn, NULL) == FALSE) {
    R_interrupts_pending = 1;
    R_CheckUserInterrupt();
  }
  return IGRAPH_SUCCESS;
}

But then why did this one not work?

da2fe3e

Does the declaration of R_interrupts_pending perhaps need volatile in case it is set from other threads?

We could declare it perhaps as volatile, I remember trying that as well without success, but that might have been in a slightly different circumstance. Declaring it volatile will probably slow it down at least as much as using R_ToplevelExec I guess.

The reason it probably didn't work is because R_interrupts_suspended also still intervenes.

szhorvat commented 2 years ago

Good question, I'm not sure. I think that IGRAPH_ALLOW_INTERRUPTION calls should preferably be limited anyway (once every second or so would be sufficient).

At one point I did think about this, but I gave up on it. Doing this is simply not realistic. It is impossible to predict how often it will get called in "wall time". Benchmarking I did with Mathematica indicates that Mathematica's own interrupt check function is very fast to call, and there is no need to limit the number of calls to it (even though it also does something like ProcessEvents()).

There are decisions to be made such as: should the check be placed in the inner loop or outer loop of some calculation? Placing it in the inner loop risks slowdown (but that needs to be benchmarked), while placing it in the outer loop may cause a multi-second delay only in some circumstances. It's a pain to think about.

Instead, on the C core side, I would go with the assumption that an interruption check has the same cost as "2-3 functions calls", and not try too hard to optimize for slower checks.

If we want to limit the frequency of calls more than it already is, it should be done in the interrupt handler directly rather than in the C core:

static int counter = 0;
if (counter++ % 100 == 0) { check_for_interrupt(); }
szhorvat commented 2 years ago

@vtraag My last report for today, the following seems to work, assuming that I am compiling what I think I'm compiling.

extern const volatile int R_interrupts_pending;
extern const volatile int R_interrupts_suspended;

int R_igraph_interrupt_handler(void *data) {
  if (R_interrupts_suspended) {
    return IGRAPH_SUCCESS;
  }
  R_ProcessEvents();
  if (R_interrupts_pending) {
    IGRAPH_FINALLY_FREE();
    R_CheckUserInterrupt();
  }
  return IGRAPH_SUCCESS;
}

Can you please double-check this on your side? If yes, do you agree that we can go ahead with this version? (Some comments would need to be added, of course.)

szhorvat commented 2 years ago

In fact, I would add some extra protection in form of an IGRAPH_FATAL, which is safe for R (it doesn't crash):

extern const volatile int R_interrupts_pending;
extern const volatile int R_interrupts_suspended;

int R_igraph_interrupt_handler(void *data) {
  /* This function mimics the internal workings of R_CheckUserInterrupt(),
   * and may need to be updated in future versions. It is correct for R
   * versions 3.5 through 4.1, but not for 3.4 and earlier. */
  if (R_interrupts_suspended) {
    return IGRAPH_SUCCESS;
  }
  /* On Windows, R_ProcessEvents() does some processing necessary
   * to update the value of R_interrupts_pending. */
  R_ProcessEvents();
  if (R_interrupts_pending) {
    IGRAPH_FINALLY_FREE();
    R_CheckUserInterrupt();
    /* Must never reach here, R_CheckUserInterrupt() should not return
     * when R_interrupts_pending is true. If it does anyway, the following
     * call averts a crash due to FINALLY_FREE() having already released
     * resources. */
    IGRAPH_FATAL("Interruption failed.");
  }
  return IGRAPH_SUCCESS;
}

This way, if R_CheckUserInterrupt(); does not interrupt for whatever reason, there will be a nice error message telling the user to report the problem (and a crash will be averted).

vtraag commented 2 years ago

No, sorry, this crashes with backtrace:

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 12888.0x48d4]
0x00007fffc6ecf1d3 in ntdll!RtlIsZeroMemory ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) bt
#0  0x00007fffc6ecf1d3 in ntdll!RtlIsZeroMemory ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#1  0x00007fffc6ed7f92 in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#2  0x00007fffc6ed827a in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#3  0x00007fffc6eddf01 in ntdll!RtlpNtSetValueKey ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#4  0x00007fffc6df5b43 in ntdll!RtlGetCurrentServiceSessionId ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#5  0x00007fffc6df47b1 in ntdll!RtlFreeHeap ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#6  0x00007fff48bd7811 in NotifyShims () from C:\WINDOWS\SYSTEM32\AcLayers.dll
#7  0x00007fffc6509c9c in msvcrt!free () from C:\WINDOWS\System32\msvcrt.dll
#8  0x00000000655ac97a in igraph_vector_int_destroy (v=0x3bf48)
    at core/core/vector.pmt:378
#9  0x00000000655eacd7 in igraph_adjlist_destroy (al=0x441d0c0)
    at core/graph/adjlist.c:300
#10 0x0000000065591d37 in IGRAPH_FINALLY_FREE () at core/core/error.c:261
#11 0x00000000656e581d in R_igraph_finalizer () at rinterface.c:2479
#12 0x000000006c7a7bc3 in Rf_NewFrameConfirm ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#13 0x000000006c7a884d in Rf_NewFrameConfirm ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#14 0x000000006c7fd218 in Rf_eval () from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#15 0x000000006c795296 in Rf_endcontext ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#16 0x000000006c7fe91a in R_cmpfun1 ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#17 0x000000006c7ffbca in Rf_applyClosure ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#18 0x000000006c7fcdfc in Rf_eval () from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#19 0x000000006c8272f1 in Rf_ReplIteration ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#20 0x000000006c827661 in Rf_ReplIteration ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#21 0x000000006c8276fd in run_Rmainloop ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#22 0x000000006c82779e in Rf_mainloop ()
   from C:\PROGRA~1\R\R-41~1.3\bin\x64\R.dll
#23 0x000000000040171a in ?? ()
#24 0x000000000040158a in ?? ()
#25 0x00000000004013c5 in ?? ()
#26 0x000000000040152b in ?? ()
#27 0x00007fffc4f67034 in KERNEL32!BaseThreadInitThunk ()
   from C:\WINDOWS\System32\kernel32.dll
#28 0x00007fffc6e22651 in ntdll!RtlUserThreadStart ()
   from C:\WINDOWS\SYSTEM32\ntdll.dll
#29 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I think we should take this approach

extern int R_interrupts_pending;

void checkInterruptFn(void *dummy) {
  IGRAPH_UNUSED(dummy);
  R_CheckUserInterrupt();
}

int R_igraph_interrupt_handler(void *data) {
  if (R_ToplevelExec(checkInterruptFn, NULL) == FALSE) {
    IGRAPH_FINALLY_FREE();
    R_interrupts_pending = 1;
    R_CheckUserInterrupt();
  }
  return IGRAPH_SUCCESS;
}

with perhaps the added IGRAPH_FATAL as you said.

vtraag commented 2 years ago

The implementation I proposed above works correctly, and interrupts as usual. This is indeed better than treating is as an error, and just interrupts the calculation. If you want, I can open a PR with this solution to finalise the details, I have a commit ready to go.

szhorvat commented 2 years ago

I also have to leave now, please go ahead with whatever works without a crash.

I still can't understand why there's a difference between these two, but we can sort out the details after the 1.3.0 release. The priority now is not to crash.

szhorvat commented 2 years ago
#10 0x0000000065591d37 in IGRAPH_FINALLY_FREE () at core/core/error.c:261
#11 0x00000000656e581d in R_igraph_finalizer () at rinterface.c:2479

This is very weird because there is no way for the finally stack NOT to be empty in the finalizer. It was already emptied in the interrupt handler! Who put more stuff on it later?

In order to try to explain this, I instrumented the finally free framework (on macOS, not Windows!) to print out which function is called when. The finally stack is always empty when calling FINALLY_FREE() from the finalizer. This is the output I am looking at:

> betweenness(g)
FREE: rinterface.c:2482
[X] Finally stack will be cleaned (contained 0 elements)
FINALLY: rinterface.c:10695, igraph_vector_destroy
FINALLY: core/centrality/betweenness.c:339, igraph_vector_destroy
FINALLY: core/graph/adjlist.c:139, igraph_vector_destroy
FINALLY: core/graph/adjlist.c:151, igraph_adjlist_destroy
CLEAN: core/graph/adjlist.c:170, 2
FINALLY: core/centrality/betweenness.c:357, igraph_adjlist_destroy
FINALLY: core/graph/adjlist.c:139, igraph_vector_destroy
FINALLY: core/graph/adjlist.c:151, igraph_adjlist_destroy
CLEAN: core/graph/adjlist.c:170, 2
FINALLY: core/centrality/betweenness.c:359, igraph_adjlist_destroy
FINALLY: core/centrality/betweenness.c:371, igraph_free
FINALLY: core/centrality/betweenness.c:376, igraph_free
FINALLY: core/centrality/betweenness.c:381, igraph_free
FINALLY: core/centrality/betweenness.c:383, igraph_dqueue_destroy
FINALLY: core/centrality/betweenness.c:385, igraph_stack_destroy
^CXXX Doing interruption!
FREE: rinterface.c:2415
[X] Finally stack will be cleaned (contained 9 elements)
XX F_FREE done!

FREE: rinterface.c:2482
[X] Finally stack will be cleaned (contained 0 elements)

Line 2482 is the finalizer while 2415 is the interruption handler.

This was tested with a solution basically identical to that in https://github.com/igraph/rigraph/issues/505#issuecomment-1082771000, on macOS.