sustrik / libdill

Structured concurrency in C
MIT License

Deadlock scenario #185

Closed: mulle-nat closed this issue 5 years ago

mulle-nat commented 5 years ago

I am opening a separate issue for this, split off from the other thread, because I think it is a bug. If this behaviour is intentional, it would be interesting to know why. I reduced my test case even further to make it easier to understand and to make the problem more prominent.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#include <libdill.h>

coroutine void   consumer( int ch)
{
   for(;;)
   {
      fprintf( stderr, "before chrecv: %lld\n", dill_now());
      if( chrecv( ch, NULL, 0, -1) == -1)
      {
         fprintf( stderr, "failed chrecv: %lld\n", dill_now());
         switch( errno)
         {
         case EPIPE     :
         case ECANCELED : return;
         default        : perror("chrecv"); abort();
         }
      }
      fprintf( stderr, "after chrecv: %lld\n", dill_now());

      printf( "Press RETURN to continue\n");
      fflush( stdout);
   }
}

int    main( int argc, char *argv[])
{
   int   handle;
   int   ch[ 2];

   chmake( ch);

   handle = go( consumer( ch[ 0]));

   fprintf( stderr, "before chsend: %lld\n", dill_now());
   chsend( ch[ 1], NULL, 0, -1);
   fprintf( stderr, "after chsend: %lld\n", dill_now());

   //
   // The chsend didn't wake up the receiver.
   // Neither does chdone.
   //
   chdone( ch[ 1]);

   //
   // The user will wait endlessly without seeing the prompt from the consumer
   //
   getchar();

   bundle_wait( handle, -1);
   return( 0);
}

This is the input/output (the empty line is me pressing RETURN for getchar):

before chrecv: 2564798
before chsend: 2564798
after chsend: 2564798

after chrecv: 2566314
Press RETURN to continue
before chrecv: 2566314
failed chrecv: 2566314

It is clear that chrecv is not triggered during chsend. This is curious, because dill_chsend calls dill_trigger, presumably to wake up channel readers, but it does not seem to do its job. Nor does chdone trigger chrecv. Only bundle_wait eventually does.

sustrik commented 5 years ago

You are using getchar, which is a blocking function and blocks all the coroutines. Try deleting that line.

mulle-nat commented 5 years ago

The getchar is just there to make the log somewhat easier to read for the timestamps and to illustrate the general problem. It really has nothing to do with the question.

sustrik commented 5 years ago

But it blocks the entire thread, so the coroutines can't run. Try deleting it, or replace it with msleep.

mulle-nat commented 5 years ago

The outcome is no different. Did you try it? I don't know how to explain this better than I already did:

It is clear that chrecv is not triggered during chsend. This is curious, because dill_chsend calls dill_trigger, presumably to wake up channel readers, but it does not seem to do its job. Nor does chdone trigger chrecv. Only bundle_wait eventually does.

Do I expect too much from dill_trigger?

sustrik commented 5 years ago

Well, this is your output from above:

before chrecv: 2564798
before chsend: 2564798
after chsend: 2564798

after chrecv: 2566314 <---  chrecv got a message
Press RETURN to continue
before chrecv: 2566314
failed chrecv: 2566314

It only happens after you press a key because getchar blocks the entire thread.
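Why one blocking call freezes everything: libdill runs all coroutines of a thread cooperatively on that single thread and only switches between them inside libdill calls such as chsend, chrecv, msleep, yield or fdin. getchar blocks in the kernel instead, so no other coroutine runs until it returns. Within libdill, the usual way to wait for console input is to call fdin(STDIN_FILENO, -1) before reading, which parks only the calling coroutine until stdin is readable. The sketch below does not use libdill; it only illustrates the readiness check that such waiting relies on, using plain POSIX poll() on a pipe. The helper names is_readable and pipe_demo are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not part of libdill): returns 1 if fd can be read
   without blocking (data or end-of-file), 0 if it is still idle after
   timeout_ms milliseconds (0 = just check), -1 on error. */
int is_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0 || (pfd.revents & (POLLERR | POLLNVAL)))
        return -1;
    return (rc > 0 && (pfd.revents & (POLLIN | POLLHUP))) ? 1 : 0;
}

/* Demo with a pipe standing in for stdin: an empty pipe is idle, so a
   caller could do other work instead of blocking in read(); after one
   byte is written, the same check succeeds. */
int pipe_demo(int *before, int *after)
{
    int fds[2];
    int ok;

    assert(before != NULL && after != NULL);
    if (pipe(fds) != 0)
        return -1;
    *before = is_readable(fds[0], 0);
    ok = write(fds[1], "x", 1) == 1;
    *after = is_readable(fds[0], 0);
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Calling pipe_demo yields before == 0 and after == 1. A cooperative runtime uses this kind of check to park just the waiting coroutine instead of the whole thread.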

mulle-nat commented 5 years ago

Again, the question is not about that, but why it doesn't print:

before chrecv: 2564798
before chsend: 2564798
after chrecv: 2564798  <<<<<<<<<<<<<< NOT PRINTED BUT WHY ??
after chsend: 2564798

The getchar comes well after the point in question; it's only there for illustration. It has NO bearing on the question.

sustrik commented 5 years ago

Why doesn't it print what?

The line of code is:

fprintf( stderr, "after chrecv: %lld\n", dill_now());

The printed string is:

after chrecv: 2564798

That looks OK to me, no?

mulle-nat commented 5 years ago

I'll try one more time, putting the question

It is clear that chrecv is not triggered during chsend. This is curious, because dill_chsend calls dill_trigger, presumably to wake up channel readers, but it does not seem to do its job. Do I expect too much from dill_trigger?

as a kind of flow diagram. If chsend woke up the receiver in chrecv, the sequence of calls would be:

main coroutine
go  
  printf "before chrecv"
  chrecv #1
  trigger
printf "before chsend"
chsend #1
trigger
  chrecv #1 continued
  printf "after chrecv"
  chrecv #2
  trigger
chsend #1 continued
printf "after chsend"
...

This would print the sequence:

before chrecv
before chsend
after chrecv
after chsend

But the actual sequence printed is:

before chrecv
before chsend
after chsend

To me this indicates either a bug in libdill or that I "expect too much from dill_trigger", though I don't see why not switching during a send would be desirable behaviour.

sustrik commented 5 years ago

You should make no assumptions about how the scheduler schedules coroutines.

After a message is passed between coroutines, both the sending and the receiving coroutine are free to continue. The scheduler will pick one of them. You can't know which one in advance.
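To see why both orders are legitimate, here is a toy model of a single-threaded cooperative scheduler with a FIFO run queue. It is a simplified sketch, not libdill's scheduler, and the task, field and function names are hypothetical. Handing a message to a parked receiver only marks the receiver as ready (roughly what a trigger does); the yield_on_send flag picks whether the sender then keeps running or goes to the back of the queue. The first policy reproduces the order in the log above, the second the order mulle-nat expected.

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT libdill: two tasks written as explicit state machines,
   driven by a FIFO run queue in a single thread. */
enum { MAIN, CONSUMER };

static struct {
    int queue[16], head, tail;  /* FIFO of ready task ids */
    int step[2];                /* resume point of each task */
    int receiver_waiting;       /* consumer is parked in "chrecv" */
    int yield_on_send;          /* scheduling policy under test */
    char trace[256];
} sim;

static void push(int task)
{
    assert(sim.tail < 16);
    sim.queue[sim.tail++] = task;
}

static void trace(const char *event)
{
    if (sim.trace[0] != '\0')
        strncat(sim.trace, ", ", sizeof sim.trace - strlen(sim.trace) - 1);
    strncat(sim.trace, event, sizeof sim.trace - strlen(sim.trace) - 1);
}

static void consumer(void)
{
    switch (sim.step[CONSUMER]) {
    case 0:
        trace("before chrecv");
        sim.receiver_waiting = 1;     /* park: not re-queued until woken */
        sim.step[CONSUMER] = 1;
        break;
    case 1:
        trace("after chrecv");        /* resumed because a sender woke us */
        sim.step[CONSUMER] = 2;       /* finished */
        break;
    }
}

static void main_task(void)
{
    switch (sim.step[MAIN]) {
    case 0:                           /* spawn: as in the log, new task runs first */
        push(CONSUMER);
        push(MAIN);
        sim.step[MAIN] = 1;
        break;
    case 1:
        trace("before chsend");
        assert(sim.receiver_waiting);
        sim.receiver_waiting = 0;     /* hand the message over */
        push(CONSUMER);               /* "trigger": receiver becomes ready */
        sim.step[MAIN] = 2;
        if (sim.yield_on_send) {      /* policy B: sender goes to the back */
            push(MAIN);
            break;
        }
        /* policy A: the sender simply keeps running */
        /* fall through */
    case 2:
        trace("after chsend");
        sim.step[MAIN] = 3;           /* finished */
        break;
    }
}

/* Runs both tasks to completion and returns the event trace. */
const char *simulate(int yield_on_send)
{
    memset(&sim, 0, sizeof sim);
    sim.yield_on_send = yield_on_send;
    push(MAIN);
    while (sim.head < sim.tail) {
        if (sim.queue[sim.head++] == MAIN)
            main_task();
        else
            consumer();
    }
    return sim.trace;
}
```

simulate(0) returns "before chrecv, before chsend, after chsend, after chrecv" and simulate(1) returns "before chrecv, before chsend, after chrecv, after chsend". The message is delivered in both runs; only the interleaving differs, so code that is correct under one policy has to be correct under the other.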

mulle-nat commented 5 years ago

But does a dill_yield after dill_chsend guarantee it?

I wrote my own dill_chbroadcast (see https://github.com/mulle-nat/libdill/blob/master/chan.c#L224), which calls dill_yield after dill_trigger. It seems to work OK, but I am simply assuming a reliable context switch there. Can dill_yield after dill_trigger really guarantee it?

I wonder whether doing this with libdill is a good idea, or whether I should roll my own based on deboost.context or something similar to get the reliability (and speed) I need. After all, I don't need much...

sustrik commented 5 years ago

No, you can't rely on the scheduler working in a deterministic manner.

Also, dill_trigger is an internal function and shouldn't be used from outside.

As for deboost, I have no experience with it, so I can't tell.
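If main must know that the consumer has processed a message before it continues, the robust option is explicit synchronization rather than a dill_yield. One common pattern (an assumption here, not something libdill or this thread prescribes) is a reply channel: the consumer sends an acknowledgement after handling each message, and main calls chrecv on that reply channel right after its own chsend. The toy model below, again not libdill and with hypothetical names, runs that protocol under both scheduling policies: the position of "after chsend" still varies, but "after chrecv" always comes before "after ack".

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT libdill: two tasks on a FIFO run queue plus a reply
   ("ack") channel. Channels are unbuffered: whoever arrives first parks. */
enum { MAIN, CONSUMER };

static struct {
    int queue[16], head, tail;  /* FIFO of ready task ids */
    int step[2];                /* resume point of each task */
    int msg_receiver_waiting;   /* consumer parked in chrecv(msg) */
    int ack_receiver_waiting;   /* main parked in chrecv(ack) */
    int ack_sender_waiting;     /* consumer parked in chsend(ack) */
    int yield_on_send;          /* policy right after main's chsend(msg) */
    char trace[256];
} sim;

static void push(int task)
{
    assert(sim.tail < 16);
    sim.queue[sim.tail++] = task;
}

static void trace(const char *event)
{
    if (sim.trace[0] != '\0')
        strncat(sim.trace, ", ", sizeof sim.trace - strlen(sim.trace) - 1);
    strncat(sim.trace, event, sizeof sim.trace - strlen(sim.trace) - 1);
}

static void consumer(void)
{
    switch (sim.step[CONSUMER]) {
    case 0:
        trace("before chrecv");
        sim.msg_receiver_waiting = 1;    /* park until main sends */
        sim.step[CONSUMER] = 1;
        break;
    case 1:
        trace("after chrecv");           /* message handled, now reply */
        sim.step[CONSUMER] = 2;          /* nothing left to do afterwards */
        if (sim.ack_receiver_waiting) {  /* main already waits: wake it */
            sim.ack_receiver_waiting = 0;
            push(MAIN);
        } else {
            sim.ack_sender_waiting = 1;  /* park until main receives */
        }
        break;
    }
}

static void main_task(void)
{
    switch (sim.step[MAIN]) {
    case 0:                              /* spawn: the new task runs first */
        push(CONSUMER);
        push(MAIN);
        sim.step[MAIN] = 1;
        break;
    case 1:
        trace("before chsend");
        assert(sim.msg_receiver_waiting);
        sim.msg_receiver_waiting = 0;
        push(CONSUMER);                  /* wake the receiver */
        sim.step[MAIN] = 2;
        if (sim.yield_on_send) {         /* policy B: go to the back */
            push(MAIN);
            break;
        }
        /* fall through */
    case 2:
        trace("after chsend");
        if (!sim.ack_sender_waiting) {   /* chrecv(ack): nothing yet, park */
            sim.ack_receiver_waiting = 1;
            sim.step[MAIN] = 3;
            break;
        }
        sim.ack_sender_waiting = 0;      /* take the ack, release consumer */
        push(CONSUMER);
        /* fall through */
    case 3:
        trace("after ack");
        sim.step[MAIN] = 4;
        break;
    }
}

const char *simulate_with_ack(int yield_on_send)
{
    memset(&sim, 0, sizeof sim);
    sim.yield_on_send = yield_on_send;
    push(MAIN);
    while (sim.head < sim.tail) {
        if (sim.queue[sim.head++] == MAIN)
            main_task();
        else
            consumer();
    }
    return sim.trace;
}
```

The guarantee comes from the protocol (main cannot get past its chrecv on the reply channel until the consumer has replied), not from the scheduler's choice of which ready coroutine to run next.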

mulle-nat commented 5 years ago

OK, thanks for the help.