Open Miosss opened 4 years ago
Forget about the previous comment. It's all wrong. An interrupted system call gives EINTR, not EAGAIN. I undid the changes.
So what is the problem exactly? Provide code that demonstrates.
@pebbe
I am not sure if it is easily reproducible. I believe that the main reason behind this is exactly what zmq_errno() is for. I found it through this SO post.
Look at the definition of this function:
int zmq_errno (void) { return errno; }
It returns just the errno, but from the context of the library itself.
In my case I have libzmq.dll built with MSVC, but I use gcc from MSYS2 for CGO. Therefore there may be a problem with proper propagation of errno, the situation described here.
Your error handling relies on what Go's C calling subsystem gives you here:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
zmq_msg_recv only returns the size of the message; the err is supplied by Go:
"Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error (use _ to skip the result value if the function returns void)." (by godoc)
So this err is basically the same as just reading errno (which you in fact do in errget).
The problem is that errno in the DLL may be a different errno from the one in the app.
libzmq sets an errno that only resides in the DLL, and in my app errno is always 0.
I understand that this problem is why zmq_errno came to be.
I run
msg, err := client.RecvMessage(zmq.DONTWAIT)
I am sure that there is no message in the queue, so I should receive msg = nil and err = EAGAIN.
This doesn't happen. I get msg = []byte("") (empty message) and err = nil.
By debugging your code I can see that
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
in this example returns size = -1 and err = nil.
size = -1 clearly indicates that there IS an error, but Go gives you err = nil. In the next if statement you check size to see whether there is an error (and there is), and to get the actual error you look into err, which is nil.
So size says that there is an error, and err says there is none.
To me, the cause is what I wrote at the beginning: libzmq sets a different errno than the one CGO returns. You should probably check zmq_errno instead.
And look at this quote from zmq.h:
/* This function retrieves the errno as it is known to 0MQ library. The goal */
/* of this function is to make the code 100% portable, including where 0MQ */
/* compiled with certain CRT library (on Windows) is linked to an */
/* application that uses different CRT library. */
ZMQ_EXPORT int zmq_errno (void);
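To make the mismatch concrete, here is a toy model in plain Go. It is only an illustration under stated assumptions: runtimeErrno, fakeLib, recvNoWait and libErrno are invented names, no C runtime is involved, and 11 is the EAGAIN value reported in this thread. It shows why an application reading its own errno sees 0 while a getter exported by the library sees the real code.

```go
package main

import "fmt"

// runtimeErrno models the errno slot owned by one C runtime (CRT).
type runtimeErrno struct{ value int }

// fakeLib models a library that is linked against its own CRT.
type fakeLib struct{ crt *runtimeErrno }

// recvNoWait models a non-blocking receive with nothing queued: it returns -1
// and records EAGAIN (11) in the library's own CRT.
func (l *fakeLib) recvNoWait() int {
	l.crt.value = 11
	return -1
}

// libErrno plays the role of zmq_errno(): it reads errno from the library's CRT.
func (l *fakeLib) libErrno() int { return l.crt.value }

func main() {
	appCRT := &runtimeErrno{}             // the errno the application reads
	lib := &fakeLib{crt: &runtimeErrno{}} // the errno the library writes

	rc := lib.recvNoWait()
	fmt.Println("rc:", rc)
	fmt.Println("errno seen by app:", appCRT.value)
	fmt.Println("errno via library:", lib.libErrno())
}
```

In this model the return code is -1 while the application's errno stays at 0, which matches the size = -1, err = nil combination seen in the debugger.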
I think I may have a fix. Can you try the latest version, please?
The same situation happens when binding to the same TCP port a second time: it fails silently, without an error. The process therefore thinks that it can accept messages, while the underlying socket is dead.
This occurs in Bind (zmq4.go:963):
i, err = C.zmq4_bind(soc.soc, s)
i = -1 but err = nil, so the same as previously.
I have downloaded your latest version and it seems fixed: this case (binding) now correctly returns an error (though I am not sure why it is error 100 -> Cannot create another system semaphore, but this must be a libzmq thing). I have not yet tested the EAGAIN case, but I believe it is the same as for Bind.
The problem
I issue a non-blocking read on a DEALER socket connected to a ROUTER socket.
data, err := client.RecvMessage(zmq.DONTWAIT)
The ROUTER takes at least 1 second to complete the task (due to sleep()) and I do the read immediately.
I expected to get an EAGAIN error, but instead I got err == nil and len(data) == 0, a proper empty read.
Situation
By debugging the library it seems to me that this call is where the error starts (RecvBytes, zmq4.go:1077):
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
Here, size == -1 but err == nil. Therefore errget(err), called with nil, returns nil instead of the true error.
Maybe errget should do something when it is called with a nil argument?
I believe that the root cause of this particular problem is not using zmq_errno. The documentation of that function says it should be used to get errno properly when, for example, the application links to a different C runtime than libzmq does.
This is probably my case, because this happens on Windows: I have libzmq.dll built with MSVC and then generated the stub libzmq.a using dlltool from the gcc toolchain. So the setup is exotic (but hey, welcome to compiling C libs on Windows + Go + Cgo). What's more, a C. call in Go returns the plain errno, which is essentially wrong in this case.
When I tried
e := C.zmq_errno()
just after the failed read, I got the correct EAGAIN (11) error.
Solutions?
I could probably check C.zmq_errno() after each call, but I am not sure whether that is sufficient: will the error be cleared after successful calls? EDIT: No, the error is not cleared. And since the returned err is nil, there is no way to know that the C.zmq_errno() result is valid in this situation (plus all the possible threading issues).
One solution may be to drop every err from
_, err := C. ...
and call C.zmq_errno() instead? But that would require changes in many places.
Maybe modifying errget would be sufficient? For example, if the err argument is nil, then check C.zmq_errno()?