Closed rom1v closed 11 years ago
So each binding requires a separate socket connection? I guess that should work without too much impact to the existing code.
Long term I would have preferred to build a local port binding lookup table, with handler function for processing each payload. But that will require rewriting each command line operation to work in an event driven manner.
However most of the tests in the tests/dnaprotocol script are failing to resolve phone numbers.
So each binding requires a separate socket connection? I guess that should work without too much impact to the existing code.
Yes, that way we can provide a socket-like API to the client easily.
Long term I would have preferred to build a local port binding lookup table, with handler function for processing each payload. But that will require rewriting each command line operation to work in an event driven manner.
What would be the benefits?
However most of the tests in the tests/dnaprotocol script are failing to resolve phone numbers.
Thank you for your feedback. I just committed a fix (4b273b4).
rom@rompc:~/Documents/serval-workspace/batphone@development/jni/serval-dna@mdpsock$ tests/dnaprotocol
1. Lookup by wildcard... PASS
2. Lookup by empty string... PASS
3. Lookup non-existent phone number... PASS
4. Lookup local phone number... PASS
5. Lookup remote phone number... PASS
6. Node info auto-resolves for local identities... PASS
7. Node info resolves remote identities... PASS
8. Lookup phone number three nodes reply... PASS
9. Lookup phone number two nodes reply... PASS
10. Lookup phone number one node replies... PASS
10 tests, 10 pass, 0 fail, 0 error
I just added MDP JNI bindings native-part.
It provides the implementation of native methods of MeshSocket API. (java-part is here: https://github.com/servalproject/batphone/pull/51)
I merged the current development branch.
https://groups.google.com/d/msg/serval-project-developers/bebngkEXesI/801JB958dZEJ
One minor concern I have is overlay_mdp_recv() becoming blocking. Because servald is designed to be single-threaded, we need to be able to support fully asynchronous operation. As servald may end up using MDP socket services as a client (eg to talk to other servald instances on other nodes to exchange various information), we need a non-blocking solution. It might just be a flag passed in to overlay_mdp_recv() to tell it to be non-blocking, so that we have that option for when it is needed.
Using the same pattern there was in mdp_client.c
, the code would become (+ possibly a flag for enabling/disabling it) :
int overlay_mdp_send(int mdp_sockfd, overlay_mdp_frame *mdp, int flags, int timeout_ms)
{
//...
set_nonblock(mdp_sockfd);
int result=sendto(mdp_sockfd, mdp, len, 0, (struct sockaddr *)&name, sizeof(struct sockaddr_un));
set_block(mdp_sockfd);
//...
}
int overlay_mdp_recv(int mdp_sockfd, overlay_mdp_frame *mdp, int port, int *ttl)
{
//...
set_nonblock(mdp_sockfd);
ssize_t len = recvwithttl(mdp_sockfd,(unsigned char *)mdp, sizeof(overlay_mdp_frame),ttl,recvaddr,&recvaddrlen);
set_block(mdp_sockfd);
//...
}
Since my commits providing "MDP sockets", mdp_sockfd
is given as a parameter (instead of being a global variable), so I think it is the responsability of the caller to mark the socket as blocking/nonblocking, and manage what to do when non blocking and errno == EAGAIN || errno == EWOULDBLOCK
.
Possibly, the blocking/nonblocking mode could be a parameter of the MDP socket init function overlay_mdp_client_socket
.
What do you think ?
The best way for a library to support different main-loop strategies (blocking/non-blocking, threaded, coroutines, callbacks, etc.) is to simply get out of the way and let the caller set up what it needs to.
In this case, the MDP client library has no business calling set_nonblock()
and set_block()
on the socket. The caller should do that. The library already exposes the socket's file descriptor as the mdp_sockfd
parameter, so this is trivial for the caller to do. The library could expose the set_block()
and set_nonblock()
calls (in fact, macros) as part of its API as a convenience for the caller, but the caller can always just use the fcntl(2) sys call directly.
What the MDP library should do is to make sure that it handles the EAGAIN (and EWOULDBLOCK if present) errors as normal conditions, ie, without logging an error or even warning. See the definitions of the _write_nonblock()
and _read_nonblock()
functions for an example of what this means. In fact, recvwithttl()
appears to already do this. It does not treat EAGAIN or EWOULDBLOCK as errors, and handles the case of len == -1
the same as len == 0
.
I totally agree with your analysis.
I also noticed a little side effect: every time a socket is created, it creates a new file in /data/data/org.servalproject/var/serval-node/. But when the socket is closed, the file is not deleted. This is not very important because when a collision occurs, we delete the file before binding a new socket, but there is no reason for keeping so many files like this:
mdp-client-2113-d4b10929.socket
mdp-client-2113-a812c682.socket
mdp-client-2113-e689b1f8.socket
mdp-client-2113-0c0916fb.socket
mdp-client-2113-5807c93e.socket
mdp-client-2113-734d5e29.socket
mdp-client-2113-8a0a67fa.socket
mdp-client-2113-386b4def.socket
mdp-client-2113-7cf2c673.socket
...
Therefore, I just commited some changes for keeping the mapping mdp_sockfd/sun_path, and removing the file on socket close (commit 92a28a2). If you see a better way to do that, please do not hesitate.
As well as deleting temporary socket files on close, it is important to also clean up stale temporary files on start or at regular intervals. Our code does not always follow this policy because it is of experimental or prototype quality, but we are deliberately moving towards production quality code, so we need to start insisting on this kind of design principle in new code.
In fact, I suggest that, as a matter of design policy, unlink-on-close should possibly never be performed, instead we should always rely on the clean-up logic to remove stale files. This would ensure that testing can easily reveal that it is working, instead of waiting for exceptional events to discover that it is not.
Our policy for operation on any platform should always be to explicitly design (and test for) for power-off/SIGSEGV/SIGKILL events and long-term, unattended operation. So we can't afford to let temporary files accumulate (or unused database rows accumulate, or log files accumulate, etc.).
As well as deleting temporary socket files on close, it is important to also clean up stale temporary files on start or at regular intervals.
You're right. I just implemented a clean up of all socket files on servald start (commit db2fefc).
Maybe we should clean up more (what else?). Maybe it is dangerous to clean up all socket files (will we need a socket persistent across servald restarts?). But it seems ok.
In fact, I suggest that, as a matter of design policy, unlink-on-close should possibly never be performed, instead we should always rely on the clean-up logic to remove stale files.
In my opinion, we should also keep the unlink-on-close policy (commit 92a28a2), because we can have a lot of socket files during a servald "session": every single command creates a MDP socket (and closes it).
@lakeman You closed the pull request?
Not deliberately. I deleted "master" as part of cleaning up our branch names, so I guess I got the blame for making the pull impossible. I should be able to find time to look at this patch properly next week.
I just merged development
branch into mdpsock
(so mdpsock
is up to date).
Unfortunately, github does not see it, but I think it is not important.
I added Java bindings for flags and QoS management. https://github.com/rom1v/serval-dna/commit/8454944653d0a2ab8dd833d93104d35bbc632f8a https://github.com/servalproject/batphone/pull/51#commit-cfb9be76188124a245170f545362f9f2035f6b51
Due to this pull request unintentional close, I opened a new one: https://github.com/servalproject/serval-dna/pull/53
It is not possible to use several "MDP sockets" with the API: there are no "sockets" exposed (in fact, there is only one per app, as
mdp_client_socket
is an external variable inmdp_client.c
).This is a problem, because the
overlay_mdp_recv()
function just retrieve the "first" packet, whatever it is. If it is not the right port, it is ignored (but the thread which listened to that port will never receive it, it is dropped).Therefore, there is no way to "multiplex" services. If a service wants to listen on port x and another on port y, some packets from x will be "received" by the recv function of y (which will just drop it, but in that case this packet is lost for no reason and the recv() function considers it has received a packet which is invalid).
Moreover, functions like
overlay_mdp_getmyaddr()
oroverlay_mdp_bind()
do calloverlay_mdp_send()
, which cancall overlay_mdp_recv()
. The received response could be one packet from another device, which has nothing to do with the request, so it will be dropped. The easiest way to see the problem is to send a packet to yourself (to your own SID), and follow what happens in debug mode.To fix it, I changed a bit the MDP API and implementation for providing "MDP sockets" support (commit f55e2b7):
overlay_mdp_client_init()
is replaced byoverlay_mdp_client_socket(void)
. Instead of returning 0 or -1, it returns the socket descriptor or -1.overlay_mdp_client_done()
is replaced byoverlay_mdp_client_close(int mdp_sockfd)
.Each function now takes a parameter
mdp_sockfd
. Thismdp_sockfd
is created byoverlay_mdp_client_socket()
. There is no globalmdp_client_socket
anymore inmdp_client.c
: instead, each service creates and uses its own MDP socket.Then, I adapted usage in
commandline.c
.Before that, I needed to make a tiny change: make
overlay_mdp_client_socket_path
local tooverlay_mdp_client_init()
(now renamedoverlay_mdp_client_socket()
for consistency with socket API). It was easy, because the unlink inoverlay_mdp_client_done()
was useless (it was already done inoverlay_mdp_client_init()
). (commit 263232b)Quite independently, for making it work, I also needed to make
overlay_mdp_recv()
blocking. (commit 9d44386)What I tested after these changes:
What I did not tested:
commandline.c
, because I don't know them (but I applied the same pattern for usingmdp_sockfd
).Please give me any feedback.
Tip: for some functions, you should use
git diff -w
for ignoring whitespace differences (for example if I removed aif() {}
condition, the indentation has changed).