muellmusik / Utopia

This is an attempt at a generic library of tools for making Network Music Apps in SuperCollider

still send_to: Host is down #16

Closed: telephon closed this issue 7 years ago

telephon commented 7 years ago

We still get this all the time:

caught exception in primitive NetAddr:sendMsg
ERROR: send_to: Host is down
ERROR: Primitive '_NetAddr_SendMsg' failed.
Failed.
RECEIVER:
Instance of NetAddr {    (0x12cfb7258, gc=28, fmt=00, flg=00, set=02)
  instance variables [4]
    addr : Integer -1062705559
    port : Integer 57120
    hostname : nil
    socket : nil
}
CALL STACK:
    MethodError:reportError   0x12eba6f38
        arg this = <instance of PrimitiveFailedError>
    Nil:handleError   0x1811bd628
        arg this = nil
        arg error = <instance of PrimitiveFailedError>
    Thread:handleError   0x12f51b188
        arg this = <instance of Thread>
        arg error = <instance of PrimitiveFailedError>
    Object:throw   0x12eb9aa98
        arg this = <instance of PrimitiveFailedError>
    Object:primitiveFailed   0x12b5cd0d8
        arg this = <instance of NetAddr>
    Dictionary:keysValuesArrayDo   0x126a91158
        arg this = <instance of IdentityDictionary>
        arg argArray = [*20]
        arg function = <instance of Function>
        var i = 8
        var j = 2
        var key = nil
        var val = nil
        var arraySize = nil
    Dictionary:keysValuesDo   0x12f4582e8
        arg this = <instance of IdentityDictionary>
        arg function = <instance of Function>
    Dictionary:do   0x1859e9848
        arg this = <instance of IdentityDictionary>
        arg function = <instance of Function>
    AddrBook:sendExcluding   0x12b3f1418
        arg this = <instance of AddrBook>
        arg name = 'liuyawen'
        arg msg = [*4]
    OSCMessageDispatcher:value   0x125f67668
        arg this = <instance of OSCMessageDispatcher>
        arg msg = [*4]
        arg time = 6873.048582305
        arg addr = <instance of NetAddr>
        arg recvPort = 57120
    Main:recvOSCmessage   0x12864c4d8
        arg this = <instance of Main>
        arg time = 6873.048582305
        arg replyAddr = <instance of NetAddr>
        arg recvPort = 57120
        arg msg = [*4]
^^ The preceding error dump is for ERROR: Primitive '_NetAddr_SendMsg' failed.
Failed.
RECEIVER: a NetAddr 
muellmusik commented 7 years ago

Yes. Technically I think this is an SC issue, not a Utopia one. IIRC it didn't happen before the move to boost::asio.

My guess is that it happens here:

gUDPport->Socket().send_to( buffer(bufptr, msglen), address );

Possibly there should be a local try/catch around that (or, maybe better, in the primitive), which would then return an SC error. Then I think it would be possible to use a try in Utopia or elsewhere to suppress this common case.
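Here is a minimal C++ sketch of such a local catch, under stated assumptions. The send is passed in as a callable that throws std::system_error on failure. This stands in for the boost::system::system_error that asio's throwing send_to overloads raise. guardedSend and SendResult are hypothetical names and are not part of SuperCollider's source.

```cpp
#include <functional>
#include <system_error>

// Hypothetical result codes for a network-sending primitive; these are not
// SuperCollider's real error enum values.
enum class SendResult { Ok, NetworkError };

// Runs a send operation that may throw std::system_error (standing in for the
// exception a throwing socket send_to raises) and converts a failure into a
// return code, so the exception never reaches a generic catch-all higher up.
SendResult guardedSend(const std::function<void()>& send, std::error_code& lastError) {
    try {
        send();
        lastError.clear();  // success: no error to report
        return SendResult::Ok;
    } catch (const std::system_error& e) {
        lastError = e.code();  // keep the OS error so the caller can decide what to do
        return SendResult::NetworkError;
    }
}
```

In a real primitive, the NetworkError result would be mapped to an SC error that a try in sclang can catch. Another option worth checking is asio's send_to overload that takes a boost::system::error_code& argument and reports failure through it without throwing.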

Or get more reliable collaborators! ;-)

telephon commented 7 years ago

ha :) do you know what situation causes it? and also i wonder: udp is connectionless, so how can it know anything about a "host"?

adcxyz commented 7 years ago

Hi s & j, we're doing network music with Utopia again this semester and have a rather nice setup together, with up to 16 people :-) Unfortunately, we do see this one a lot ...

BTW, I just got the "clientID from server & make allocators" issue sorted out nicely, so independent allocation finally works well now. Will submit a PR (to SC3) when it's well tested.

bests a

telephon commented 7 years ago

What is your quick workaround when it happens? It is quite annoying.

telephon commented 7 years ago

Thanks for the work on the server; it's much needed.

muellmusik commented 7 years ago

> ha :) do you know what situation causes it? and also i wonder: udp is connectionless, so how can it know anything about a "host"?

It seemingly might have to do with ARP requests: https://forum.pfsense.org/index.php?topic=122234.0

muellmusik commented 7 years ago

> BTW I just got the clientID from server & make allocators issue sorted nicely, so now independent allocation finally works well. will submit PR (to SC3) when well-tested.

Sorry, what was the issue? I used this on a couple of projects after it was initially added, and it always worked fine.

muellmusik commented 7 years ago

I can't figure out how to get a reproducer.

Do either of you have one?

telephon commented 7 years ago

It always involves at least two computers. Today it happened when one of the students shut down his computer, which had previously been "hailed into" Utopia.

muellmusik commented 7 years ago

I'll see if I can reproduce it tomorrow if I get a moment. I have a couple of machines in my office, so I can try. My guess has been that it has something to do with machines sleeping or otherwise falling off the network.

Utopia will trigger it because it keeps trying to Hail. We could make it give up after a certain period of non-response, but that seems like the wrong solution.

muellmusik commented 7 years ago

But the fact that we're getting 'caught exception in primitive' suggests to me that something is not being handled correctly. That's just a catch-all in doPrimitive to make sure the lang doesn't fall over when something unexpected happens.

telephon commented 7 years ago

Yes, that seems wrong. I don't see why it should be an error to try to send to an IP that seems to be down.

telephon commented 7 years ago

yes

muellmusik commented 7 years ago

Well, I suspect it dutifully notifies you and expects you to handle it appropriately if you don't care. I can put a try/catch in the right place, but until I can reliably reproduce it, I won't know whether it's helping.

telephon commented 7 years ago

If you have an idea of how to catch it, we can try to reproduce it next week (Tuesday).

muellmusik commented 7 years ago

Okay, I just spent an hour trying to reproduce it and failed (though I did notice that there seems to be a mistake in Hail and how it tracks online status). I can trigger "Network is unreachable", though, which I suspect is thrown from the same point.

muellmusik commented 7 years ago

So looking a little closer, in doPrimitive we have blocks like this:

} catch (std::exception& ex) {
    post("caught exception in primitive %s:%s\n", slotRawSymbol(&slotRawClass(&meth->ownerclass)->name)->name, slotRawSymbol(&meth->name)->name);
    error(ex.what());
    err = errException;
}

IIUC this means we do get the SC error, which we should be able to suppress with an SC try, but we can't suppress the warning posted by error(ex.what()).

So we could, for instance, wrap netAddrSend(slotRawObject(netAddrSlot), packet.size(), (char*)packet.buf); in a try and catch certain errors, but I'd need to look more closely to see what the correct way is. I think it would be to define new SC error types, e.g. errNetworkUnreachable and errHostIsDown.

muellmusik commented 7 years ago

The following seems to confirm my suspicion of where this is happening.

NetAddr("147.108.63.7", 51720).sendMsg(\foo)

try { NetAddr("147.108.63.7", 51720).sendMsg(\foo) } { \caught.postln }

The former throws the SC error; the latter posts only the warning. Since network errors should not be suppressed in normal usage, I think the only way to deal with this is new SC error types. We'd need one for "host is down" and one for "network unreachable". Anything else?
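One way to sort failed sends into such types is to classify the OS error code. This is a hedged sketch with several assumptions. NetErr and classifySendError are hypothetical. EHOSTDOWN is a non-POSIX errno value that Linux and macOS define, hence the #ifdef. The HostUnreachable case (EHOSTUNREACH) is only a suggestion for the "anything else?" question and was not observed in this thread.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical SC-side error types for network failures, named after the
// errNetworkUnreachable / errHostIsDown proposal; not existing SuperCollider symbols.
enum class NetErr { None, HostIsDown, HostUnreachable, NetworkUnreachable, Other };

// Classifies the OS error code from a failed UDP send (assumes POSIX errno values).
NetErr classifySendError(const std::error_code& ec) {
    if (!ec) return NetErr::None;
    if (ec == std::errc::network_unreachable) return NetErr::NetworkUnreachable;  // ENETUNREACH
    if (ec == std::errc::host_unreachable) return NetErr::HostUnreachable;        // EHOSTUNREACH
#ifdef EHOSTDOWN
    // EHOSTDOWN ("Host is down") has no std::errc equivalent, so compare the raw
    // errno value; on POSIX both generic and system categories carry errno values.
    if ((ec.category() == std::generic_category() || ec.category() == std::system_category())
        && ec.value() == EHOSTDOWN)
        return NetErr::HostIsDown;
#endif
    return NetErr::Other;  // anything else stays a normal, unsuppressed error
}
```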

muellmusik commented 7 years ago

Hmm. Another, possibly better, solution would be to have ex.what() stashed somewhere accessible rather than posted, and then recalled in prPrimitiveErrorString. That approach is more general.
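Here is a rough sketch of the stash-and-recall idea. It assumes a primitive is a callable that returns an int error code. gLastPrimitiveError, runPrimitive and lastPrimitiveError are hypothetical names. The value -1 is a placeholder, not SuperCollider's real errException value.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical storage for the last primitive error message.
thread_local std::string gLastPrimitiveError;

// Hypothetical wrapper around running a primitive: instead of posting
// ex.what() immediately, remember it and report failure via the return code.
template <typename Prim>
int runPrimitive(Prim prim) {
    try {
        gLastPrimitiveError.clear();  // a fresh call starts with no stashed message
        return prim();
    } catch (const std::exception& ex) {
        gLastPrimitiveError = ex.what();  // stashed, not posted
        return -1;  // placeholder for an "exception" error code
    }
}

// Hypothetical accessor that a prPrimitiveErrorString-like primitive could call
// to fetch the message only when the error is actually reported.
const std::string& lastPrimitiveError() { return gLastPrimitiveError; }
```

thread_local is a cautious choice here. If primitives only ever run on the interpreter thread, a plain global would do.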

muellmusik commented 7 years ago

I'm working on this...

muellmusik commented 7 years ago

Okay got something. Will submit a PR tomorrow.

muellmusik commented 7 years ago

https://github.com/supercollider/supercollider/pull/2876

Will close this when merged, but test in the meantime if you can.

mossheim commented 7 years ago

This can be closed now that SC/2876 is merged!

muellmusik commented 7 years ago

Thanks @brianlheim!