pear2 / Net_RouterOS

This package allows you to read and write information from a RouterOS host using the MikroTik RouterOS API protocol.
http://pear2.php.net/PEAR2_Net_RouterOS

Client hangs after /system reboot command #18

Open FezzFest opened 8 years ago

FezzFest commented 8 years ago

After executing a /system reboot command, the client hangs and does not return. Example:

    $client = new RouterOS\Client($ip, 'admin', 'password', null, false, 10);
    $request = new RouterOS\Request('/system reboot');
    $client->sendSync($request);
    echo 'OK';

In the above example, the echo statement is never reached and 'OK' is never printed to the screen. The same thing happens with the 'set-and-forget' method (using asynchronous calls).

    $client = new RouterOS\Client($ip, 'admin', 'password', null, false, 10);
    $request = new RouterOS\Request('/system reboot');
    $request->setTag($id);
    $client->sendAsync($request);
    $client->loop();
    echo 'OK';

If I omit the loop() method, the echo statement is reached but the request is never sent. What am I doing wrong?

boenrobot commented 8 years ago

You're using a Linux web server I'm guessing?

I'm aware of this issue, but I have no idea how to solve it... Once upon a time (b4), this was also an issue with Windows. I did fix it for Windows, and thought that was the end of it, but alas, no.

If you find a way to solve it, I would very much welcome a pull request (or even just a hint of the solution...).

In the meantime, the workaround is to create a scheduler item that runs after 2 seconds, and on its run, removes itself, and then reboots, i.e.

    $client = new RouterOS\Client($ip, 'admin', 'password', null, false, 10);
    $request = new RouterOS\Request(
        '/system scheduler add name=REBOOT interval=2s
        on-event="/system scheduler remove REBOOT;/system reboot"'
    );
    $client->sendSync($request);
    echo 'OK';
    unset($client);

Note also that the $client object must be unset() before the time of the actual reboot, or otherwise go out of scope (e.g. if the code above was within a function - have the function end after the echo, and remove the unset() call).

What ultimately triggers the hang is that a "/quit" command is sent upon disconnect/unset. Because the connection is already closed, the command is never successfully sent, and the client keeps retrying. There is a check as to whether the connection is even open before every sending attempt (that was the solution for Windows), but while that check works on Windows, on Linux it doesn't for some reason. The "/quit" is sent in the first place to prevent a memory leak on some older RouterOS versions and some RouterBOARDs.

khandieyea commented 7 years ago

I would say this has little to do with the /quit being sent on close()

Passing NULL to PEAR2\Net\RouterOS\Client::dispatchNextResponse (from completeRequest) eventually passes NULL as the tv_sec argument to stream_select() in PEAR2\Net\Transmitter\stream::isDataAwaiting.

If tv_sec is NULL, stream_select() can block indefinitely, returning only when an event occurs on one of the watched streams.

No events are going to occur at this point, afaik. Maybe the Windows stack does it differently.
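To make the timeout behaviour concrete, here is a minimal sketch using a local socket pair as a stand-in for the router connection. The socket pair and the 200 ms timeout are assumptions for illustration only, and `STREAM_PF_UNIX` assumes a Unix-like host. The real client talks over TCP and passes its own timeout values.

```php
<?php
// Local socket pair standing in for the client/router connection
// (an assumption for illustration; the real client uses a TCP stream).
list($local, $remote) = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

$read = array($local);
$write = null;
$except = null;

// Nothing has been written to $remote, so no read event is pending.
// With a finite timeout (0 s + 200000 us here), stream_select() gives up
// and returns 0. With null as the seconds argument, the same call would
// block until an event occurs on $local.
$start = microtime(true);
$ready = stream_select($read, $write, $except, 0, 200000);
$elapsed = microtime(true) - $start;

printf("ready streams: %d, waited %.2f s\n", $ready, $elapsed);

fclose($local);
fclose($remote);
```

If the router reboots without ever sending anything back, the null-timeout variant is in the same position as this idle socket, only without a way out.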

boenrobot commented 7 years ago

That's an interesting point, thanks.

The problem I was detecting on Windows was a similar thing, where isAcceptingData() was called without an isAvailable() check, so that was the fix. The client in general is running on the assumption that if you have managed to even send a request, you should keep waiting for the response, but I hadn't considered you may end up successfully sending a "/quit", but not receive a reply because the restart would occur before you do.

EDIT: Hmm... but even if I add an isAvailable() check before isDataAwaiting(), there remains a theoretical possibility that in between the isAvailable() check, and the stream_select() call, the connection is closed, and the client is left hanging anyway. What Windows is doing differently (or perhaps, what PHP is doing differently on Windows) is precisely to acknowledge this possibility, so that if a closed connection is passed to stream_select(), it is immediately discarded from the list, and with 0 connections left to check, stream_select() returns 0 immediately, instead of waiting indefinitely.

Then again, this is a very unlikely possibility, so I'll add that anyway.
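One way to sidestep the race described above, rather than close it, is to never wait without a deadline. The sketch below is a hypothetical helper, not part of Net_Transmitter: the name `waitForData`, the string return values, and the 100 ms polling slice are all assumptions. It polls with a finite timeout and peeks at the socket to tell real data from a closed peer.

```php
<?php
/**
 * Hypothetical helper, not part of Net_Transmitter: reports whether $stream
 * has data to read ('data'), has been closed by the peer ('closed'), or
 * stayed silent until roughly $maxSeconds elapsed ('timeout').
 */
function waitForData($stream, $maxSeconds)
{
    $deadline = microtime(true) + $maxSeconds;
    do {
        $read = array($stream);
        $write = null;
        $except = null;
        // A finite timeout bounds each wait, so the caller regains control
        // even if the connection dies right after an availability check.
        $ready = stream_select($read, $write, $except, 0, 100000);
        if ($ready === false) {
            return 'closed';
        }
        if ($ready > 0) {
            // A socket closed by the peer also reports as readable, so peek
            // without consuming to tell real data from end-of-stream.
            $peek = stream_socket_recvfrom($stream, 1, STREAM_PEEK);
            return ($peek === false || $peek === '') ? 'closed' : 'data';
        }
    } while (microtime(true) < $deadline);

    return 'timeout';
}

// Usage with a local socket pair (assumes a Unix-like host).
list($local, $remote) = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);
echo waitForData($local, 0.3), "\n"; // nothing has been sent yet
fclose($remote);
echo waitForData($local, 0.3), "\n"; // the peer has gone away
fclose($local);
```

The trade-off is that a genuinely slow reply is also reported as a timeout, so the caller has to pick a limit that suits the command being sent.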

boenrobot commented 7 years ago

@khandieyea or @FezzFest

Could either of you please test with the "develop" branch of Net_Transmitter on Linux to see if this last commit fixes the issue?

(I know this isn't a PHAR, but you can install the develop branch with Composer...)

khandieyea commented 7 years ago

Hey @boenrobot, I've tested develop, sadly no change with rebooting.

boenrobot commented 7 years ago

Well... I can't say I'm surprised, but it was worth a shot. Thank you for the tip and testing anyway.

What distro and version are you using anyway? (Maybe I could try making myself a VM with it some time...)

If the Windows analog is any indication, the problem is indeed that the sending attempt (fwrite() call) of "/quit" keeps failing, yet is being retried infinitely, and the feof() or stream_select() checks don't help on Linux for some reason... Or (more likely now, post the Windows fix), feof() in particular doesn't work on Linux's network streams (whereas on Windows, feof() returns true if the connection is closed), thus causing stream_select() to wait indefinitely for sending (I mean, remember, as soon as the reboot gives out its !done reply, the router just silently drops all connections and reboots, making it unable to even receive a "/quit", let alone reply to it).
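If the endless retry of the failing write is indeed the culprit, capping the number of attempts would at least turn the hang into a clean failure. The following is only a sketch of that idea under that assumption: `writeWithLimit` and its `$maxAttempts` parameter are hypothetical and are not how Net_Transmitter is actually written.

```php
<?php
/**
 * Hypothetical helper, not the library's code: writes $data to $stream,
 * giving up after $maxAttempts consecutive writes that make no progress.
 */
function writeWithLimit($stream, $data, $maxAttempts = 3)
{
    $sent = 0;
    $failures = 0;
    $length = strlen($data);
    while ($sent < $length) {
        // Errors are suppressed because a failed write is handled below.
        $written = @fwrite($stream, substr($data, $sent));
        if ($written === false || $written === 0) {
            // No progress: count the failure instead of retrying forever.
            if (++$failures >= $maxAttempts) {
                return false;
            }
            continue;
        }
        $failures = 0;
        $sent += $written;
    }

    return true;
}

// Usage against an in-memory stream, standing in for the connection.
$buffer = fopen('php://memory', 'w+');
var_dump(writeWithLimit($buffer, '/quit'));
fclose($buffer);
```

A caller that gets `false` back can then treat the connection as lost instead of waiting on it.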

khandieyea commented 7 years ago

Hi @boenrobot

We're running pretty standard Ubuntu 16.04.

I'm fairly certain this has nothing to do with the final /quit. Even with that code removed, I see the same behaviour. I'm also yet to see a !done coming back from the router in wireshark.

All I see is the /reboot being sent, and then "poof" it's gone.

As a side note - this issue exists in other PHP clients, and also persists in other Node and Python implementations. However, Java is apparently OK (never tested).

boenrobot commented 7 years ago

> All I see is the /reboot being sent, and then "poof" it's gone.

That's just it. Because the connection gets closed, the fwrite() call that sends "/quit" fails, and thus a packet never actually goes over the wire to be seen by Wireshark. It's exactly like that on Windows, except that thanks to the isAcceptingData() check, the client successfully gives up.

> As a side note - this issue exists in other PHP clients, and also persists in other Node and Python implementations.

But this... this is new... Denis Basta's API client doesn't send a "/quit", so I would've thought it wouldn't be affected. But then again, I haven't tried it personally either. And I'm not aware of the others' intricacies, but I wouldn't be surprised if the Node and Python clients have the same problem as I did for Windows and haven't fixed it, while the Java one has it fixed, and yet the Java runtime does some magic to make the checks work the same way for Linux as well.

> Even with that code removed, I see the same behaviour.

Just to be clear... even with this whole block removed/commented? That's new too... Back when I first got a report about this, removing this worked, but I've merely been too stubborn to remove it completely for the reasons mentioned previously in this issue.

boenrobot commented 7 years ago

Huh... funny... to add another weird twist to all of this... I just set up an Ubuntu Server 16.10, and I can't replicate this on it.

With only the built-in packages and the built-in OpenSSH added (to make it easier on me to test...), all updated with sudo apt-get update; sudo apt-get upgrade. I used the PHP available with sudo apt-get install php-common php-cli, and the version I got is PHP 7.0.8-3ubuntu3 (cli) ( NTS )... Doesn't happen there. This is with the "develop" branches of both Net_RouterOS and Net_Transmitter, but considering I've done nothing specifically targeted at this issue, I'm very surprised it's not happening.

My full test code:

    <?php

    use PEAR2\Autoload;
    use PEAR2\Net\RouterOS;

    error_reporting(E_ALL | E_STRICT);
    ini_set('display_errors', 1);

    require_once 'Autoload.git/src/PEAR2/Autoload.php';
    Autoload::initialize(__DIR__ . '/Net_RouterOS.git/src');
    Autoload::initialize(__DIR__ . '/Net_Transmitter.git/src');
    var_dump(Autoload::getPaths());

    $client = new RouterOS\Client('192.168.88.1', 'admin', '');
    $client->sendSync(new RouterOS\Request('/system reboot'));
    sleep(2);
    $char = $client->getCharset(RouterOS\Communicator::CHARSET_REMOTE);
    var_dump($char);

    echo 'OK?';
    echo "\n";

The added sleep(2); is there to make sure the "/quit" attempt only happens after the reboot commences. With or without it, there's no error of any kind at any point.

I wonder if it's a kernel issue that's already fixed, perhaps as recently as in between 16.04 and 16.10.

khandieyea commented 7 years ago

Is it actually rebooting?

boenrobot commented 7 years ago

It is rebooting, yes, but more importantly, doing so without any errors or hangs on the PHP side. Previously, it would reboot as well, but as @FezzFest mentioned, it would just hang OR (as in other reports I've had), it wouldn't hang, but would finish up with an error.

(The RouterOS I'm using is a real 951Ui-2HnD with 6.37.1; The Ubuntu server is in a Hyper-V VM...)

khandieyea commented 7 years ago

Had to ask, I've had issues with "/system reboot" not actually rebooting, but "/system/reboot" working.

Is your RouterOS target on the same LAN? I'll test with 16.10.

boenrobot commented 7 years ago

Heh. This API client translates "/system reboot" to "/system/reboot" under the hood, but most others don't, so no surprise there ;-) . LeGrange's Java client is among the few others who do this.

Yes, the RouterOS and VM are in the same LAN, thanks to Hyper-V's switching. Both are in the 192.168.88.0/24 subnet. I don't think I can set up a more complicated configuration than that though, as trying to do NAT with Hyper-V in place can be kind of tricky, and equally tricky is setting up Ubuntu Server (or any x64 OS) on VirtualBox...
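For anyone using a client that doesn't do the "/system reboot" to "/system/reboot" translation mentioned above, the core of the idea can be sketched as follows. This is a simplified, hypothetical helper (`toApiPath` is not the library's function) and it assumes the string holds only the command, with no arguments.

```php
<?php
/**
 * Hypothetical helper: turns a CLI-style command into an API-style path by
 * collapsing runs of whitespace into single slashes. Arguments are assumed
 * to be absent; only the command itself is handled.
 */
function toApiPath($command)
{
    $path = preg_replace('/\s+/', '/', trim($command));

    // Guarantee exactly one leading slash.
    return '/' . ltrim($path, '/');
}

echo toApiPath('/system reboot'), "\n";    // /system/reboot
echo toApiPath('/ip address print'), "\n"; // /ip/address/print
```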

khandieyea commented 7 years ago

Yea it is strange. Ok that's all great. I'm building 2 vanilla 16.04 and 16.10 boxes, will implement your test structure, and see what happens.

Thanks a million!