rannn505 / child-shell

Node.js bindings 🔗 for shell
http://rannn505.github.io/child-shell/
MIT License

invoke() promise never resolves #94

Closed dillonkrug closed 2 years ago

dillonkrug commented 5 years ago

I have been having an intermittent issue in my application. It was very difficult to track down, but I believe I have traced it to what appears to be a bug in this module.

The problem lies in the _write method of ShellStreamBuffer, in lib/utils.js:

    if(chunk.compare(this.EOI) === 0) {
      cb();
      return this.emit('EOI');
    }

This code seems to assume that the EOI "signal" will be the entire chunk. Based on my understanding of how streams work (and my testing), this is a faulty assumption. If multiple invocations happen in quick succession, or if the invoked commands output to stdout, the chunk may look something like this:

output from script
another line from the script
EOI_<random string>

In cases like this, the EOI signal will not be recognized, and the promise returned by invoke() will never resolve.
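To make the failure mode concrete, here is a tiny standalone script (the EOI value and stream here are made up for the demo, not taken from the library) showing that an exact chunk.compare(EOI) check only fires when the signal arrives as a chunk by itself:

const { Writable } = require('stream');

const EOI = Buffer.from('EOI_abc123');

// Naive detector, mirroring the current _write logic: it only matches
// when the EOI signal is the *entire* chunk.
const naive = new Writable({
  write(chunk, encoding, cb) {
    if (chunk.compare(EOI) === 0) console.log('naive: EOI detected');
    cb();
  }
});

// EOI arrives alone -> detected.
naive.write(EOI);

// EOI arrives in the same chunk as script output -> missed.
naive.write(Buffer.concat([Buffer.from('output from script\n'), EOI]));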

I have worked around this problem by making two changes.

1) In lib/shell.js, update the invoke() method so that the EOI signal is output on its own line.
2) In lib/utils.js, update the logic in _write() to first break the chunk into lines and compare each line individually.

The first part is simple. Just replace

      .then(() => shellSafeWrite(this.streams.stdin, `[Console]::Out.Write("${EOI}")`))

with this:

      .then(() => shellSafeWrite(this.streams.stdin, `[Console]::Out.WriteLine("");[Console]::Out.WriteLine("${EOI}")`))

The second step is slightly more complicated. We need to change a few things.

In the constructor, define a buffer that will hold the (possibly incomplete) last line of output:

this.lastLine = Buffer.from([]);

Then we need to update _write to the following:

  _write(chunk, encoding, cb) {
    var tailBuf = Buffer.concat([this.lastLine, chunk]);  // include data from the last line of the previous chunk
    var lines = splitBuffer(tailBuf, EOL_BUFFER), line; // split the buffer by os.EOL
    while (lines.length > 1) { // for each line of the output
      line = lines.shift();
      if (line.compare(this.EOI) === 0) { // check if it matches our EOI signal
        cb();
        return this.emit('EOI');
      } else {
        this.chunksArray.push(line, EOL_BUFFER); // if not EOI signal, add it to the output.
      }
    }
    this.lastLine = lines[0];  // the last item is a possibly incomplete line of output, so we store it until we receive the next chunk
    return cb();
  }

Definitions used by _write:

const os = require('os'); // Node's built-in os module, for the platform EOL
const EOL_BUFFER = Buffer.from(os.EOL);
function splitBuffer (buf, delim) {
  var search = -1,
    chunks = [],
    move = delim.length;

  while ((search = buf.indexOf(delim)) > -1) {
    chunks.push(buf.slice(0, search));
    buf = buf.slice(search + move);
  }

  chunks.push(buf);

  return chunks;
}
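For clarity, splitBuffer behaves like String.prototype.split, but on Buffers; the last element is whatever trailing bytes did not end with the delimiter:

// Example: split on a newline delimiter (in _write the delimiter is EOL_BUFFER).
const parts = splitBuffer(Buffer.from('abc\ndef\nincomplete'), Buffer.from('\n'));
console.log(parts.map(p => p.toString()));
// => [ 'abc', 'def', 'incomplete' ]
// The trailing 'incomplete' element is what _write stores in this.lastLine
// until the next chunk arrives.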
rannn505 commented 4 years ago

Hey @dillonkrug, nice catch, thanks for this. I will release a fix in the next version. I was thinking about something like this:

if chunk includes EOI:
    delete EOI from chunk
    add the remainder to the array
    emit EOI
else:
    same as before

What do you think?

dillonkrug commented 4 years ago

@rannn505 It's possible that would work, but with pseudocode it's hard to tell for sure.

My understanding of streams is that in the data callback, chunk is an arbitrary "window" into the data. In practice, a single line of output will most likely correspond to a chunk, but this isn't guaranteed.

Take the example output of a script:

        abcdefg\nEOI_SIGNAL_1234567890\nhijklmnopqrstuvwxyz
1       |---- 1 ------|-------- 2 --------|------ 3 ------|
2       |-------- 1 --------|-- 2 --|--------- 3 ---------|
3       |-- 1 --|----------- 2 -----------|------ 3 ------|

- it's possible that a given chunk contains part of the EOI signal plus extra data (case 1)
- it's possible the EOI signal is spread across three chunks (case 2)
- it's possible the chunk contains the whole EOI signal but also has extra data (case 3)

There are more scenarios, but the idea is that we don't know what "window" of the full data stream each chunk will contain, which is why the code I wrote above makes no assumptions about the contents of a single chunk, only about the data stream as a whole. As long as your solution does the same thing, it should work just fine.
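As a quick sanity check of that claim, here is the boundary-independent idea in isolation (a standalone sketch, not library code): keep the unconsumed tail between writes and scan the accumulated bytes, and the EOI is found no matter where the chunk boundary falls:

// Standalone sketch: scan for EOI across arbitrary chunk boundaries
// by keeping the unconsumed tail between calls.
function makeEoiScanner(eoi) {
  let tail = Buffer.alloc(0);
  return function scan(chunk) {
    tail = Buffer.concat([tail, chunk]);
    if (tail.includes(eoi)) return true;
    // keep only the last eoi.length - 1 bytes; earlier bytes can no
    // longer be part of a match
    tail = tail.slice(Math.max(0, tail.length - eoi.length + 1));
    return false;
  };
}

const eoi = Buffer.from('EOI_SIGNAL_1234567890');
const data = Buffer.from('abcdefg\nEOI_SIGNAL_1234567890\nhijklmnopqrstuvwxyz');

// Split the same data at every possible position (the cases above are just
// particular split points); the scanner finds the EOI in all of them.
for (let i = 0; i <= data.length; i++) {
  const scan = makeEoiScanner(eoi);
  const found = scan(data.slice(0, i)) || scan(data.slice(i));
  console.assert(found, `missed EOI with split at ${i}`);
}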

rannn505 commented 4 years ago

Hey @dillonkrug, correct me if I'm wrong, but since PowerShell executes commands synchronously, it is guaranteed that once we echo EOI after all other commands, its output will be part of the last chunk... which means the only scenario that causes the bug is where the chunk is something like dataEOI.
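If that's right, the handling would amount to something like this (just a sketch of the idea, not the final fix):

// Sketch, assuming the only problematic shape is `data + EOI`: detect a
// chunk that *ends* with the EOI marker instead of requiring exact equality.
function endsWithEOI(chunk, eoi) {
  return chunk.length >= eoi.length &&
    chunk.slice(chunk.length - eoi.length).compare(eoi) === 0;
}

// Inside _write this would become roughly:
//   if (endsWithEOI(chunk, this.EOI)) {
//     this.chunksArray.push(chunk.slice(0, chunk.length - this.EOI.length));
//     cb();
//     return this.emit('EOI');
//   }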

BenjaminMichael commented 4 years ago

I don't think I've ever seen it like that. When invoking a series of commands I consistently get back dataEOIdata under certain conditions. Since I'm always expecting a JSON object back, I handle it by splitting the returned string on EOI and calling JSON.parse on each substring.
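Roughly, the handling on my side looks like this (a sketch; raw and eoi stand in for whatever invoke() returned and the marker that shows up in it):

// Split the returned string on the EOI marker and parse each non-empty
// piece as JSON.
function parseChunks(raw, eoi) {
  return raw
    .split(eoi)
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => JSON.parse(part));
}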

rannn505 commented 4 years ago

Fix invoke() promise never resolves bug

daverickdunn commented 4 years ago

I believe I've been experiencing this quite regularly. I thought it was networking related, but this makes way more sense. I'm fairly certain I can reproduce, will confirm if I can.

daverickdunn commented 4 years ago

The script below should generate the bug in ~10 seconds. It will log some other errors because I'm running it so frequently, but you can ignore them. It'll hang once it's encountered the issue.

It works by generating async events that may or may not get added to the EOI chunk, as @dillonkrug describes above. (Edit: this is not actually the issue, but async events like this appear to be required to trigger it. See below.)

If you extract the PS from the script and throw it into a standalone .ps1 you'll get a better picture of what it's doing.

In my case I am probably getting occasional warnings or errors from Microsoft Online servers and suppressing them. We have discovered one scenario where a warning is written to the console but it cannot be suppressed or redirected, no matter what we try, so we have a rule to check before parsing on the Node side. I now believe it is generated from an event, much like this example does.

const shell = require('node-powershell');

let ps = new shell({
    executionPolicy: 'Bypass',
    noProfile: true
});

ps.on('output', out => {
    try {
        JSON.parse(out);
        console.log('[INFO] successfully parsed JSON')
    } catch {
        console.error('[ERROR] could not parse JSON');
        console.log(out)
    }
});

async function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

async function runScript() {

    let count = 0;

    async function run(script) {
        console.log(`[+] run #${++count}`)
        console.log('[+] executing script', script)
        return ps.addCommand(
            `
            $timer = New-Object System.Timers.Timer;
            $event = Register-ObjectEvent -InputObject $timer -EventName Elapsed -Action { Write-Output "Event!" };
            $timer.Interval = 10;
            $timer.Enabled = $true;
            Get-Date | ConvertTo-Json;
            `
        )
            .then(() => ps.invoke())
            .finally(() => {
                console.log('[-] completed script', script)
            })
    }

    while (true) {
        try {
            await run(__dirname + '/scripts/random.ps1');
            await sleep(75);
        } catch (e) {
            console.log(e)
        }
    }
}

runScript()
    .then(() => console.log('[bug] tests complete'))
    .finally(() => process.exit(0))

I tested @dillonkrug's solution and unfortunately it did not work for me.

I thought I had a one-line (ish) solution by writing the EOI output in the following way, where the idea was to force the stream to write out, but this didn't work in the end (though it may have mitigated the issue somewhat).

let command = `
$enc = [system.Text.Encoding]::UTF8;
$data = $enc.GetBytes("${EOI}");
$std_out = [console]::OpenStandardOutput();
$std_out.Write($data, 0, $data.length);`;
shellSafeWrite(this.streams.stdin, command)

So I dug a little deeper and inspected the chunks as they are read, which revealed that a very different issue is actually at hand: the EOI string from a previous call is getting emitted in the current call.

By adding the following lines to the very start of ShellStreamBuffer._write:

console.log('\nchunk:', chunk.toString('utf8').replace(/\0/g, ''))
console.log('EOI  :', this.EOI.toString('utf8').replace(/\0/g, ''))

We're able to see that run 30 works as expected, but run 31 is somehow terminated with the previous run's EOI, and thus never resolves. Note the timestamps: the actual Get-Date data is correct, it's just the EOI code that's from the previous run.

[#30] run started

chunk: {
    "value":  "\/Date(1584118816179)\/",
    "DisplayHint":  2,
    "DateTime":  "13 March 2020 17:00:16"
}

EOI  : EOI_nTnCms94

chunk: EOI_nTnCms94
EOI  : EOI_nTnCms94

[#30] run complete
[#31] run started

chunk: {
    "value":  "\/Date(1584118817038)\/",
    "DisplayHint":  2,
    "DateTime":  "13 March 2020 17:00:17"
}

EOI  : EOI_oZrF10ml0

chunk: EOI_nTnCms94
EOI  : EOI_oZrF10ml0

I've been debugging this for a few hours. I've stripped the code down to the bare minimum expecting to find an async/promise issue, but I'm at a loss at this point. I suspect it's an EventEmitter doing something out of sync, but I'm really not sure. Will look again Monday.

daverickdunn commented 4 years ago

I've found that by adding Start-Sleep to the end of the invoke chain the issue is mitigated significantly, but it's still not completely resolved.

If I add Start-Sleep -s 1 I go from failing every ~30 iterations to every ~600 iterations (avg. of 3 runs). Start-Sleep -s 2 got to ~900 (running again as I write, but at this rate it takes 30 minutes to do that many runs, so I'll update if that figure changes drastically).

As I boil this down, I feel it actually has nothing to do with stdout; instead it's stdin that's causing issues. I'm only speculating now, as I don't know enough about PowerShell, IPC or pipes in general to speak with any authority, but I think the threads generated by System.Timers.Timer may be contending for the stdin stream, causing commands to be streamed into the process out of order. A mutex may be required around the script block that is sent by invoke.

daverickdunn commented 4 years ago

This person was experiencing the same issue and had a very similar workaround. The only thing he/she didn't uncover was the EOI codes returning out of order: #81

blueleader07 commented 3 years ago

Also hitting this ... and have a work-around (not great) until the fix is released.

For context: Initially, I only wrote a method called "runCommand" which calls invoke().

export const runCommand = async (ps: any, command: string, silent?: boolean) => {
    silent = silent === undefined ? true : silent
    try {
        ps.addCommand(command)
        const output = await ps.invoke()
        if (!silent && output) {
            console.log(output)
        }
        return output
    } catch (error) {
        console.error(error)
        await dispose(ps)
        throw new Error('An error occurred.')
    }
}

When I hit this issue I thought ... hmm can I just retry invoke()?
So I added "runCommandWithTimeout" and "retryableCommand"


export const runCommandWithTimeout = async (ps: any, command: string, silent?: boolean) : Promise<string> => {
    return new Promise((resolve, reject) => {
        const timeout = setTimeout(() => {
            reject(new CustomError('timeout after 15 seconds'))
        }, 15000)
        runCommand(ps, command, silent).then((output: string) => {
            clearTimeout(timeout)
            resolve(output)
        }).catch(error => {
            clearTimeout(timeout)
            reject(error)
        })
    })
}

export const retryableCommand = async (ps: any, command: string, silent?: boolean) => {
    let retries = 0
    while (retries < 3) {
        try {
            if (retries > 0) {
                console.log(`retry attempt ${retries} ... `)
            }
            const output = await runCommandWithTimeout(ps, command, silent)
            return output
        } catch (error) {
            console.error(error)
            retries += 1
        }
    }
    throw new Error(`retryableCommand failed after 3 attempts: ${command}`)
}

The output that came back on the retry included both outputs (including the one from the command that failed).

{
    "hello": "world"
}

{
    "hello":"world"
}

So then I had to do something super hacky and remove the extra JSON object before I parse it.

export const removeDuplicates = (output: string) => {
    let result
    const indexes = []
    const regex = /}[^,]/gm
    while ((result = regex.exec(output))) {
        indexes.push(result.index)
    }
    if (indexes.length > 0) {
        return output.substring(0, indexes[0] + 1)
    }
    return output
}

The resulting JSON looks like this after the duplicate is removed:

{
    "hello":"world"
}

Then JSON.parse works! Note that my "removeDuplicates" method is not perfect: it will not handle nested JSON objects, so you will need to make the script smarter if you are parsing those. In my case I don't have any nested JSON objects.
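For anyone who does have nested objects, a depth-tracking variant along these lines should hold up better (a sketch only; extractFirstJson is a hypothetical replacement for removeDuplicates, not part of my code above):

// Sketch: scan from the first '{' and track brace depth (ignoring braces
// inside strings), returning the first complete top-level object.
export const extractFirstJson = (output: string) => {
    const start = output.indexOf('{')
    if (start === -1) return output
    let depth = 0
    let inString = false
    for (let i = start; i < output.length; i++) {
        const char = output[i]
        if (inString) {
            if (char === '\\') i++ // skip the escaped character
            else if (char === '"') inString = false
        } else if (char === '"') {
            inString = true
        } else if (char === '{') {
            depth += 1
        } else if (char === '}') {
            depth -= 1
            if (depth === 0) return output.substring(start, i + 1)
        }
    }
    return output
}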

Just thought I would share this super hacky workaround in case anyone else hits this 2 weeks before they go live like me. :)

Definitely looking forward to the fix, but understand it's open source and you have other commitments. Thanks for all your work.

daverickdunn commented 3 years ago

@blueleader07 I just wrote a new package in the end. It's been running anywhere between 100k - 1m commands a day in prod for over a year without a single lockup or error. It also separates out the 6 different PowerShell streams for you, which was a must for me as PS can spit out some crazy shit sometimes, especially on remote calls. https://www.npmjs.com/package/full-powershell

blueleader07 commented 3 years ago

@daverickdunn wow, that's great news. I will take a look. Thanks for sharing!