Closed rgbkrk closed 8 years ago
I can tab complete while the node console is running, so maybe it would be worth it for us to check that out.
> interval.from cell 2
from cell 1
from cell 2
from cell 1
interval.__defineGetter__ interval.__defineSetter__ interval.__lookupGetter__ interval.__lookupSetter__ interval.__proto__ interval.constructor interval.hasOwnProperty interval.isPrototypeOf
interval.propertyIsEnumerable interval.toLocaleString interval.toString interval.valueOf
interval.close interval.ref interval.unref
interval._called interval._idleNext interval._idlePrev interval._idleStart interval._idleTimeout interval._onTimeout interval._repeat interval.domain
> interval.from cell 2
from cell 1
interval.__defineGetter__ interval.__defineSetter__ interval.__lookupGetter__ interval.__lookupSetter__ interval.__proto__ interval.constructor interval.hasOwnProperty interval.isPrototypeOf
interval.propertyIsEnumerable interval.toLocaleString interval.toString interval.valueOf
interval.close interval.ref interval.unref
interval._called interval._idleNext interval._idlePrev interval._idleStart interval._idleTimeout interval._onTimeout interval._repeat interval.domain
> interval.from cell 2
from cell 1
from cell 2
from cell 1
from cell 2
Note to myself: Last time we discussed this problem, Kyle mentioned we could use msg_id
.
Oh ha, I didn't realize I already said that.
Yeah, basically (and without my knowledge of the code base here), I'd assume for an async kernel like this, every run has its own context for what process.stdout
and process.stderr
is (mapping to particular parent_header.msg_id
).
I'll certainly dive back into this as I dive deep on getting this integrated as a first class kernel in nteract.
On 25/01/16 02:11, Kyle Kelley wrote:
Oh ha, I didn't realize I already said that.
Did I misunderstand? Doesn't the frontend use parent_header.msg_id
to determine what cell a stream.stdout
message belongs to?
Yeah, basically (and without my knowledge of the code base here), I'd assume for an async kernel like this, every run has its own context for what |process.stdout| and |process.stderr| is (mapping to particular |parent_header.msg_id|.
I already realised that. My first, naive thoughts were to run each cell within a context that redefines process.stdout
. This wouldn't work for any functions defined outside the cell context (e.g. console.log
).
Let me write down my current thoughts here:
stream.stdout
, stream.stderr
and execute_result
.process.stdout
and process.stderr
onto stream.stdout
and stream.stderr
. IJavascript imposes no restrictions on the use of process.stdout
and process.stderr
.execute_result
in a number of ways:
execute_result
only after the execution of the cell code has completed.$$done$$()
(in the future I may introduce $$update$$()
to allow multiple execute_result
messages).$$done$$()
is essential, because IJavascript waits until $$done$$()
is invoked before executing the code in the next cell.All the above points make me think that the solution is not the creation of multiple cell contexts. Instead, I think IJavascript needs to ensure that process.stdout
and process.stderr
have been flushed onto stream.stdout
and stream.stderr
before executing the code in the next cell.
Whoa, just noticed I missed out on responding here
Did I misunderstand? Doesn't the frontend use
parent_header.msg_id
to determine what cell astream.stdout
message belongs to?
Yeah, the frontend relies on parent_header.msg_id
to determine what cell the display data, execute result, streams, and errors will end up in.
Some relevant details of the spec:
status: idle
message is used to indicate that the output from a given cell is 'complete', and handlers can be cleaned up.status: idle
, the current behavior is to associate it with the most recent cellSo general recommendations for kernels:
setInterval(console.log...)
should be considered detached from its cell if the cell's idle message isn't being held@minrk and @rgbkrk Thanks for all the info.
This info makes me think that if we make msg_id
available to the IJavascript session (e.g. as $$msg_id$$
), or alternatively, we make available a stream that pipes its data using the right msg_id
(e.g. as $$stdout$$
), we could have multiple long-running async functions associated with multiple cells. For example:
In[1]: (function() {
var currentStdout = $$stdout$$;
setInterval(
function() { currentStdout.write("1 second\n"); },
1000
);
})();
Out[1]: 1 second
1 second
[...]
In[2]: (function() {
var currentStdout = $$stdout$$;
setInterval(
function() { currentStdout.write("1 minute\n"); },
60000
);
})();
Out[2]: 1 minute
This is issue is fixed in NEL@0.4.0 (which I will be publishing soon after I clean up and secure a few bits and bobs).
See the async-stdio
branch, here and here.
In the end, I've wrapped console
, process.stdout
and process.stderr
to ensure they direct their output to the right cell. The following code should now work as expected:
In[1]:
(function(console) {
setInterval(
function() { console.log("1"); },
1000
);
})(console);
$$async$$ = true;
Out[1]:
1
1
1
[...]
In[2]:
(function(console) {
setInterval(
function() { console.log("2"); },
1000
);
})(console);
$$async$$ = true;
Out[2]:
2
2
2
[...]
That's awesome!
I've made a pre-release that fixes this issue. I will advertise the changes when I make the release.
Async writes still don't seem to be working for me. Is there something special I need to do, since the first half of http://n-riesco.github.io/ijavascript/doc/async.ipynb.html states that it should just work.
EDIT: Removed names from code and made a gist: https://gist.github.com/neil-s/abf76e50f7e8588ab7f7
I have a readStream from a csv file, being read by the csvtojson
package. The bulk of the work is done in an event handler provided by csvtojson
that is called with each record parsed. In there, I'm doing some console.log
s and console.error
s, but that output never shows up. I know that the code is running in the background because I keep running a different cell which just outputs the variable representing the number of records parsed so far, and that keeps incrementing correctly.
Using v 5.0.11-beta.0
@neil-s
I've opened issue #64 to update the documentation.
In the meantime, I'll try to describe the usage here.
By asynchronous code, I mean code that doesn't run immediately (i.e. code in callback functions). For example, this is asynchronous code:
function sayHello() {
console.log("Hello, World!");
}
setTimeout(sayHello, 1000);
whereas this is not:
function sayHello() {
console.log("Hello, World!");
}
sayHello();
When code run asynchronously and uses global.console
, process.stdout
or process.stderr
, IJavascript doesn't know which cell to send the output to.
A solution to this issue is that the asynchronous makes a copy of the global.console
, process.stdout
or process.stderr
associated with a cell. There are several ways to achieve this:
I don't recommend this way, because is prone to error (each cell launching asynchronous code would require its own copy of global.console
, process.stdout
or process.stderr
):
$console = console;
function sayHello() {
$console.log("Hello, World!");
}
setTimeout(sayHello, 1000);
This is a pattern often used in Javascript code:
var sayHello = (function(console) {
return function sayHello() {
console.log("Hello, World!");
};
})(console);
setTimeout(sayHello, 1000);
The trick in this pattern is that (function(console) { ... })(console);
creates a copy of console
that is only accessible within the function body.
A nice feature of this pattern is that the asynchronous code runs unmodified (unlike the pattern in the previous section, where the local variable had to be renamed $console
, the name of the local variable in this pattern can be console
).
let
(requires ES6)If the Node.js version in your system supports ES6, you can use let
to define a block-scoped local variable named console
:
{
let console = global.console;
function sayHello() {
console.log("Hello, World!");
}
setTimeout(sayHello, 1000);
}
Coming back to your code in https://gist.githubusercontent.com/neil-s/abf76e50f7e8588ab7f7/raw/37d5990944b505ee9dcbf8ddcabd051d60871e8a/issue.ipynb , I suggest the following change:
var insertedRecords = 0;
var insertRecords = (function(console) {
return function(db) {
fs.createReadStream("./some_file.csv").pipe(converter);
converter.on("record_parsed", function(jsonRecord) {
try {
mongoRecord = db.collection('some_collection')
.update({
"some_key": parsedRecord["some_key"]
},
parsedRecord, {
upsert: true
}
);
} catch (e) {
console.error("Error with record: " + jsonRecord.some_key);
console.error(e);
}
insertedRecords += 1;
if (insertedRecords % 100 === 0) {
console.log("Inserted " + insertedRecords);
}
});
converter.on("end_parsed", function(jsonObject) {
console.log("Finished parsing");
});
};
})(console);
try {
insertRecords(db);
} catch (e) {
console.error(e);
}
When a user is working with the notebook, if a user has background code that writes to stdout or stderr, the output should stay with the cell that invoked it.
The simplest example I could come up with is
in the first cell, followed by
in the second cell. Here's what the output looks like right now:
Another strangeness I noticed - if I start doing tab completion somewhere else in the code it will stop the
setInterval
s from occurring. This was surprising up until I realized that completions have to remain in scope (code in the background could have side effects).