scripting / Scripting-News

I'm starting to use GitHub for work on my blog. Why not? It's got good communication and collaboration tools. Why not hook it up to a blog?

Writing my own synchronous function in Node #170

Open scripting opened 4 years ago

scripting commented 4 years ago

TL;DR: If they can do it in the fs routines, it must be possible for me to do it in my own code, right??

In Frontier, you make a simple HTTP request this way:

htmltext = tcp.httpReadUrl ("http://scripting.com/")

I want to do the same thing in Node.

No callback. Just return the contents of the page. If I want to catch the error I can put it inside try/catch.

I have tried reading the source for the fs sync routines. Eventually it will click and I'll see how to do this. If they can do it, presumably so can I. But I thought I'd ask if people can show me how to do this sooner.

The answer should be in pseudocode. No need to go into the details of making an actual HTTP request. And just use basic JavaScript, I don't want promises or async/await. I want it to be as simple as shown above. Thanks!

PS: The reason is I want to have Frontier-like scripting for Node-implemented websites. I want to use JavaScript syntax, not bash or Python. I like JS. ;-)

danmactough commented 4 years ago

If they can do it in the fs routines, it must be possible for me to do it in my own code, right??

Well, you can but Node core does not enable it as easily for HTTP calls as it does for fs calls. HTTP calls are not implemented synchronously at all in the Node internals; fs calls are. I imagine (don't know) that it was a deliberate choice to avoid giving people an easy way to build slow web applications.
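
To make the contrast concrete, here is a minimal sketch of the two shapes Node core offers. The function names readFileText and httpReadUrl are made up for illustration; fs.readFileSync and http.get are the real core calls.

```javascript
const fs = require('fs');
const http = require('http');

// fs: core ships a blocking variant, so the call simply returns the text.
// Errors are thrown and can be caught with try/catch.
function readFileText(filePath) {
  return fs.readFileSync(filePath, 'utf8');
}

// http: core only offers the callback form. There is no http.getSync,
// so the body has to be delivered through a callback.
function httpReadUrl(url, callback) {
  http.get(url, function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  }).on('error', callback);
}
```

The first function reads like the Frontier one-liner. The second cannot, which is why the approaches below lean on something outside the core http module.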

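There are only 2 ways to get synchronous HTTP requests:

1. Run the request in a child process and block until it finishes
2. Build an http client on top of a native library that makes the request synchronously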

For the child process approach, check out this library https://github.com/ForbesLindesay/sync-request

var request = require('sync-request');
var res = request('GET', 'http://example.com');
console.log(res.getBody());
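
For anyone wondering what the child process approach looks like in principle, here is a rough sketch. It is not the sync-request source, just an illustration of the idea using child_process.spawnSync from Node core. The helper name httpReadUrl, the TARGET_URL environment variable and the 10 second timeout are all assumptions made for this example.

```javascript
const { spawnSync } = require('child_process');

// Program run by the child. It makes an ordinary callback-style GET and
// pipes the body to stdout. The URL arrives in an environment variable
// (TARGET_URL is a name invented for this sketch).
const childSource = `
  const url = process.env.TARGET_URL;
  const lib = url.startsWith('https:') ? require('https') : require('http');
  lib.get(url, function (res) {
    res.pipe(process.stdout);
  }).on('error', function (err) {
    console.error(err.message);
    process.exitCode = 1;
  });
`;

// Blocks the calling process until the child exits, then returns the body.
function httpReadUrl(url) {
  const result = spawnSync(process.execPath, ['-e', childSource], {
    env: Object.assign({}, process.env, { TARGET_URL: url }),
    encoding: 'utf8',
    timeout: 10000 // assumed limit; spawnSync reports a timeout via result.error
  });
  if (result.error) {
    throw result.error;
  }
  if (result.status !== 0) {
    throw new Error('httpReadUrl failed: ' + result.stderr.trim());
  }
  return result.stdout;
}
```

With that in place, htmltext = httpReadUrl ("http://scripting.com/") works with no callback, and errors can go in a try/catch. The sketch ignores HTTP status codes and redirects, and the cost is a new process for every request.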

For the "build an http client" approach, there's https://github.com/dhruvbird/http-sync which is built using libcurl. This one looks pretty solid, but the interface looks pretty awful, although you could write a wrapper around it, I guess:

// example with default options
var httpSync = require('http-sync');

var request = httpSync.request({
    method: 'GET',
    headers: {},
    body: '',

    protocol: 'http',
    host: '127.0.0.1',
    port: 80, //443 if protocol = https
    path: '/'
});

var timedout = false;
// set the timeout on the same request object created above
request.setTimeout(10, function() {
    console.log("Request Timedout!");
    timedout = true;
});
// end() sends the request and returns the response synchronously
var response = request.end();

if (!timedout) {
    console.log(response);
    console.log(response.body.toString());
}
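
A wrapper along those lines might look like the sketch below. It uses only the calls that appear in the example above (request with that options object, end, and response.body). Everything else, including the name makeHttpReadUrl and the way the URL is split into options, is an assumption. The library is passed in as an argument so the sketch does not depend on it being installed.

```javascript
// Hypothetical wrapper. httpSync is whatever require('http-sync') returns.
function makeHttpReadUrl(httpSync) {
  return function httpReadUrl(urlString) {
    const url = new URL(urlString);
    const protocol = url.protocol.replace(':', ''); // 'http:' becomes 'http'
    const defaultPort = protocol === 'https' ? 443 : 80;
    const request = httpSync.request({
      method: 'GET',
      headers: {},
      body: '',
      protocol: protocol,
      host: url.hostname,
      port: url.port ? Number(url.port) : defaultPort,
      path: url.pathname + url.search
    });
    const response = request.end(); // synchronous, as in the example above
    return response.body.toString();
  };
}
```

Usage would be something like var httpReadUrl = makeHttpReadUrl (require ('http-sync')), followed by htmltext = httpReadUrl ("http://scripting.com/").
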

scripting commented 4 years ago

Dan, thanks for the explanation. I was figuring something like the "build an http client" might be the answer. And curl looks pretty good. I love that shell scripts can be synchronous and JS code can't.

I don't understand the "child process" approach. I looked at the source but I don't understand what he's doing. It's the kind of thing I'll have to stare at for a few hours over a few days to possibly understand.

Aside from his caveat about it not scaling, which I'm not sure matters (or is true) -- what's the downside of this approach?

scripting commented 4 years ago

I think the gold nugget in this thread so far is this:

If there's a CLI command for what you want to do, you can hook it up to Node synchronously.

curl is a great example. Another is AWS -- it has a command line interface. I've been using it a lot recently. It's fast, and simpler than the SDK interface. And of course it's synchronous.
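
As a minimal sketch of that idea, child_process.execFileSync from Node core runs a command and returns its output as a string. The helper names runCli and httpReadUrl are invented here, and the sketch assumes curl is installed and on the PATH.

```javascript
const { execFileSync } = require('child_process');

// Runs a CLI command and returns its stdout as a string.
// Throws if the command cannot be started or exits with a non-zero status.
function runCli(command, args) {
  return execFileSync(command, args, { encoding: 'utf8' });
}

// Hypothetical helper built on curl. The long-form flags are standard curl
// options: no progress meter, still report errors, fail on HTTP errors,
// follow redirects.
function httpReadUrl(url) {
  return runCli('curl', ['--silent', '--show-error', '--fail', '--location', url]);
}
```

The same runCli pattern should carry over to any other command line tool that writes its result to stdout.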

danmactough commented 4 years ago

Aside from his caveat about it not scaling, which I'm not sure matters (or is true) -- what's the downside of this approach?

I think "not scaling" is the only downside -- other than concerns about it being confusing or not the Node Way™, which only matter if you care. 😄

The scaling concern applies specifically when your application is a web app and this code runs as part of responding to an incoming request. In that context, the scaling concern is very real: while your application is performing a synchronous http request, it cannot do any other work at all. This is true for any synchronous work in Node, but a network request can be expected to take time: establishing a connection, waiting for the other server to respond, waiting for the data to completely arrive. Hypothetically, if all that takes 500ms per request, then in the best case you've halved the number of requests/sec your app can handle if you hit this synchronous request code just once per second.
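
The arithmetic behind the 500ms example can be written out as a small function. This is only a back-of-the-envelope model: the name remainingCapacity and the baseline of 1000 requests/sec are assumptions for illustration, not measurements.

```javascript
// Fraction of each second spent blocked = calls per second * blocking time.
// Whatever is left over is the time available for all other requests.
function remainingCapacity(baselineRps, syncCallsPerSec, blockMs) {
  const blockedFraction = Math.min(1, (syncCallsPerSec * blockMs) / 1000);
  return baselineRps * (1 - blockedFraction);
}

// One 500ms blocking call per second leaves half of an assumed
// 1000 requests/sec baseline.
console.log(remainingCapacity(1000, 1, 500)); // 500
```

Two such calls per second would use up the whole second and leave no capacity at all.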

If that doesn't describe your app, then the scaling concern probably doesn't matter. CLI tools in particular are an excellent context to do things in a blocking manner -- you only have one user, doing one thing, so just write code that is straightforward to understand when you need to debug it or come back to it in 6 months! (This is a gross oversimplification. I've also written CLI tools that I want to perform lots of concurrent network requests. YMMV.)

curl is a great example. Another is AWS -- it has a command line interface. I've been using it a lot recently. It's fast, and simpler than the SDK interface. And of course it's synchronous.

100%. git is another one. I've done the same with the Docker CLI, too.

Running CLI tools synchronously in a child process is totally fine (as long as you're not concerned about the scaling issue).

scripting commented 4 years ago

We’ve been having this same conversation over a number of years. I just want to nail down what happens if, in response to an http request, we do a synchronous io operation and then another http request comes in. Does the second request wait for the first request to complete?

danmactough commented 4 years ago

Does the second request wait for the first request to complete?

Yes.

scripting commented 4 years ago

Just want to be sure I understand -- using this to launch the web server:

http.createServer (handleRequest).listen (myPort);

It does not launch a separate thread for each request?

That's the first question.

Second:

It seems it would be possible to have a createServer function that does launch a new thread for each request. Would that break anything? I'm pretty sure my application-level code doesn't care if there's a concurrent thread running the request. I know how to do semaphores, and when to use them.

Caveat: I'm not necessarily going to write this code, I just want to understand where the limit is.

danmactough commented 4 years ago

It does not launch a separate thread for each request?

That's correct. It's a single thread, as far as your application code is concerned -- Node internals use threads, but your application code does not have access to those threads. All synchronous code in your application must run to completion (or enter asynchronous code) before another request can be handled.
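
One way to see this for yourself is the sketch below. It starts a server whose handler blocks, sends two requests at once, and records when each response finishes. The names sleepSync and demo are invented for this example, and Atomics.wait is used only as a stand-in for any synchronous work.

```javascript
const http = require('http');

// Blocks the current thread for roughly ms milliseconds.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Starts a server with a blocking handler, fires two requests at it,
// and calls done with the elapsed time (in ms) at which each one finished.
function demo(blockMs, done) {
  const server = http.createServer(function handleRequest(req, res) {
    sleepSync(blockMs); // nothing else in this process runs meanwhile
    res.end('ok');
  });
  server.listen(0, '127.0.0.1', function () {
    const port = server.address().port;
    const start = Date.now();
    const finished = [];
    for (let i = 0; i < 2; i++) {
      const options = { host: '127.0.0.1', port: port, path: '/', agent: false };
      http.get(options, function (res) {
        res.resume(); // discard the body, we only care about timing
        res.on('end', function () {
          finished.push(Date.now() - start);
          if (finished.length === 2) {
            server.close();
            done(finished);
          }
        });
      });
    }
  });
}
```

Calling demo (500, console.log) should show the later response finishing after roughly twice the blocking time, since the two handlers cannot overlap.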

It seems it would be possible to have a createServer function that does launch a new thread for each request. Would that break anything? I'm pretty sure my application-level code doesn't care if there's a concurrent thread running the request. I know how to do semaphores, and when to use them.

This is possible in newer versions of Node that have worker_threads (https://nodejs.org/docs/latest-v12.x/api/worker_threads.html). It would not break anything, but it is a pretty new API, so I'm not sure how well they've shaken the bugs out yet.

scripting commented 4 years ago

Thanks for the link to worker threads. I've been reading various docs, but have yet to find a Hello World for the subject. I always need that first. ;-)
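
Here is a minimal sketch of what a Hello World could look like, using the worker_threads module from Node core. The function name helloWorker is made up, and the worker code is passed as a string with the eval option only to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

// Code that runs inside the worker thread. It sends one message back
// to the thread that created it and then finishes.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.postMessage('Hello World from a worker thread');
`;

// Starts the worker and passes its first message to callback(err, message).
function helloWorker(callback) {
  const worker = new Worker(workerSource, { eval: true });
  worker.once('message', function (message) { callback(null, message); });
  worker.once('error', callback);
}

helloWorker(function (err, message) {
  if (err) throw err;
  console.log(message);
});
```

Note that the result still comes back through a callback: the worker runs on its own thread, but the main thread hears from it through events.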
