Native ES Module version

moll / node-mitm

Intercept and mock outgoing Node.js network TCP connections and HTTP requests for testing. Intercepts and gives you a Net.Socket, Http.IncomingMessage and Http.ServerResponse to test and respond with. Super useful when testing code that hits remote servers.

Other

640 stars 48 forks source link

Native ES Module version #76

Open gr2m opened 2 years ago

gr2m commented 2 years ago

Hi Andri 👋🏻

I'm one of the maintainers of nock, I learned about mitm while browsing through some old issues. I'm currently working on decomposing nock, with the goal to create a low-level @nock/intercept module. I've been updates on my progress here: https://github.com/nock/nock/discussions/2247

I'm very impressed with the code of mitm, and the outstanding support of all Node versions back to v0.10, wow! I really like how elegantly mitm is hooking into the Node.js http(s) request lifecycle!

Instead of just learning how mitm works I thought it might be useful to rewrite it entirely as a native ES Module, so here you go: https://github.com/gr2m/mitm-esm. I don't think it makes sense to send a pull request, I rather think it makes sense to have both versions co exist: mitm which supports all Node versions out there, and mitm-esm who want a native ES Module and don't need all the workarounds for no longer maintained Node versions.

I thought you might want to maybe mention the version in the README? Totally up to you, I just thought I'd mention it here in case anyone would find it useful.

moll commented 2 years ago

Hey,

Nice to meet you. Thanks for sharing you code and changes to Mitm.js! I'll go over the pages you linked to (and comments in https://www.twitch.tv/videos/1225179586) and see what you've written.

Rewriting Nock? Have you considered the idea of depending on Mitm.js directly for intercepting instead of creating a new thing? I think I saw you think of that, too, somewhere. Anything in Mitm.js that's making it functionally hard, apart from coding style preferences?

Briefly skimmed the beginning of https://www.twitch.tv/videos/1225179586. I'm glad you like the approach of Mitm.js. I'm quite proud of the backward compatibility, too. I actually do have older Node.js versions in production and find it very helpful to have one dependency less to worry that may be left behind. Not to mention when someone's migrating an older project with poorer test coverage to newer Node.js — they can just install the latest mitm, write tests, upgrade Node.js versions step by step without having to upgrade Mitm.js and retest. I think that's far better than having to upgrade dependencies in lock-step with Node.js versions.

Saw you remove Underscore.js — that's a good idea. I should do the same. No need for an external dependency for only a couple of functions.

Too bad about Travis CI, yeah. I hit a "migrate all projects" button on their site a while ago, but as far as it seems, it's gone. I did give GitHub Actions a try in https://github.com/moll/node-heaven-sqlite and was pleasantly surprised, so I should migrate Mitm.js's tests over, too.

As for newer JavaScript features, I'm generally not a fan. I think they make the language worse, obfuscating the simplicity of prototypical inheritance (class) or just misleading (const variables not being immutable). The import statement though seems fine, but apart from enabling a little more static analysis, not worth leaving behind users in my view.

Thanks and cheers

moll commented 2 years ago

If you want to have a more real-time chat, we can also do a [video]call.

gr2m commented 2 years ago

Rewriting Nock? Have you considered the idea of depending on Mitm.js directly for intercepting instead of creating a new thing

Definitely not rewriting Nock, it's a refactoring. I wish it was as simple as just dropping in Mitm.js and be done with it, but what I need to do first is to isolate the intercept logic from the rest of Nock's features, such as the Mocking API and the recorder. It's all very intertwined. That is the hard part of the refactor.

I'll definitely explore if we can use Mitm.js as Nock's low-level intercept module for Node. In the end we will probably end up with a combination of today's Mitm.js and what Nock does, in order to support all of Nock's use cases.

I will need some more time until I finish the refactoring to isolate the intercept logic, once that is done I'll try to replace the isolated module with Mitm.js and see what breaks, and if it can be fixed within Mitm.js while continuing to work the way it works today.

Anything in Mitm.js that's making it functionally hard, apart from coding style preferences?

One change I'd suggest is to turn Mitm.js into a Singleton API. Given the way it works, I don't think it makes sense to create instances, it suggests that they could work without interfering with each other. For example consider this

const mitm1 = new Mitm()
const mitm2 = new Mitm()

mitm1.enable()
mitm2.enable()

// do some things

mitm1.disable()

// this will never be called
mitm2.on("request", handle)

Instead I'd suggest to change the API to a Singleton API like this

Mitm.enable()
Mitm.disable()

I think it better reflects how Mitm.js works. What do you think?

As for newer JavaScript features, I'm generally not a fan. I think they make the language worse, obfuscating the simplicity of prototypical inheritance (class) or just misleading (const variables not being immutable). The import statement though seems fine, but apart from enabling a little more static analysis, not worth leaving behind users in my view.

I hear you, and I agree. The reason I did use the new language features is because that's where the ecosystem is going, and I cannot afford to go against it, not for a big independent Open Source project such as Nock, not if we want to onboard and retain more maintainers in the future and keep the overhead low.

As I said I think today's Mitm.js is fantastic as it is, it's worth to keep it the way it works, exactly for the use case you mentioned. For Nock, it's not sustainable to support older Node versions, so we might as well take advantage of that.

I could well imagine that we will end up with a low-level intercept module that is very similar to Mitm.js, so people can switch back and forth, but ours will be a native ES Module and only support currently maintained Node versions.

If you want to have a more real-time chat, we can also do a [video]call.

Thanks for offering, that'd be great!

Give me two more weeks to finish my current refactoring of Nock, I'll have more insights and feedback by then. I'm sure there are some use cases that work in nock and don't in Mitm.js, and I'd like to help you to land support for them, to increase compatibility between the two projects.

Are there any use cases that you are aware of that are supported my Nock but not Mitm.js today?

gr2m commented 2 years ago

I'm getting close with the refactoring of nock, but am not quite done yet. The last bit is all about the nock's recorder.

Recording of the whole request/response lifecycle without intercepting is a critical feature that I'll need for nock. I don't think that is a use case currently supported by MITM based on these issues

With my current intercept logic, I can do this:

const http = require('node:http')

const got = require('got')
const intercept = require('./modules/intercept-node-http/index.js')

let recordedResponseHeaders
let recordedResponseStatusCode
let recordedResponseBodyChunks
let recordedRequestBodyChunks

run()

async function run() {
  intercept((options, overridenRequest) => {
    if (!recordedResponseData) {
      console.log('\nRECORDING MOCK\n')
      const realRequest = overridenRequest.nockSendRealRequest()
      recordedRequestBodyChunks = Buffer.concat(overridenRequest.nockGetRequestBodyChunks())

      realRequest.on('response', response => {
        recordedResponseStatusCode = response.statusCode
        recordedResponseHeaders = response.headers

        const chunks = []
        response.on('data', chunk => {
          chunks.push(chunk.toString('base64'))
        })
        response.on('end', () => {
          recordedResponseBodyChunks = chunks
        })
      })

      return
    }

    console.log('\nREPLAYING MOCK\n')

    // intercept
    const response = new http.IncomingMessage(overridenRequest.socket)

    // (1) set response header data
    response.statusCode = recordedResponseStatusCode
    response.headers = recordedResponseHeaders

    overridenRequest.res = response
    response.req = overridenRequest

    overridenRequest.emit('response', response)

    // (2) set response body data
    for (const chunk of recordedResponseBodyChunks) {
      response.push(Buffer.from(chunk, 'base64'))
    }

    // (3) finish the response
    response.complete = true
    response.push(null)
  })

  const realResponse = await got.post('http://httpbin.org/post', {
    json: { test: 1 },
  })
  console.log(`real response`)
  console.log(realResponse.body.substr(0, 100) + '...')

  const mockedResponse = await got.post('http://httpbin.org/post', {
    json: { test: 1 },
  })
  console.log(`mocked response`)
  console.log(mockedResponse.body.substr(0, 100) + '...')
}

I think what would be needed is to make it possible to bypass a request in the mitm.on("request") callback, similar to what I'm currently doing.

But I can also see how it might be better to have two low-level modules, one to intercept requests, the other to proxy/record the request/response life cycle.

Any thoughts?

moll commented 2 years ago

I'll definitely explore if we can use Mitm.js as Nock's low-level intercept module for Node. In the end we will probably end up with a combination of today's Mitm.js and what Nock does, in order to support all of Nock's use cases. I could well imagine that we will end up with a low-level intercept module that is very similar to Mitm.js, so people can switch back and forth, but ours will be a native ES Module and only support currently maintained Node versions.

Well. I truly hope you end up using Mitm.js in favor of forking as I don't think duplicating work in the future is an effective use of our collectively limited time. You seem focused on the "ES modules" part. Are there any actual negatives with using CommonJS modules from ES Modules? Don't they work transparently?

One change I'd suggest is to turn Mitm.js into a Singleton API. Given the way it works, I don't think it makes sense to create instances, it suggests that they could work without interfering with each other. For example consider this

Fair suggestion, though technically speaking, it is already a singleton (class) by virtue of being expected to have a single instance at any time. I think you've been poisoned by peeking behind the scenes. :P In the README, I promote calling the main export of mitm as a function, which enables intercepting and returns something that can be listened on — much like a singleton factory function. The rest is really an implementation detail and not part of the public API.

There are a few benefits of having the singleton class available, too, though: permitting monkey patching its methods prior to calling enable and not having state held by the module. I'm not sure if it's been practically useful for anyone, but in theory one can slightly modify Mitm's behavior by creating an instance, patching it and then calling enable. Not having a non-global way to do so means someone would have to modify the module directly, and I find that dirty, along with having state in modules.

It then really boils down to whether Mitm.js should protect against double enabling (like your example) and whether you should be able to call disable on the module object in addition to the singleton instance. That would require keeping state in the module, though...

const mitm1 = new Mitm()

Haha, that's one issue I have with new ES — const variables that are mutable. Confusing as hell. Times I've been burned by reassignable bindings — zero. Times I've been burned by mutable values — many. I fully expect these days to be burned by assuming const means constant, which in all the code I've seen, rarely does. A very poor language addition and about as useful as C's const pointers that aren't pointers to constants.

I will need some more time until I finish the refactoring to isolate the intercept logic, once that is done I'll try to replace the isolated module with Mitm.js and see what breaks, and if it can be fixed within Mitm.js while continuing to work the way it works today.

I'm looking forward to hearing whether there any features one can't implement in Mitm.js as it stands today. I see you brought up recording (a.k.a proxying).

Recording of the whole request/response lifecycle without intercepting is a critical feature that I'll need for nock. I don't think that is a use case currently supported by MITM based on these issues

I feel like haven't had the time to properly address the issues/PRs you listed, but overall I think recording is something one should be able to implement on top of the Mitm.js primitives relatively easily. I have a feeling one or more people have already done it. After a bit of searching, I found https://github.com/moll/node-mitm/issues/51, which has links to both Andreas Lind's (@papandreou) example (https://github.com/moll/node-mitm/issues/51#issuecomment-415523211) and Ian Walker-Sperber's (@ianwsperber) example (https://github.com/moll/node-mitm/issues/51#issuecomment-435930916).

It's important to keep in mind that by the time Mitm.js gives you the server-side socket, you can no longer bypass anything. The client-side buffers have already been dumped to the server side. With that limitation comes the power to see and modify the entire request (every byte) as you see fit. If you want to see what the real server would reply, make a new network request and proxy the request you got from the socket Mitm.js gave you. At the byte level, this will work for every protocol, but if you're only interested in HTTP, you could get the HTTP request via Mitm.js and specifically proxy the HTTP request. This way you can optimize how you're storing and displaying the recorded data, too.

While I want to support the rgeneral ecording use-case, I don't think the entire solution needs or even can be part of Mitm.js. I think it's too use-case specific — someone may want to record HTTP and display the requests as JSON, someone else as test-cases or even generate code.

I'm reminded that @papandreou wants to get access to the original connect options in https://github.com/moll/node-mitm/issues/14. I'll get to it! :innocent:

Are there any use cases that you are aware of that are supported my Nock but not Mitm.js today?

I can't think of much. I think the architecture and approach Mitm.js took is far more composable, reusable and stable:

(Re)implementing very little affords stability.
Permitting everyone reuse their knowledge of Node.js networking and testing (no need for a custom testing DSL) leads to reusability.
Permitting everyone reuse their existing networking functions/libraries/frameworks in testing affords composability.

But hey, to each their own. ^_^ I like fluent asserting APIs and you prefer assert. :P Which reminds me, Must.js, which you met in Mitm.js's tests, could also be used without modifying prototypes by doing var demand = require("must"); demand(foo).be.true(). ;)

gr2m commented 2 years ago

You seem focused on the "ES modules" part. Are there any actual negatives with using CommonJS modules from ES Modules? Don't they work transparently?

CJS modules cannot import ES Modules. It's not a big problem for Mitm.js because is dependency-less, but nock and other of my projects are increasingly locked out of dependency updates. The eco system is definitely shifting to ESM.

It also has the benefit for a hard break with legacy Node versions ;) For nock, another factor is that we might like to make it compatible with browsers in future, once we made it more modular, so that we can swap the node-http intercept logic with a fetch/service-worker intercept, or similar. ESM is really nice when building universal JavaScript code that you want to run in Node, browsers, Deno, Cloudflare Workers, and whatever new JS environments the future will bring.

And lastly, I really enjoy working with the new module system.

Haha, that's one issue I have with new ES — const variables that are mutable

That's unrelated to ESM, it's just modern JS syntax that works with both module systems. And yes, it's confusing to wrap one's head around 🤷🏼

Recording of the whole request/response lifecycle without intercepting is a critical feature that I'll need for nock. I don't think that is a use case currently supported by MITM based on these issues

I feel like haven't had the time to properly address the issues/PRs you listed, but overall I think recording is something one should be able to implement on top of the Mitm.js primitives relatively easily.

I think I'll try to create a separate module, just for the fun of it. I like small, focused modules :) I would make sure that Mitm and the record module would work well side-by-side. You can still decide to add the logic directly into Mitm.js if you like, but I don't see the need for it right now.

someone may want to record HTTP and display the requests as JSON, someone else as test-cases or even generate code.

My current approach is to not be opinionated about the serialization. I'm emitting a record event with the native req and res instances, plus their respective request bodies as Buffer Chunk Arrays.

I created a module based on the code I shared above, I'd love to hear what you think of it. The implementation is really easy, thanks to your http.ClientRequest.prototype.onSocket monkeypatch trick. https://github.com/gr2m/http-recorder#readme

moll commented 2 years ago

CJS modules cannot import ES Modules.

Oh, I hadn't looked into that and it indeed seems to be the case. Well that's stupid. Why fragment ecosystems like that, specially when it would be technically possible to provide support. It's not like everyone can upgrade their entire projects all at once.

That's unrelated to ESM, it's just modern JS syntax that works with both module systems. And yes, it's confusing to wrap one's head around 🤷🏼

I was more going for it's an [objectively] bad feature because of the mutability aspect and shouldn't be used for actually mutable objects. ^_^

I created a module based on the code I shared above, I'd love to hear what you think of it. The implementation is really easy, thanks to your http.ClientRequest.prototype.onSocket monkeypatch trick. https://github.com/gr2m/http-recorder#readme

Ah, well, that's not bad. Simpler than I expected. I do worry that it may be too simple...

Did you test when writing in different encodings, too, per write call?
There's also OutgoingMessage.prototype.end (the parent of ClientRequest) that according to https://github.com/nodejs/node/blob/113133badbe3c2f40f61eac733006207e06b9cc2/lib/_http_outgoing.js#L833 seems to call an internal method directly (not going through write). Wouldn't that miss output?
Does piping one stream into OutgoingMessage work, too?
Related to missing output, I have a sense hooking into data immediately will cause some applications to miss parts of the content body. Those that don't start reading the response right after response and instead expect the stream to be paused for a while. Perhaps they're passing the response first to an async function, but by that time, the recording logic will have already unpaused the stream.
There also HTTPS requests, although I think the encryption happens transparently via a TLS connection, so that wouldn't require changes.

gr2m commented 2 years ago

Why fragment ecosystems like that, specially when it would be technically possible to provide support

You can build modules that support both module system, but it's a headache, and we have enough of that as maintainers already. The reason ESM cannot be imported into CJS modules natively (and synchronously) is because CJS is synchronous, while ESM as asynchronous.

There has been lots and lots and lots of discussion going on. I know some of the people leading these efforts well, and I have nothing but respect to take Node.js towards compatibility with the web module system. It does cause a lot of friction, but it was unavoidable.

There is a lot more to it then we could discuss here. Import and exports maps, especially conditional exports are very relevant to the projects I'm building. This is a great resource with more information about the new module system and its capablities: https://nodejs.org/api/packages.html

I do worry that it may be too simple

Thanks a lot for the list! I created a follow up issue to go through them one-by-one: https://github.com/gr2m/http-recorder/issues/3

moll commented 2 years ago

Actually, the argument you brought up applies to using ESM from CommonJS, not the reverse.

Are there any actual negatives with using CommonJS modules from ES Modules? Don't they work transparently?

Aside from maybe slightly more complicated erasure of unused functions, why not always provide CommonJS to maximise compatibility? We're all aware of our limited time. Migrating popular leaf dependencies to ESM is precisely the cause of unnecessary work.

Having said that, I'm not against ESM or change in general, but I am against forcing downstream users to upgrade. I can't think to imagine how much people's time we're going to be wasting asking them to upgrade to a new module system overnight just because they need to upgrade a dependency all of a sudden — for a security issue, perhaps. That problem applies to all backwards incompatible changes, of course, that some libraries like to do far too often. I wonder why.

Upgrading too often on the other hand is equally a security nightmare. That reminds me, I saw you install new NPMs quite liberally in your Twitch stream. Aren't you afraid of supply chain attacks? There have been a fair number of those lately and I'm confident people are only getting started. They certainly should. I think we both would make perfect targets — both have access to a couple of very popular modules (>1M installs/week) that will certainly get a fair number of installs before detection.

You can build modules that support both module system, but it's a headache, and we have enough of that as maintainers already. The reason ESM cannot be imported into CJS modules natively (and synchronously) is because CJS is synchronous, while ESM as asynchronous.

There has been lots and lots and lots of discussion going on. I know some of the people leading these efforts well, and I have nothing but respect to take Node.js towards compatibility with the web module system. It does cause a lot of friction, but it was unavoidable.

I noticed that ESM is async, though I disagree that fracturing was both unavoidable and impossible. Node.js is control of the JS VM. It can do as it wishes when calling functions, incl. pausing the caller, firing up an async process to load an ESM module and resuming execution. It may not be perfect, but I wager it'll get the job done 99% of the time.

gr2m commented 2 years ago

I feel we go in circles with the ESM discussion :) We both agree the transition has a lot of friction. I think it's worth it because of the web platform compatibility. Without it Node will be left behind and hold back the platform. If you only care about Node.js, you are fine to keep using CJS, there is no end in sight for supporting it.

In terms of upgrade friction, what I will do is release ESM versions as breaking changes, but keep accepting contributions to the latest CJS version, too. We will backport any security-relevant patches ourselves, but these are rare. It's very easy to support maintenance versions using semantic-release, and we can leave maintenance of the CJS versions to the users. From my experiences, it's mostly companies who are stuck with older versions, and they should shoulder their long-term support, at least in independent open source projects.

I saw you install new NPMs quite liberally in your Twitch stream. Aren't you afraid of supply chain attacks?

Of course I have supply chain attacks in mind. I don't mindlessly add dependencies, I just removed the Mitm.js dependencies in my fork :) The ones I add without a thorough review I already know or maintain myself. Unless it's for a quick demo.

gr2m commented 2 years ago

I created two more modules, that split out the logic of TCP/TLS and HTTP(S) mocking

I changed the APIs a bit and removed the need for socket.bypass() and use a Symbol instead of the .serverSocket property. It was a fun exercise to see how to split mitm into two modules.

I'm just exploring at this point, I might or might not use these in nock, I mostly just felt like hacking on something.

I wasn't sure about licensing & copyright. I made lots of edits but it's still mostly your work and I want to make sure that is clearly communicated. I left the license but added a copyright, not sure if that is right? I must admit I never gave licensing a lot of thought. You clearly did, I tried my best to respect that, please let me know if something is not right.

I'm a bit worried about nock users having problems with LAGPL, it's not listed on https://spdx.org/licenses/, did Mitm.js users bring that up a lot in the past, to the point where it became a maintenance burden? Do you know if there are efforts to get LAGPL added to npm/GitHub to be recognized as a valid license?

moll commented 2 years ago

Of course I have supply chain attacks in mind. I don't mindlessly add dependencies, I just removed the Mitm.js dependencies in my fork :)

Yeah, that was a good move. I'll definitely do that to Mitm.js-proper, too.

I wasn't sure about licensing & copyright. I made lots of edits but it's still mostly your work and I want to make sure that is clearly communicated. I left the license but added a copyright, not sure if that is right? I must admit I never gave licensing a lot of thought. You clearly did, I tried my best to respect that, please let me know if something is not right.

I'm a bit worried about nock users having problems with LAGPL, it's not listed on https://spdx.org/licenses/, did Mitm.js users bring that up a lot in the past, to the point where it became a maintenance burden? Do you know if there are efforts to get LAGPL added to npm/GitHub to be recognized as a valid license?

Oh, no burden whatsoever. It is a superset of AGPL, after all, just more permissive. Over the decade+ I've had OSS out there, only a handful of people have ever emailed, commented or inquired about licensing, and I don't remember it being an actual problem for most once I reiterated what each and every README of mine has — the summary (https://github.com/moll/node-mitm#license). There might've been one that had a company policy of too few licenses to begin with, but that's not an exclusive problem to the license I went with.

My license ethics are quite straightforward, I believe — the work I, we or anyone in the future does on Mitm.js must remain open source and available. That is, if someone fixes a bug or adds a feature inside the codebase, and runs the fix in production, they have to share it with others. That's it. They can keep their own app closed-source. I find this arrangement fairer for all OSS work and I'm a little disturbed so many go with BSD/MIT licenses that don't require sharing. I presume few think about the ethical signal a license sends, both to users and contributors. Yet at the same time people are complaining big-tech is appropriating their work...

As for getting it listed in the SPDX license list, I suppose that would be a good idea. I'm definitely not the only person using LAGPL. I remember someone emailing me a while back about shining more light on it and coordinating with some OSS-lawyer-blogger, but I don't recall where that led.

moll commented 2 years ago

For the record, pure AGPL itself is like GPL, but fixes the loophole by stating things run for online use still have to be shared. With GPL you could say you're not distributing code and therefore don't need to share, whereas if you did a desktop app, you'd need to share the entire codebase. LGPL is like GPL, but permits combining closed-source code with open-source code (like libraries). LAGPL is to AGPL, what LGPL is to GPL — permits combining closed source code with open source libraries, while still requiring you share your changes to the open source parts.

gr2m commented 2 years ago

That makes a ton of sense, thank you!

moll commented 2 years ago

If you stumble upon another, perhaps more popular license that conveys the same simple ethics, let me know!

moll commented 2 months ago

Seems SPDX supports LGPL-like exceptions with the WITH syntax. As nudged by https://github.com/moll/node-mitm/issues/80, Mitm.js now contains the SPDX-compatible WITH GPL-3.0-linking-exception suffix. Automated license scanning tools shouldn't now get confused about the AGPL part and properly group Mitm.js with other LGPLs.

@gr2m, any progress of the idea of using Mitm.js internally in Nock?

gr2m commented 2 months ago

@gr2m, any progress of the idea of using Mitm.js internally in Nock?

not really unfortunately, I had to put my big nock refactoring work on nock on hold. Others picked up the maintenance though and are working on native fetch support