ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
346 stars 175 forks source link

make lzo optional peer dependency #79

Closed jeffbski closed 4 years ago

jeffbski commented 5 years ago

Since lzo requires a native compilation step as part of its install, use codependency to make lzo an optional peer dependency and thus lzo compression (decode or encode) will not be available unless it is installed.

To enable lzo compression install the lzo package:

npm install lzo

This change returns this package to being a pure JS package which does not require special compilation for a target machine.

vweevers commented 5 years ago

Big downside of this approach is that build errors of optionalDependencies (if you dó want lzo) are silenced. That's especially painful in CI/CD (if it fails, you'll never know).

And you're mixing up optionalDependencies and peerDependencies. The former does not need a separate npm install lzo command, the latter does. If you meant to use peerDependencies, that has a major downside too: it reintroduces dependency hell into an otherwise fabulously designed package manager.

This change returns this package to being a pure JS package which does not require special compilation for a target machine.

What is your target environment?

jeffbski commented 5 years ago

The main goal of this PR is to allow this package to be used in a pure JS fashion (like it's description says) so that no compilation is necessary. LZO is currently the only binary dependency, so it would be nice to have a way to carve that out so it isn't required.

Here is my exact use case. I have serverless lambda's performing parquet transformation with Node.js. We are planning to use snappy compression. After I npm install parquetjs I am able to run all my code locally. Next when I want to deploy to AWS, serverless packages up the prod dependencies into a zip and uploads to AWS, so when I invoke them in the cloud everything is available that is needed. Locally I am running on Mac and others are on windows. In AWS lambda it will be running on Linux. Thus when the mac (or windows) node.js extension binaries for lzo are includes as part of that module in the AWS zip, it fails to load since the package has an incompatible binary.

I have worked around this issue by using a docker instance to npm rebuild the node.js extensions before zipping up and deploying. It takes extra time to run this and requires all developers on the team to have a local docker to be installed locally as well.

In contrast if I have pure JS packages then I don't have to worry about doing a npm rebuild on the target machine (or docker instance) simply to get binaries for the right target. With pure JS then everything that is local can be packaged as is up as an artifact and it will run anywhere we have Node.js running.

Pure JS packages eliminate the need to have a fully functioning build chain and local docker, so the bar to entry it much lower. Windows users in locked down corporate settings can appreciate this.

So that brings me to the idea that if this could be a pure JS package, I could eliminate all of this trouble and complexity. Pure JS code is much simpler to deal with. Making the lzo functionality a separately installed dependency or a plugin seems like a nice way to go.

Thus to that end, it seems to me that something like optionalPeerDependencies (as implemented by the codependency module) could be a nice way to support this mode of operation.

My understanding of optionalDependencies is that it would skip them if it could not compile during install. However it wasn't clear to me that this would also work for my situation where the package is installed but has the wrong binaries (and thus won't load). I could certainly try that and see if it would work.

However it would be ideal if the package was simply not there when it is not being used rather than being there and having bad binaries.

Current test suite does not check for lzo functionality other than that all dependencies install.

If we want to verify that lzo actuallly builds and works then we could have a test directory with its own package.json that installs lzo and runs tests with parquetjs using lzo verifying that everything works when both pieces are in place.

jeffbski commented 5 years ago

I did try the optionalDependency variant and it could work, but has its own disadvantages:

If that's the way we need to proceed I can change the PR to do this instead, but I'd love to not have this dependency installed at all if it isn't needed/wanted. That being said, it is a simpler PR if we can live with the indirection.

vweevers commented 5 years ago

Windows users especially appreciate this.

I'm a Windows user, and no, I don't 😉 Nowadays, if you don't already have a working setup, then npm install --global windows-build-tools is all you need. IMO the argument that building on Windows is so painful that we should avoid native addons altogether, does not fly (anymore, if it ever did).

AWS Lambda is a special type of deployment target. Some extra work is expected and fair. Tooling can make it easier. Again, it's not enough reason (for me personally) to forego native addons altogether.

I did try the optionalDependency variant and it could work, but has its own disadvantages

Ah, my bad. I didn't notice the word "peer" in optionalPeerDependencies. That clears up my confusion. Given that optionalPeerDependencies is not an npm feature however, I'm not a fan. Dependencies must have well-known semantics. I think both peerDependencies and optionalDependencies (and any combination of those mechanisms) have too many downsides, that outweigh having a native addon as dependency.

A dependency injection approach might work - where parquetjs takes an object with deflate/inflate functions - because then the only "peer dependency" would be the data structure, which never changes.

jeffbski commented 5 years ago

I also created the alternate approach making lzo an optional dependency hidden behind a function. PR #80 It eliminates the need of codependency and its magic requiring.

It leaves the broken dependency in place when uploading for a different target environment (serverless) but I can live with that.

See whether you think that approach is acceptable or whether you have a better idea.

vweevers commented 5 years ago

To be clear, I'm not a maintainer. Would like to hear their opinion at this point.

jeffbski commented 5 years ago

I'm glad your windows experience has been good. I wasn't aware of windows-build-tools so I'm sure that helps. However I have worked in many corporate environments that are heavily locked down with scanners that prevent tools and binaries from being loaded or installed, so for some folks the pain is real.

kessler commented 4 years ago

I can close this now right?

jeffbski commented 4 years ago

yes, since #80 went in we can close this. Thanks!