Open taureliloome opened 7 years ago
It should be possible to be smaller than 400 MB
Hi, can you check again and run npm run clean
after npm install
? It should remove most of the files that are needed to build the module but useless at runtime.
Any luck getting it to work? I am getting the following error when trying to run in lambda:
{
  "errorMessage": "libboost_regex.so.1.62.0: cannot open shared object file: No such file or directory",
  "errorType": "Error",
  "stackTrace": [
    "Object.Module._extensions..node (module.js:597:18)",
    "Module.load (module.js:487:32)",
    "tryModuleLoad (module.js:446:12)",
    "Function.Module._load (module.js:438:3)",
    "Module.require (module.js:497:17)",
    "require (internal/module.js:20:19)",
    "Object.<anonymous> (/var/task/node_modules/node-parquet/index.js:5:17)",
    "Module._compile (module.js:570:32)",
    "Object.Module._extensions..js (module.js:579:10)"
  ]
}
@alaister make a 'lib' folder in your Lambda function package and copy that library into it; that worked for me.
@aib-nick can you give me more information about your work with aws lambdas & the module?
I'm getting this error:
module initialization error: Error at Object.Module._extensions..node
I guess it's the same error @alaister reported.
Thanks!
@fzaffarana this is my Lambda application layout. As you can see, I just made a lib directory and copied the missing library in there. I use AWS Cloud9 for Lambda development, so I got the library from there, and it works when deployed.
./myprogram
./myprogram/index.js
./lib
./lib/libboost_regex.so.1.53.0
./node_modules/node-parquet/...
... other modules installed with normal npm install ...
./template.yaml
./.application.json
and then I just include and use everything normally.
I have successfully written Parquet files to S3 with this by putting a function inside a Kinesis stream as a transformation function and then dropping all the transformed records, so the Lambda function writes to S3 and the Kinesis stream does not. It almost worked, but I got a few errors where Kinesis aborted, and I couldn't really debug what was going on; ultimately I had to abandon this method because of time constraints. But it was very close, and I was able to read the resulting files from Athena.
// setup AWS access
const setRegion = "us-east-1";
const AWS = require('aws-sdk');
AWS.config.update({region: setRegion});
// setup S3 access
const s3 = new AWS.S3();
// parquet access
const parquet = require('node-parquet');
// temp files and helpers (these requires were missing from the original snippet)
const tmp = require('tmp');
const fs = require('fs');
const moment = require('moment');
// ...

exports.handler = (event, context, callback) => {
  // ...
  // schema for this parquet file
  const schema = { /* ... */ };
  // ... loop through input and build up out_data[] ...

  // write each batch to a temporary local file
  var tmpobj = tmp.fileSync();
  var writer = new parquet.ParquetWriter(tmpobj.name, schema, 'snappy');
  writer.write(out_data[k]);
  writer.close();

  // ... write to S3 ...
  // give S3 the ability to read the local file and stream it
  var rs = fs.createReadStream(tmpobj.name);
  var s3_key = "parquet/stuff/year=" + moment(k).format('YYYY');
  s3_key = s3_key + "/month=" + moment(k).format('MM');
  s3_key = s3_key + "/day=" + moment(k).format('DD');
  s3_key = s3_key + "/" + invocationId + ".snappy.parquet";
  // ...
  s3.putObject(s3_put_params, function(err, data) {
    // ... throw away records so Kinesis doesn't write them after we wrote OK ...
    // this tells Kinesis to drop all the records we already saved
    output.push({
      recordId: record.recordId,
      result: 'Dropped'
    });
    // ...
  });
  callback(null, { records: output });
};
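The Hive-style partition-key construction in the handler above can be factored into a small pure function. This is just a sketch: it uses plain Date formatting instead of moment, and the prefix and names are illustrative:

```javascript
// Build a Hive-style partitioned S3 key, e.g.
// parquet/stuff/year=2018/month=03/day=09/<invocationId>.snappy.parquet
// Partitioning by year/month/day lets Athena prune scans by date.
function partitionedKey(prefix, date, invocationId) {
  const pad = (n) => String(n).padStart(2, '0');
  return [
    prefix,
    `year=${date.getUTCFullYear()}`,
    `month=${pad(date.getUTCMonth() + 1)}`,
    `day=${pad(date.getUTCDate())}`,
    `${invocationId}.snappy.parquet`,
  ].join('/');
}

console.log(partitionedKey('parquet/stuff', new Date(Date.UTC(2018, 2, 9)), 'abc123'));
// -> parquet/stuff/year=2018/month=03/day=09/abc123.snappy.parquet
```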
@aib-nick thank you first of all for the help.
I can see that we have similar Lambdas (this is good). (I'm going to borrow your trick of letting S3 read the local file and stream it.)
But I don't know if we have the same error.
This is mine (from the AWS console):
module initialization error: Error
at Object.Module._extensions..node (module.js:681:18)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Module.require (module.js:596:17)
at require (internal/module.js:11:18)
at Object.<anonymous> (/var/task/src/project/classes/node-parquet/index.js:5:17)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Module.require (module.js:596:17)
at require (internal/module.js:11:18)
at Object.<anonymous> (/var/task/src/project/classes/Tools.js:4:17)
at Module._compile (module.js:652:30)
It doesn't show any specific missing lib. On the other hand, when I test this Lambda in my local environment, it works correctly.
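When the stack trace doesn't name the missing library, running ldd on the compiled addon inside the Lambda-like environment usually will. A diagnostic sketch (the .node path is an assumption; adjust it to your actual build output):

```shell
# List a binary's dynamic dependencies and flag any the loader cannot resolve.
check_missing_libs() {
  ldd "$1" | grep 'not found' || echo "all dependencies resolved"
}

# On the Lambda package this would be something like:
#   check_missing_libs node_modules/node-parquet/build/Release/parquet.node
# Demonstrated here on a binary that is always present:
check_missing_libs /bin/ls
```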
This would be a useful feature!
Is there any fix?
@aib-nick Hi, could you list the lib file(s)? I keep hitting lib errors when running my Lambda... :( Please help!
It's been a while since this question was originally asked, but I wanted to follow up and see if anyone has a tried-and-true way of doing the npm install and adding the lib files that reliably gets node-parquet working on Lambda.
I'm about to embark on this task and would love to hear others' wisdom on any gotchas.
I've managed to run node-parquet on AWS Lambda with the Node.js 10.x runtime; it's worth mentioning that I couldn't build it on newer Node.js versions. You'll also need Docker installed on your machine. The steps are the following:
Run this in the root folder of your project
$ docker run --rm -it -v "$PWD":/var/task lambci/lambda:build-nodejs10.x /bin/bash
This will give you an environment similar to AWS Lambda's.
Inside the container run the following commands:
# First we update the cmake version, since this image ships with version 2
cmake_name="cmake-3.16.1-Linux-x86_64"
cmake_tar="${cmake_name}.tar.gz"
curl -L https://github.com/Kitware/CMake/releases/download/v3.16.1/${cmake_tar} -o /opt/${cmake_tar}
mkdir -p /opt/${cmake_name}
tar xf /opt/${cmake_tar} -C /opt
chmod a+x /opt/${cmake_name}/bin/cmake
mv /bin/cmake /bin/cmake.bkp
ln -s /opt/${cmake_name}/bin/cmake /bin/cmake
# Now we install the remaining dependencies and build the project
yum install -y boost-devel bison flex
npm install
# Cleanup dependencies so we can actually deploy to AWS Lambda
rm -Rf ./node_modules/node-parquet/build_deps
I hope this helps!
I have done something similar to what @paflopes describes, putting the result into a layer which the application can use.
Hi, I wanted to use this wonderful module in AWS Lambda. The key blocker is that after I compile the node-parquet module, the whole thing is over 400 MB; unfortunately, AWS Lambda allows uploading at most ~240 MB per Lambda function. I was wondering whether there is any possibility of slimming the whole output down, or is this what we get? In any case, I'm looking through the make files to understand if I can do something on my own. Thanks for your time!
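To see what actually dominates the 400 MB, a quick size breakdown helps decide what can be stripped (the path below is node-parquet's default install location; build_deps tends to be the largest removable piece once the build has finished):

```shell
# List the largest entries under a directory, biggest first.
largest() {
  du -sh "$1"/* 2>/dev/null | sort -rh | head -n 10
}

# In the project this would be:
#   largest node_modules/node-parquet
# Demonstrated on a directory that always exists:
largest /usr
```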