mattcasters / kettle-beam-examples

Example transformations for the Kettle Beam project
Apache License 2.0

Pipeline Submission Error: Size Exceeded #1

Open twknight opened 5 years ago

twknight commented 5 years ago

When submitting the input-process-output.ktr transformation for Dataflow processing, I get the following error. I can't find any way to see what's actually being sent in order to take any additional troubleshooting steps. My GS bucket name is only 12 characters, so I don't think that's contributing to the issue.

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request

```
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "(5d2feec528391b35): The workflow could not be created. Causes: (5d2feec528391c82): SDK pipeline options or staging file list exceeds size limit. Please keep their length under 256K Bytes each and 512K Bytes in total. For example, shorten pipeline options or staging file paths, or reduce the number of staging files by building a Java uber JAR.",
    "reason" : "badRequest"
  } ],
  "message" : "(5d2feec528391b35): The workflow could not be created. Causes: (5d2feec528391c82): SDK pipeline options or staging file list exceeds size limit. Please keep their length under 256K Bytes each and 512K Bytes in total. For example, shorten pipeline options or staging file paths, or reduce the number of staging files by building a Java uber JAR.",
  "status" : "INVALID_ARGUMENT"
}
```

mattcasters commented 5 years ago

I don't see the same issue. Did you add extra plugins to the classpath in the Beam Job Config?

Ah, hang on... if we look at the classpath we see something like...

```
/home/matt/tmp/di-beam/plugins/kettle-json-plugin/kettle-json-plugin-core-8.2.0.3-519.jar,
/home/matt/tmp/di-beam/plugins/kettle-json-plugin/lib/json-path-2.1.0.jar,
/home/matt/tmp/di-beam/plugins/kettle-json-plugin/lib/slf4j-api-1.7.7.jar,
/home/matt/tmp/di-beam/plugins/kettle-json-plugin/lib/json-smart-2.2.jar,
/home/matt/tmp/di-beam/plugins/kettle-beam/kettle-beam-1.0.0-SNAPSHOT.jar,
/home/matt/tmp/di-beam/plugins/kettle-beam/lib/malhar-library-3.4.0.jar,
/home/matt/tmp/di-beam/plugins/kettle-beam/lib/grpc-alts-1.17.1.jar,
....
```

So if you install Kettle Beam in a very deep folder, we could come within reach of that 256 KB limit: 577 jar files times an extra 100 characters each, perhaps? Could you verify these paths for me on your end?
I would hate to move to a fat-jar system, since Dataflow is so efficient at updating the individual jar files.
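To check whether path depth is the culprit, a rough estimate of the staging file list's size can be made by summing the path lengths of every staged jar. This is a hypothetical sketch, not code from Kettle Beam; the `plugins` directory path and the assumption that Dataflow stages roughly one path string per jar are mine:

```python
import os

# The 256 KB per-item figure comes from the Dataflow error message above.
STAGING_LIMIT_BYTES = 256 * 1024


def staging_list_bytes(plugins_dir):
    """Return the total UTF-8 byte length of all .jar paths under plugins_dir,
    as a rough proxy for the size of the staging file list."""
    total = 0
    for root, _dirs, files in os.walk(plugins_dir):
        for name in files:
            if name.endswith(".jar"):
                total += len(os.path.join(root, name).encode("utf-8"))
    return total


if __name__ == "__main__":
    # Hypothetical install location; substitute your own.
    size = staging_list_bytes("/home/matt/tmp/di-beam/plugins")
    print(f"{size} bytes ({size / STAGING_LIMIT_BYTES:.0%} of the limit)")
```

Running this against the actual install directory before and after moving it would show how much the extra path depth contributes.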

twknight commented 5 years ago

My path has an extra 24 characters per jar, and it was able to successfully copy all of the jars to the GS bucket. The failure appears to occur after that, but as I mentioned before, I'm at a loss for the best way to inspect what's actually causing the error. Thank you, by the way, for your efforts on this; I can see it making the creation of new Dataflow jobs much more efficient!
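For what it's worth, the figures in the thread can be sanity-checked with a back-of-the-envelope calculation, using the 577-jar count from above and the 24 extra characters per jar reported here:

```python
# Back-of-the-envelope check of the numbers mentioned in this thread.
jar_count = 577        # jar files on the classpath (from above)
extra_per_jar = 24     # extra path characters per jar in this install
limit = 256 * 1024     # per-item limit from the Dataflow error, in bytes

extra_bytes = jar_count * extra_per_jar
print(extra_bytes)           # 13848
print(extra_bytes < limit)   # True: the extra depth alone is well under 256 KB
```

On these assumptions the extra depth accounts for only about 14 KB, which is why the total list size, not just the delta, is worth measuring.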

mattcasters commented 5 years ago

Just to clarify: jar file copying is supposed to happen automatically, no manual action is needed.

twknight commented 5 years ago

Correct, the process copied the files, not me. So, the first execution was "slow" while that happened, then the error occurred. Subsequent executions fail with the same error quickly since it's not copying the files.

mattcasters commented 5 years ago

From the error, it could also be a different pipeline-options problem, since the file list size isn't nearly big enough to cause issues. Here's what I use; you would have to change the Project ID and staging location.

[screenshot: Beam Job Config settings]

mattcasters commented 5 years ago

Also make sure you have sufficient rights on the project for the account you're using. To get this to work I had to grant every right and the kitchen sink.

twknight commented 5 years ago

Mine is very similar. I even removed all of the plugin folders to be staged. It's certainly possible it's a permissions issue masquerading as this; I've been trying to take a more conservative approach to granting access to the service account. I'll start adding more rights to see if that gets me anywhere, thanks.

[screenshot: Beam Job Config settings]

twknight commented 5 years ago

BTW, here's what's been granted to this point.

[screenshot: roles granted to the service account so far]

twknight commented 5 years ago

It was the depth of the path to PDI. I moved the installation to C:\Pentaho\PDI and it was able to submit successfully! Not totally sure I want to close the issue, however. I'll leave that up to you!

mattcasters commented 5 years ago

Ah, that's progress nonetheless! Thanks for the feedback. I guess it will be easier to just move to a single fat jar on all platforms. I'm reworking and refining a bunch of things for Beam Summit Berlin next week so I'll include this issue in the list of things to fix.

alberto-guerra-lmes commented 5 years ago

I have the same problem.

mattcasters commented 5 years ago

You can generate a fat jar in the latest versions; grab a download at www.kettle.be. There is a build button in the Beam Job Config dialog or in the Beam menu in Spoon.