mozilla / gcp-ingestion

Documentation and implementation of telemetry ingestion on Google Cloud Platform
https://mozilla.github.io/gcp-ingestion/
Mozilla Public License 2.0

Update Beam SDK to 2.51 #2518

Closed akkomar closed 9 months ago

akkomar commented 9 months ago

Fixes https://github.com/mozilla/gcp-ingestion/issues/2503

@whd I went through the code, and it looks like the changes to PubsubMessage in the SDK are related to new features we don't use, so this PR seems to be low risk.

whd commented 9 months ago

Interestingly, this failed to deploy to stage with:

{"severity":"INFO","time":"2023/11/27 18:48:14.572273","line":"exec.go:66","message":"Exception in thread \"main\" "}
{"severity":"INFO","time":"2023/11/27 18:48:14.573700","line":"exec.go:66","message":"java.lang.RuntimeException: Encountered checked exception when constructing an instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.573927","line":"exec.go:66","message":"\tat org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574020","line":"exec.go:66","message":"\tat org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:158)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574119","line":"exec.go:66","message":"\tat org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574210","line":"exec.go:66","message":"\tat org.apache.beam.sdk.Pipeline.create(Pipeline.java:153)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574290","line":"exec.go:66","message":"\tat com.mozilla.telemetry.Decoder.run(Decoder.java:56)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574377","line":"exec.go:66","message":"\tat com.mozilla.telemetry.Decoder.run(Decoder.java:49)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574471","line":"exec.go:66","message":"\tat com.mozilla.telemetry.Decoder.main(Decoder.java:37)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574685","line":"exec.go:66","message":"Caused by: java.lang.NoClassDefFoundError: com/google/storage/v2/StorageProto"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574775","line":"exec.go:66","message":"\tat com.google.cloud.hadoop.gcsio.GoogleCloudStorageOptions.\u003cclinit\u003e(GoogleCloudStorageOptions.java:55)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574882","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.util.GcsUtil.\u003cinit\u003e(GcsUtil.java:220)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.574969","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:126)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.575052","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.util.GcsUtil$GcsUtilFactory.create(GcsUtil.java:108)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.575148","line":"exec.go:66","message":"\tat org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:689)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.575642","line":"exec.go:66","message":"\tat org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:630)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.575750","line":"exec.go:66","message":"\tat org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:227)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.575863","line":"exec.go:66","message":"\tat com.sun.proxy.$Proxy22.getGcsUtil(Unknown Source)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.575962","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.verifyPathIsAccessible(GcsPathValidator.java:83)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576050","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.storage.GcsPathValidator.validateOutputFilePrefixSupported(GcsPathValidator.java:53)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576135","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:373)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576232","line":"exec.go:66","message":"\tat org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:354)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576330","line":"exec.go:66","message":"\tat org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:689)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576433","line":"exec.go:66","message":"\tat org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:630)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576515","line":"exec.go:66","message":"\tat org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:227)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576733","line":"exec.go:66","message":"\tat com.sun.proxy.$Proxy0.getGcpTempLocation(Unknown Source)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576855","line":"exec.go:66","message":"\tat org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:286)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.576971","line":"exec.go:66","message":"\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.577846","line":"exec.go:66","message":"\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.577962","line":"exec.go:66","message":"\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578059","line":"exec.go:66","message":"\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578145","line":"exec.go:66","message":"\tat org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:217)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578240","line":"exec.go:66","message":"\t... 6 more"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578462","line":"exec.go:66","message":"Caused by: java.lang.ClassNotFoundException: com.google.storage.v2.StorageProto"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578547","line":"exec.go:66","message":"\tat java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578650","line":"exec.go:66","message":"\tat java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578750","line":"exec.go:66","message":"\tat java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)"}
{"severity":"INFO","time":"2023/11/27 18:48:14.578860","line":"exec.go:66","message":"\t... 28 more"}
{"severity":"INFO","time":"2023/11/27 18:48:14.611258","line":"exec.go:52","message":"java failed with exit status 1"}
{"severity":"INFO","time":"2023/11/27 18:48:14.611308","line":"launch.go:77","message":"Template launch failed: exit status 1"}
{"severity":"INFO","time":"2023/11/27 18:48:14.611327","line":"launch.go:99","message":"Uploading console logs to gcs location: gs://dataflow-staging-us-west1-151430467146/staging/template_launches/2023-11-27_10_47_06-4091342686544121193/console_logs"}
Caused by: java.lang.NoClassDefFoundError: com/google/storage/v2/StorageProto

I will investigate this more closely to see if it's a template build issue or something.

akkomar commented 9 months ago

I can reproduce this by starting the runner locally, so it doesn't seem to be the template's fault. I tried a couple of different SDK versions and found that the job starts failing on 2.44.0. It looks like something changed between 2.43.0 and 2.44.0 that causes this failure.
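
One way to narrow this down locally might be to check whether `com.google.storage.v2.StorageProto` resolves on the job's classpath, and which jar supplies it, under each SDK version. Below is a minimal sketch, assuming it is run with the same classpath as the Decoder job. The class `ClasspathCheck` and its `locate` helper are hypothetical names for illustration, not part of gcp-ingestion or Beam.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of gcp-ingestion or the Beam SDK.
final class ClasspathCheck {

    /**
     * Returns where the named class would be loaded from,
     * or null if it cannot be resolved on the current classpath.
     */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // JDK classes have no code source; report that instead of failing.
            return source == null ? "(JDK runtime)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the NoClassDefFoundError above.
        String name = args.length > 0 ? args[0] : "com.google.storage.v2.StorageProto";
        String where = locate(name);
        System.out.println(where == null
            ? name + " is NOT on the classpath"
            : name + " is loaded from " + where);
    }
}
```

Running this against the classpath built with 2.43.0 and again with 2.44.0 should show whether the jar that provides the class dropped out of the dependency set between the two versions.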