yuwtennis / apache-beam-pipeline-apps

0 stars 0 forks source link

Implement Elasticsearch IO using TLS on dataflow #24

Closed yuwtennis closed 1 year ago

yuwtennis commented 1 year ago
[ywatanabe@laptop-archlinux java]$ java -cp build/libs/java-1.0-SNAPSHOT-all.jar app.examples.ElasticsearchIOSimpleWrite --runner=DataflowRunner --gcpTempLocation=gs://elite-caster-125113/tmp --project=elite-caster-125113 --region=asia-northeast1 --filesToStage=certs/ca.p12 
Mar 31, 2023 7:49:31 PM com.google.auth.oauth2.DefaultCredentialsProvider warnAboutProblematicCredentials
WARNING: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/.
[main] INFO org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory - No stagingLocation provided, falling back to gcpTempLocation
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
[main] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Uploading 1 files from PipelineOptions.filesToStage to staging location to prepare for execution.
[pool-4-thread-2] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Uploading certs/ca.p12 to gs://elite-caster-125113/tmp/staging/ca-hzEV_Oss_iYDkI3dLJ9KKQJM_JUkkEkcLaLdobydiXY.p12
[main] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Staging files complete: 0 files cached, 1 files newly uploaded in 1 seconds
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Staging portable pipeline proto to gs://elite-caster-125113/tmp/staging/
[pool-7-thread-1] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Uploading <26805 bytes, hash 1ce17525f37a8ac157b9c470cb0d34d820921543989947f4fcaf2037ada804f1> to gs://elite-caster-125113/tmp/staging/pipeline-HOF1JfN6isFXucRwyw002CCSFUOYmUf0_K8gN62oBPE.pb
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding ToPcollection/Read(CreateSource) as step s1
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding MapElements/Map as step s2
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding ToElasticsearch/ElasticsearchIO.DocToBulk/ParDo(DocToBulk) as step s3
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding ToElasticsearch/ElasticsearchIO.BulkIO/Window inputs globally/Window.Assign as step s4
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding ToElasticsearch/ElasticsearchIO.BulkIO/ParDo(BulkIOBundle) as step s5
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding ToElasticsearch/ElasticsearchIO.BulkIO/Restore original windows/Window.Assign as step s6
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding ToElasticsearch/ElasticsearchIO.BulkIO/ParMultiDo(ResultFiltering) as step s7
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Dataflow SDK version: 2.42.0
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobs/asia-northeast1/2023-03-31_03_49_37-16326573989423599964?project=elite-caster-125113
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Submitted job: 2023-03-31_03_49_37-16326573989423599964
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - To cancel the job using the 'gcloud' tool, run:
> gcloud dataflow jobs --project=elite-caster-125113 cancel --region=asia-northeast1 2023-03-31_03_49_37-16326573989423599964
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:39.686Z: Autoscaling is enabled for job 2023-03-31_03_49_37-16326573989423599964. The number of workers will be between 1 and 1000.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:39.756Z: Autoscaling was automatically enabled for job 2023-03-31_03_49_37-16326573989423599964.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:42.076Z: Worker configuration: n1-standard-1 in asia-northeast1-b.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.344Z: Expanding CoGroupByKey operations into optimizable parts.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.373Z: Expanding GroupByKey operations into optimizable parts.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.400Z: Lifting ValueCombiningMappingFns into MergeBucketsMappingFns
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.457Z: Fusing adjacent ParDo, Read, Write, and Flatten operations
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.484Z: Fusing consumer MapElements/Map into ToPcollection/Read(CreateSource)
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.510Z: Fusing consumer ToElasticsearch/ElasticsearchIO.DocToBulk/ParDo(DocToBulk) into MapElements/Map
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.537Z: Fusing consumer ToElasticsearch/ElasticsearchIO.BulkIO/Window inputs globally/Window.Assign into ToElasticsearch/ElasticsearchIO.DocToBulk/ParDo(DocToBulk)
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.558Z: Fusing consumer ToElasticsearch/ElasticsearchIO.BulkIO/ParDo(BulkIOBundle) into ToElasticsearch/ElasticsearchIO.BulkIO/Window inputs globally/Window.Assign
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.580Z: Fusing consumer ToElasticsearch/ElasticsearchIO.BulkIO/Restore original windows/Window.Assign into ToElasticsearch/ElasticsearchIO.BulkIO/ParDo(BulkIOBundle)
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.606Z: Fusing consumer ToElasticsearch/ElasticsearchIO.BulkIO/ParMultiDo(ResultFiltering) into ToElasticsearch/ElasticsearchIO.BulkIO/Restore original windows/Window.Assign
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.892Z: Executing operation ToPcollection/Read(CreateSource)+MapElements/Map+ToElasticsearch/ElasticsearchIO.DocToBulk/ParDo(DocToBulk)+ToElasticsearch/ElasticsearchIO.BulkIO/Window inputs globally/Window.Assign+ToElasticsearch/ElasticsearchIO.BulkIO/ParDo(BulkIOBundle)+ToElasticsearch/ElasticsearchIO.BulkIO/Restore original windows/Window.Assign+ToElasticsearch/ElasticsearchIO.BulkIO/ParMultiDo(ResultFiltering)
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:49:43.965Z: Starting 1 workers in asia-northeast1-b...
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:50:32.886Z: Autoscaling: Raised the number of workers to 1 based on the rate of progress in the currently running stage(s).
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T10:51:03.523Z: Workers have started successfully.
[main] ERROR org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:49:43.869Z: Workflow failed. Causes: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. Please check the worker logs in Stackdriver Logging. You can also get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:49:43.888Z: Cancel request is committed for workflow job: 2023-03-31_03_49_37-16326573989423599964.
[main] ERROR org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:49:43.901Z: The Dataflow job appears to be stuck because no worker activity has been seen in the last 1h. Please check the worker logs in Stackdriver Logging. You can also get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:49:43.910Z: Finished operation ToPcollection/Read(CreateSource)+MapElements/Map+ToElasticsearch/ElasticsearchIO.DocToBulk/ParDo(DocToBulk)+ToElasticsearch/ElasticsearchIO.BulkIO/Window inputs globally/Window.Assign+ToElasticsearch/ElasticsearchIO.BulkIO/ParDo(BulkIOBundle)+ToElasticsearch/ElasticsearchIO.BulkIO/Restore original windows/Window.Assign+ToElasticsearch/ElasticsearchIO.BulkIO/ParMultiDo(ResultFiltering)
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:49:44.693Z: Cleaning up.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:49:44.735Z: Stopping worker pool...
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:50:26.665Z: Autoscaling: Resized worker pool from 1 to 0.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2023-03-31T11:50:26.690Z: Worker pool stopped.
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineJob - Job 2023-03-31_03_49_37-16326573989423599964 failed with status FAILED.
[ywatanabe@laptop-archlinux java]$ 
yuwtennis commented 1 year ago
Executing: java -Xmx2701401292 -Xlog:gc*:file=/var/log/dataflow/jvm-gc.log -XX:+AlwaysActAsServerClassMachine -XX:+UseParallelGC -XX:-OmitStackTraceInFastThrow -XX:-UseContainerSupport -cp /opt/google/dataflow/batch/libshuffle_v1.jar:/opt/google/dataflow/batch/dataflow-worker.jar:/opt/google/dataflow/slf4j/jcl_over_slf4j.jar:/opt/google/dataflow/slf4j/log4j_over_slf4j.jar:/opt/google/dataflow/slf4j/libto_slf4j.jar:/var/opt/google/dataflow/java-1.0-SNAPSHOT-all-3VIhfNtm5A7rQYeuApDfGQP9BBXdDRXBtV22xxY-tFc.jar -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.host=localhost -Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.rmi.port=5555 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote=true -Ddataflow.worker.json.logging.location=/var/log/dataflow/dataflow.json.log -Ddataflow.worker.logging.filepath=/var/log/dataflow/dataflow-json.log -Ddataflow.worker.logging.location=/var/log/dataflow/dataflow.log -Djava.rmi.server.hostname=localhost -Djava.security.properties=/opt/google/dataflow/tls/disable_gcm.properties -Djob_id=2023-03-31_02_57_18-2684750866378012961 -Dsdk_pipeline_options_file=/var/opt/google/dataflow/pipeline_options.json -Dstatus_port=8081 -Dworker_id=elasticsearchiosimplewrit-03310257-rbnn-harness-s0x1 -Dworker_pool=elasticsearchiosimplewrit-03310257-rbnn-harness org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness
yuwtennis commented 1 year ago

Instruction for running custom container

https://cloud.google.com/dataflow/docs/guides/using-custom-containers#cloud-build