yuwtennis / apache-beam-pipeline-apps

0 stars 0 forks source link

Consider tls option in go sdk #28

Closed yuwtennis closed 5 months ago

yuwtennis commented 1 year ago

Related to yuwtennis/apache-beam-pipeline-apps#27

Purpose

xxx

Design outline

xxx

yuwtennis commented 1 year ago
"Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.net.MalformedURLException: no protocol: 34.84.255.173:9200
    org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
    org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBundleFn$DoFnInvoker.invokeSetup(Unknown Source)
    org.apache.beam.sdk.transforms.reflect.DoFnInvokers.tryInvokeSetupFor(DoFnInvokers.java:53)
    org.apache.beam.fn.harness.FnApiDoFnRunner.<init>(FnApiDoFnRunner.java:493)
    org.apache.beam.fn.harness.FnApiDoFnRunner$Factory.createRunnerForPTransform(FnApiDoFnRunner.java:194)
    org.apache.beam.fn.harness.FnApiDoFnRunner$Factory.createRunnerForPTransform(FnApiDoFnRunner.java:163)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:298)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:252)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:252)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:252)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.createBundleProcessor(ProcessBundleHandler.java:825)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.lambda$processBundle$0(ProcessBundleHandler.java:502)
    org.apache.beam.fn.harness.control.ProcessBundleHandler$BundleProcessorCache.get(ProcessBundleHandler.java:936)
    org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:498)
    org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
    org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.MalformedURLException: no protocol: 34.84.255.173:9200
    java.base/java.net.URL.<init>(URL.java:645)
    java.base/java.net.URL.<init>(URL.java:541)
    java.base/java.net.URL.<init>(URL.java:488)
    org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$ConnectionConfiguration.createClient(ElasticsearchIO.java:632)
    org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.setup(ElasticsearchIO.java:2491)
"
yuwtennis commented 1 year ago
{
insertId: "1ndm5b7c82b"
labels: {4}
logName: "projects/elite-caster-125113/logs/dataflow.googleapis.com%2Fjob-message"
receiveTimestamp: "2023-06-18T14:05:46.064924768Z"
resource: {2}
severity: "WARNING"
textPayload: "A worker was unable to start up.  Error: Unable to pull container image due to error: image pull request failed with error: Error response from daemon: manifest for apache/beam_java11_sdk:2.42.0.dev not found: manifest unknown: manifest unknown. This is likely due to an invalid SDK container image URL. Please verify any provided SDK container image is valid and that Dataflow workers have permissions to pull image."
timestamp: "2023-06-18T14:05:44.975019145Z"
}
yuwtennis commented 1 year ago

go test not respecting input arguments

    "workerPools": [
      {
        "autoscalingSettings": {},
        "ipConfiguration": "WORKER_IP_UNSPECIFIED",
        "kind": "harness",
        "numWorkers": 1,
        "packages": [
          {
            "location": "gs://elite-caster-125113/dataflow/staging/go-1-1687095893161419014/worker",
            "name": "worker"
          },
          {
            "location": "gs://elite-caster-125113/dataflow/staging/go-1-1687095893161419014/xlang/beam-sdks-java-io-expansion-service-2.42.0-SNAPSHOT-i1tLZkyShkz3HnjCytSsV7DCB3U6CeIOe1TkMzekxDw.jar",
            "name": "beam-sdks-java-io-expansion-service-2.42.0-SNAPSHOT-i1tLZkyShkz3HnjCytSsV7DCB3U6CeIOe1TkMzekxDw.jar"
          }
        ],
        "sdkHarnessContainerImages": [
          {
            "containerImage": "apache/beam_go_sdk:2.42.0.dev"
          },
          {
            "containerImage": "apache/beam_java11_sdk:2.42.0.dev"
          }
        ],
        "workerHarnessContainerImage": "apache/beam_go_sdk:2.42.0.dev"
      }
    ],
    "workerZone": "asia-northeast1-b"
  },
  "name": "go-testelasticsearch_basicwrite-939",
  "projectId": "elite-caster-125113",
  "type": "JOB_TYPE_BATCH"

Options used.

go test test/integration/io/xlang/elasticsearch/*\
      --runner DataflowRunner\
      --project elite-caster-125113\
      --staging_location gs://elite-caster-125113/dataflow/staging\
      --zone asia-northeast1-b\
      --expansion_jar io:/home/ywatanabe/Development/repos/beam/sdks/java/io/expansion-service/build/libs/beam-sdks-java-io-expansion-service-2.42.0-SNAPSHOT.jar\
      --esNodeAddrs $ES_NODE_ADDRS\
      --indexName "my-index"\
      --mappingType "_doc"\
      --userName $ES_USER\
      --password $ES_PASSWORD\
      --keyStorePath /tmp/ca.p12\
      --keyStorePassword eMC8VUvkfKVEV5An\
      --sdk_harness_container_image_overrides=".*java.*,asia.gcr.io/elite-caster-125113/custom_df_image:dev.1687095835"\
      --environment_config=asia.gcr.io/elite-caster-125113/beam_go_sdk:2.42.0\
      -timeout 0

kafka example https://github.com/apache/beam/blob/v2.42.0/sdks/go/examples/kafka/taxi.go#L101

yuwtennis commented 1 year ago

Possibly wrong first parameter in override option

https://beam.apache.org/releases/pydoc/2.42.0/_modules/apache_beam/options/pipeline_options.html
https://cloud.google.com/knowledge/kb/multi-language-dataflow-pipelines-fails-to-pull-image-from-default-docker-repository-000004196

It takes 2 parameters where first value is regex to identify the container image to override and the second value is the replacement container image