kube-reporting / metering-operator

The Metering Operator is responsible for collecting metrics and other information about what's happening in a Kubernetes cluster, and providing a way to create reports on the collected data.
Apache License 2.0

presto can't insert data #1122

Open woodliu opened 4 years ago

woodliu commented 4 years ago

After installing Metering, I checked the logs of reporting-operator, and it shows this error: io.prestosql.spi.PrestoException: Failed checking path: file:/user/hive/warehouse/metering.db/datasource_metering_persistentvolumeclaim_request_bytes. Then I ran a test. In Hive I created a table and inserted data into it successfully. I can see this table from Presto, but when I insert data into it, it returns the same error:

presto:metering> INSERT INTO kwang_test VALUES (1, 'San Francisco');
Query 20200306_063214_01963_k6i59 failed: Failed checking path: file:/user/hive/warehouse/metering.db/kwang_test
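A hedged way to reason about this failure: with a file: warehouse URI, the Hive connector checks the target directory on the node that runs the query, so the path must exist and be writable on the Presto pods themselves, not only on the Hive pod. The check can be simulated locally (the path below is an example, not taken from the cluster):

```shell
# Simulate Presto's path check for a file:-scheme warehouse directory.
# If the directory is missing or unwritable on the node doing the check,
# the query fails the same way the INSERT above does.
WAREHOUSE=$(mktemp -d)/metering.db/kwang_test
mkdir -p "$WAREHOUSE"
if [ -d "$WAREHOUSE" ] && [ -w "$WAREHOUSE" ]; then
  echo "path check passed: $WAREHOUSE"
else
  echo "Failed checking path: file:$WAREHOUSE"
fi
```

Given the configuration below uses a hostPath PV pinned to node1, it would be worth verifying that the Presto coordinator and workers are scheduled on that same node; a Presto pod on any other node would fail exactly this check even though Hive succeeds.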

This is my configuration:

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  tls:
    enabled: false
  presto:
    spec:
      coordinator:
        resources:
          limits:
            cpu: 6
            memory: 6Gi
          requests:
            cpu: 4
            memory: 4Gi
  storage:
    type: "hive"
    hive:
      type: "sharedPVC"
      sharedPVC:
        claimName: "reporting-operator-pvc"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reporting-operator-pvc
  namespace: metering
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 95Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: operator-metering-pv
  labels:
    name: operator-metering
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  hostPath:
    path: "/mnt/metering/hive-metastore"
  persistentVolumeReclaimPolicy: Delete
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1

I use a local filesystem PV to back the PVC used as the Hive storage. Can anyone help? Thanks a lot!

timflannagan commented 4 years ago

I haven't played around with using a ReadWriteMany PVC for metastore storage in a while, but you may want to try updating your MeteringConfig custom resource and explicitly specifying the mount path for that volume (as the default is /user/hive/warehouse/metering.db):

hive:
  storage:
    type: "sharedPVC"
    sharedPVC:
      claimName: "reporting-operator-pvc"
      mountPath: "/mnt/metering/hive-metastore"

I can try re-creating your setup later and report back.
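For clarity, merged into the MeteringConfig from the original report, the suggested override would sit under the existing storage section like this (the nesting follows the reporter's config above; the mountPath field is the suggested addition, and the exact schema may vary by metering version):

```yaml
apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  storage:
    type: "hive"
    hive:
      type: "sharedPVC"
      sharedPVC:
        claimName: "reporting-operator-pvc"
        mountPath: "/mnt/metering/hive-metastore"  # suggested override
```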

woodliu commented 4 years ago

> I haven't played around with using a ReadWriteMany PVC for metastore storage in a while, but you may want to try updating your MeteringConfig custom resource and explicitly specifying the mount path for that volume (as the default is /user/hive/warehouse/metering.db):
>
> hive:
>   storage:
>     type: "sharedPVC"
>     sharedPVC:
>       claimName: "reporting-operator-pvc"
>       mountPath: "/mnt/metering/hive-metastore"
>
> I can try re-creating your setup later and report back.

Thank you for replying. I have tried that, but it still doesn't work. I also installed a standalone Presto of this version; inserting data into an existing table works there.

timflannagan commented 4 years ago

Hmm okay, you may need to uninstall and then reinstall again as changes made post-installation to any storage configuration typically don't propagate.

woodliu commented 4 years ago

> Hmm okay, you may need to uninstall and then reinstall again as changes made post-installation to any storage configuration typically don't propagate.

Yeah, when uninstalling I always clean all the data, including all PVs and PVCs and the data in the PVs.

woodliu commented 4 years ago

I think the issue may be related to Presto itself. I don't know the relationship between operator-framework/presto and prestosql/presto.

timflannagan commented 4 years ago

The metering-operator pulls container images that get built from the operator-framework/presto repository, which is just a fork of the upstream prestosql/presto repository. I believe the fork is based on prestosql version 322.

woodliu commented 4 years ago

@timflannagan1 could you tell me how to build a new Presto image from a newer version of prestosql/presto? Thank you.

timflannagan commented 4 years ago

I recently built and pushed a more up-to-date Presto image using the 328 version (330 is the most recent released version) under the quay.io/tflannag/presto:release-328 tag, and you would need to override the metering-operator image to point to quay.io/tflannag/origin-metering-ansible-operator:release-328 too:

export METERING_OPERATOR_IMAGE_REPO=quay.io/tflannag/origin-metering-ansible-operator
export METERING_OPERATOR_IMAGE_TAG=release-328

If you're interested in using that image, you would need to update the MeteringConfig custom resource file to specify that repository and tag, e.g.:

apiVersion: metering.openshift.io/v1
kind: MeteringConfig
...
spec:
...
  presto:
    spec:
      image:
        repository: quay.io/tflannag/presto
        tag: release-328

I haven't really played around with that Presto image yet and I had to remove one of the Presto plugin/connectors (presto-prometheus) to make it build properly, but I assume it should work out-of-the-box.

If that doesn't suffice, then in order to emulate this process, I had to pull down the 328 release tag (or whatever release tag is applicable) from the upstream prestosql/presto repository. I then had to update Dockerfile.okd, adding any of the new directories in the repository that were highlighted as errors when attempting to build this new image via `docker build -f Dockerfile.okd -t <image>:<tag> .`.

Here is the full diff of that Dockerfile (with some lazy workarounds like hardcoding the Presto version instead of using the $PRESTO_VERSION environment variable):

tflannag@localhost presto [cherry-pick-hive-metastore-s3-fix]  git diff Dockerfile.okd
diff --git a/Dockerfile.okd b/Dockerfile.okd
index a35f9a4554..d63571804e 100644
--- a/Dockerfile.okd
+++ b/Dockerfile.okd
@@ -44,7 +44,6 @@ COPY presto-record-decoder /build/presto-record-decoder
 COPY presto-tpcds /build/presto-tpcds
 COPY presto-plugin-toolkit /build/presto-plugin-toolkit
 COPY presto-spi /build/presto-spi
-COPY presto-prometheus /build/presto-prometheus
 COPY presto-thrift-testing-server /build/presto-thrift-testing-server
 COPY presto-cli /build/presto-cli
 COPY presto-hive /build/presto-hive
@@ -75,7 +74,10 @@ COPY presto-kudu /build/presto-kudu
 COPY presto-main /build/presto-main
 COPY presto-raptor-legacy /build/presto-raptor-legacy
 COPY presto-password-authenticators /build/presto-password-authenticators
+COPY presto-memsql /build/presto-memsql
+COPY presto-testing /build/presto-testing
 COPY src /build/src
+COPY src/modernizer/violations.xml /build/src/modernized/violations.xml
 COPY pom.xml /build/pom.xml

 # build presto
@@ -103,7 +105,7 @@ RUN chmod +x /usr/bin/tini

 RUN mkdir -p /opt/presto

-ENV PRESTO_VERSION 322
+ENV PRESTO_VERSION 328
 ENV PRESTO_HOME /opt/presto/presto-server
 ENV PRESTO_CLI /opt/presto/presto-cli
 ENV PROMETHEUS_JMX_EXPORTER /opt/jmx_exporter/jmx_exporter.jar
@@ -113,8 +115,8 @@ ENV JAVA_HOME=/etc/alternatives/jre

 RUN mkdir -p $PRESTO_HOME

-COPY --from=build /build/presto-server/target/presto-server-$PRESTO_VERSION $PRESTO_HOME
-COPY --from=build /build/presto-cli/target/presto-cli-$PRESTO_VERSION-executable.jar $PRESTO_CLI
+COPY --from=build /build/presto-server/target/presto-server-328 $PRESTO_HOME
+COPY --from=build /build/presto-cli/target/presto-cli-328-executable.jar $PRESTO_CLI
 COPY --from=build /build/jmx_prometheus_javaagent.jar $PROMETHEUS_JMX_EXPORTER
tflannag@localhost presto [cherry-pick-hive-metastore-s3-fix]  

Here's a link to a local branch that highlights the changes I made: https://github.com/timflannagan1/presto/commit/5b6e1c296911751009e7ead6926531aca09fe171
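The "add missing COPY lines until the build passes" loop described above can be semi-automated. Below is a hedged helper sketch (file names assumed; it is demonstrated on a tiny fake tree, but in practice you would run the loop from the root of the presto checkout) that flags top-level presto-* module directories that Dockerfile.okd does not COPY yet:

```shell
# Build a tiny fake source tree to demonstrate the check.
set -e
tree=$(mktemp -d)
mkdir -p "$tree/presto-hive" "$tree/presto-memsql" "$tree/presto-testing"
cat > "$tree/Dockerfile.okd" <<'EOF'
COPY presto-hive /build/presto-hive
EOF
cd "$tree"
missing=""
for d in presto-*/; do
  d=${d%/}
  # Any module directory without a matching COPY line is a likely cause
  # of the build errors mentioned above.
  grep -q "COPY $d " Dockerfile.okd || missing="$missing $d"
done
echo "missing COPY for:$missing"
```

Here the fake tree reports presto-memsql and presto-testing as missing, which matches the two COPY lines the diff above adds.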

woodliu commented 4 years ago

@timflannagan1 Hi, I have used the new image you pushed. But because there is no prometheus connector, it shows the error below:

2020-03-09T02:08:00.879Z ERROR main io.prestosql.server.PrestoServer No factory for connector 'prometheus'. Available factories: [memory, kudu, blackhole, kinesis, redis, accumulo, gsheets, raptor-legacy, jmx, postgresql, elasticsearch, redshift, sqlserver, localfile, tpch, iceberg, mysql, mongodb, example-http, tpcds, phoenix, system, cassandra, kafka, atop, hive-hadoop2, presto-thrift]
java.lang.IllegalArgumentException: No factory for connector 'prometheus'. Available factories: [memory, kudu, blackhole, kinesis, redis, accumulo, gsheets, raptor-legacy, jmx, postgresql, elasticsearch, redshift, sqlserver, localfile, tpch, iceberg, mysql, mongodb, example-http, tpcds, phoenix, system, cassandra, kafka, atop, hive-hadoop2, presto-thrift]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
    at io.prestosql.connector.ConnectorManager.createCatalog(ConnectorManager.java:180)
    at io.prestosql.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:88)
    at io.prestosql.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
    at io.prestosql.server.PrestoServer.run(PrestoServer.java:129)
    at io.prestosql.$gen.Presto_328_106_g2c7b27a_dirty____20200309_020751_1.run(Unknown Source)
    at io.prestosql.server.PrestoServer.main(PrestoServer.java:72)

And I want to ask two questions about building Presto:

* Just change `PRESTO_VERSION` from 322 to 328, then it can pull the 328 version code from `prestosql/presto`?
* How to clean the outputs if it is wrong with the command `docker build -f Dockerfile.okd .`?

Thank you!

timflannagan commented 4 years ago

You would also need to override the metering-operator image to use a custom one I also pushed that removed the presto-prometheus connector catalog from being loaded:

export METERING_OPERATOR_IMAGE_REPO=quay.io/tflannag/origin-metering-ansible-operator
export METERING_OPERATOR_IMAGE_TAG=release-328
./hack/openshift-install.sh

Like I said, I haven't tested that version yet and I don't know how many changes there are between 322 and 328 in terms of the Hive connector catalog configuration. At a glance, it looks like there are some changes to the TLS-related properties which may break some things. If that's the case, you may need to disable TLS entirely in the MeteringConfig custom resource:

...
spec:
  tls:
    enabled: false
  ...

> And I want to ask two questions about building Presto

> * Just change `PRESTO_VERSION` from 322 to 328, then it can pull the 328 version code from `prestosql/presto`?

Nope, that environment variable only controls copying some versioned files from a previous container layer. You would need to pull down the upstream (prestosql/presto) release tag and attempt to merge it (e.g. git pull <whatever remote points to the prestosql/presto repo> 328).

> * How to clean the outputs if it is wrong with the command `docker build -f Dockerfile.okd .`?

It's difficult to say; the main thing should be copying all the new directories from the upstream release tag (COPY <new-dir> /build/<new-dir>), which get highlighted when attempting to build that Dockerfile.

After that, it can be a bit tricky as the maven build can take quite a while and I've found that those error messages aren't as obvious as the docker-related build ones.

woodliu commented 4 years ago

Sorry, I missed the operator images. I am trying now, thanks a lot!

woodliu commented 4 years ago

@timflannagan1 Hi, if I use the 328 version of the operator, it returns this error:

Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.
/tmp/ansible-operator/runner/metering.openshift.io/v1/MeteringConfig/metering/operator-metering/artifacts/6129484611666145821//stdout
Traceback (most recent call last):
  File "/usr/bin/ansible-playbook", line 63, in <module>
    from ansible.utils.display import Display
  File "/usr/lib/python2.7/site-packages/ansible/utils/display.py", line 60, in <module>
    class FilterUserInjector(logging.Filter):
  File "/usr/lib/python2.7/site-packages/ansible/utils/display.py", line 65, in FilterUserInjector
    username = getpass.getuser()
  File "/usr/lib64/python2.7/getpass.py", line 158, in getuser
    return pwd.getpwuid(os.getuid())[0]
KeyError: 'getpwuid(): uid not found: 1000300000'

So I think maybe I should use another version of operator-metering that is earlier than 4.5, or build a new image by myself. I will try these ways later.
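The KeyError above is a well-known symptom of containers running under OpenShift's restricted SCC: the pod is assigned an arbitrary high UID with no /etc/passwd entry, so Python's getpass.getuser() (which wraps getpwuid) raises. A minimal sketch of the failing lookup, assuming the example UID from the traceback has no entry on the host:

```shell
# getpass.getuser() ultimately does a passwd lookup by UID; with an
# arbitrary OpenShift-assigned UID there is no matching entry, which is
# exactly the KeyError in the traceback. getent performs the same lookup:
uid=1000300000   # example UID taken from the traceback above
if getent passwd "$uid" >/dev/null; then
  echo "uid $uid resolves to a user name"
else
  echo "uid $uid not found: this is where getpwuid() raises KeyError"
fi
```

Common fixes are adding an nss_wrapper-style passwd entry in the image, or running the container with a UID the image knows about; images built for specific OpenShift releases typically handle this themselves, which would be consistent with some operator image tags working and others not.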

woodliu commented 4 years ago

I have tried the new version of Presto (330), using Presto to connect to the Hive server, but got the same error. Using Presto, I can see the catalog, schemas and tables, but I can't read data from the tables or insert data into them. Using Hive, I can both read and write data. Even a command like hadoop fs -ls file:/mnt/metering/hive-metastore/metering.db/kwang_test gives the right results.

We don't have object storage like Amazon S3 or Azure, just an OBS from another manufacturer that implements the S3 standard. Because Presto returns the error when I use this kind of OBS, I switched to a filesystem volume to see what happens. I see some people have the same issue with Presto, but there is no answer that solves the problem. It is taking too much time to solve; maybe I should wait for an answer...