vmware-archive / sre-test

Greenplum - Open Source SRE test project.

Troubleshooting section should include solution for PXF server error : Compression codec not found #271

Closed indranil-cg closed 2 years ago

indranil-cg commented 2 years ago

Update to the PXF Troubleshooting documentation (https://gpdb.docs.pivotal.io/pxf/6-2/using/troubleshooting_pxf.html). This section should describe the remedial measures to take when the error described below is encountered.

Error Message: PXF server error : java.lang.IllegalArgumentException:Compression codec com.hadoop.compression.lzo.LzoCodec not found.

Cause: This issue is most likely to occur when the Hadoop cluster has been configured to use LZO compression but the compression library is missing on the PXF server.

Solution: Copy the compression library hadoop-lzo.jar from the Hadoop cluster to the $PXF_BASE/lib directory, then perform a pxf cluster sync followed by a pxf cluster restart.
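The proposed remediation can be sketched as shell commands. The hostname hadoop-master is hypothetical, and the paths match the testbed described below (PXF rpm install at /usr/local/pxf-gp6):

```shell
# Copy the LZO codec jar from the Hadoop master (hostname is hypothetical)
# into the PXF lib directory ($PXF_BASE/lib, /usr/local/pxf-gp6/lib here)
scp hadoop-master:/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar /usr/local/pxf-gp6/lib/

# Distribute the jar to every PXF host, then restart the PXF service
pxf cluster sync
pxf cluster restart
```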

Testbed Description

- Greenplum Database 6.18.2
- PXF 6.2 (pxf-gp6-6.2.0-2.el7.x86_64.rpm)
- CentOS Linux 7 x86_64 HVM EBS ENA 2002_01 (AWS EC2 instance type t3a.large)
- Java 8 (java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64)
- AWS emr-6.4.0 (Hadoop 3.2.1, Hive 3.1.2, HBase 2.4.4): 1x Master (m5.xlarge), 1x Core (m5.xlarge)

Note: Greenplum Database cluster set up using https://github.com/greenplum-db/sre-test/tree/main/aws

General Steps

Install and configure PXF per the PXF installation documentation at https://gpdb.docs.pivotal.io/pxf/6-2/release/installing_pxf.html

  1. Install Java

  2. Install PXF

  3. Add JAVA_HOME to pxf-env.sh

  4. Set up the Hadoop data

     a. hdfs dfs -mkdir -p /data/pxftest1
     b. hdfs dfs -put /home/Hadoop/sample_db.csv /data/pxftest1
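Step 4 can be sketched as below; the CSV rows are hypothetical, invented only to match the seven columns of the external table defined later (firstname, lastname, gender, country, age, date, eid):

```shell
# Create a small hypothetical sample file with the expected seven columns
cat > /tmp/sample_db.csv <<'EOF'
Angela,Smith,F,USA,34,2018-03-12,1001
Angelo,Rossi,M,Italy,45,2019-07-01,1002
EOF
wc -l < /tmp/sample_db.csv    # 2 rows staged locally

# Stage the file into HDFS (requires a configured Hadoop client):
# hdfs dfs -mkdir -p /data/pxftest1
# hdfs dfs -put /tmp/sample_db.csv /data/pxftest1
```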

  5. Configure PXF

     a. Create the PXF server configuration directory /usr/local/pxf-gp6/servers/hdpserver
     b. From the Hadoop cluster, copy core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml into the above directory
     c. Copy pxf-site.xml from the templates directory into the above directory
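Step 5 might look like the following; the emr-master hostname and the /etc/hadoop/conf source path are assumptions based on a typical EMR layout:

```shell
# Create the PXF server configuration directory (step 5a)
SERVER_DIR=/usr/local/pxf-gp6/servers/hdpserver
mkdir -p "$SERVER_DIR"

# Copy the Hadoop client configuration files from the EMR master (step 5b);
# hostname and source path are assumptions for a typical EMR install
scp emr-master:/etc/hadoop/conf/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml} "$SERVER_DIR/"

# Seed pxf-site.xml from the shipped template (step 5c)
cp /usr/local/pxf-gp6/templates/pxf-site.xml "$SERVER_DIR/"
```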

  6. Set impersonation to false and pxf service user to gpadmin in pxf-site.xml

    <property>
    <name>pxf.service.user.impersonation</name>
    <value>false</value>
    </property>
    <property>
    <name>pxf.service.user.name</name>
    <value>gpadmin</value>
    </property>
  7. Sync and start pxf cluster
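Step 7 corresponds to the following commands, run as gpadmin on the Greenplum coordinator:

```shell
# Push the server configuration (including pxf-site.xml) to all hosts
pxf cluster sync
# Start the PXF service on every host
pxf cluster start
```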

  8. In psql

    a.  CREATE EXTENSION pxf;
    b.  CREATE EXTERNAL TABLE hdp_csv_tbl (firstname text, lastname text, gender text, country text, age text, date text, eid text) LOCATION ('pxf://data/pxftest1/sample_db.csv?PROFILE=hdfs:csv&SERVER=hdpserver') FORMAT 'CSV';
    c.  SELECT * FROM hdp_csv_tbl WHERE "firstname" LIKE 'Angel%' limit 10;

    Issue Observed

    ERROR:  PXF server error : java.lang.IllegalArgumentException:Compression codec com.hadoop.compression.lzo.LzoCodec not found.  (seg0 slice1 xx.xx.xx.xx:4000 pid=12345)
    HINT:  Check the PXF logs located in the '/usr/local/pxf-gp6/logs' directory on host 'sdw1_ipv4' or 'set client_min_messages=LOG' for additional details.
    CONTEXT:  External table pxf_hdp_csv_tbl, line 1 of file pxf://data/pxf_csv_db/sample_db.csv?PROFILE=hdfs:csv&COMPRESSION_CODEC=uncompressed&SERVER=hdpserver

    Remediation Steps

  9. From the Hadoop master, copy /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar to the PXF lib directory /usr/local/pxf-gp6/lib

  10. Sync and restart pxf cluster

  11. From psql, run the SQL query again; this time there is no error and the query returns results as expected

lisakowen commented 2 years ago

addressed in https://github.com/greenplum-db/pxf/pull/790.