Error Message: PXF server error : java.lang.IllegalArgumentException:Compression codec com.hadoop.compression.lzo.LzoCodec not found.
Cause: This issue is most likely to occur when the Hadoop cluster is configured to use LZO compression but the LZO compression library is missing on the PXF server.
Solution: Copy the compression library hadoop-lzo.jar from the Hadoop cluster to the $PXF_BASE/lib directory, then run pxf cluster sync followed by pxf cluster restart.
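The cause above can be checked before running any query. The following is a minimal sketch, not part of PXF. The function name check_lzo_codec is hypothetical, and it assumes the PXF 6 layout in which server configurations live under $PXF_BASE/servers/<server> and extra jars under $PXF_BASE/lib. It does a plain text search of core-site.xml for the codec class (normally listed in the io.compression.codecs property) rather than parsing the XML.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether a PXF server config references the
# LZO codec while no hadoop-lzo jar sits in the PXF lib directory.
# Assumes PXF 6 layout: $PXF_BASE/servers/<server>/core-site.xml and $PXF_BASE/lib.
# Exit status: 0 = ok, 1 = codec referenced but jar missing, 2 = no core-site.xml.
check_lzo_codec() {
  local pxf_base="$1" server="$2"
  local core_site="$pxf_base/servers/$server/core-site.xml"

  if [ ! -f "$core_site" ]; then
    echo "missing: $core_site"
    return 2
  fi

  # Plain text search for the codec class named in the PXF error
  if ! grep -qF 'com.hadoop.compression.lzo.LzoCodec' "$core_site"; then
    echo "ok: LZO codec not referenced in $core_site"
    return 0
  fi

  # ls fails when the glob matches nothing (the pattern stays literal)
  if ls "$pxf_base"/lib/hadoop-lzo*.jar >/dev/null 2>&1; then
    echo "ok: LZO codec referenced and hadoop-lzo jar present"
    return 0
  fi

  echo "problem: LZO codec referenced but no hadoop-lzo jar in $pxf_base/lib"
  return 1
}
```

For example, check_lzo_codec /usr/local/pxf-gp6 hdpserver should print a "problem" line and return 1 in the situation this note describes.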
Testbed Description
Greenplum Database 6.18.2
PXF 6.2 (pxf-gp6-6.2.0-2.el7.x86_64.rpm)
CentOS Linux 7 x86_64 HVM EBS ENA 2002_01 (AWS EC2 Instance type t3a.large)
Java 8 (java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64)
Note: The Greenplum Database cluster was set up using https://github.com/greenplum-db/sre-test/tree/main/aws
Set Up Hadoop DB
a. hdfs dfs -mkdir -p /data/pxftest1
b. hdfs dfs -put /home/Hadoop/sample_db.csv /data/pxftest1
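The two hdfs commands above can be wrapped so that the step is safe to re-run and confirms the upload. This is a sketch; the function name stage_sample_csv is hypothetical, while -put -f (overwrite an existing file) and -test -e (existence check) are standard hdfs dfs options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stage the sample CSV in HDFS (re-runnable) and verify it.
# Defaults are the paths used in this note.
stage_sample_csv() {
  local src="${1:-/home/Hadoop/sample_db.csv}"
  local dest_dir="${2:-/data/pxftest1}"

  hdfs dfs -mkdir -p "$dest_dir" || return 1
  # -f overwrites an existing copy, so the step can be repeated safely
  hdfs dfs -put -f "$src" "$dest_dir" || return 1
  # -test -e exits 0 only if the uploaded file now exists
  hdfs dfs -test -e "$dest_dir/$(basename "$src")"
}
```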
Configure PXF
a. Create the PXF server configuration directory /usr/local/pxf-gp6/servers/hdpserver
b. From the Hadoop cluster, copy core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml into the above directory
c. Copy pxf-site.xml from the PXF templates directory into the above directory
In pxf-site.xml, set impersonation to false and the PXF service user to gpadmin
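The Configure PXF steps above can be scripted roughly as follows. This sketch rests on assumptions: configure_pxf_server is a hypothetical name, the four Hadoop *-site.xml files have already been fetched from the Hadoop cluster into a local directory, and instead of editing the copied template it writes a minimal pxf-site.xml containing only the two settings from this note. The property names pxf.service.user.impersonation and pxf.service.user.name are believed to match the PXF 6 pxf-site.xml template; confirm them against the template shipped with your PXF version.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the "Configure PXF" steps.
# Usage: configure_pxf_server <pxf_base> <server_name> <dir_with_hadoop_site_files>
configure_pxf_server() {
  local pxf_base="$1" server="$2" hadoop_conf="$3"
  local dir="$pxf_base/servers/$server"
  local f

  # a. Server configuration directory
  mkdir -p "$dir" || return 1

  # b. Hadoop client configuration fetched from the Hadoop cluster
  for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    cp "$hadoop_conf/$f" "$dir/" || return 1
  done

  # c. Minimal pxf-site.xml: impersonation off, gpadmin as service user
  # (the shipped template contains more, mostly commented-out, options)
  cat > "$dir/pxf-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>pxf.service.user.impersonation</name>
        <value>false</value>
    </property>
    <property>
        <name>pxf.service.user.name</name>
        <value>gpadmin</value>
    </property>
</configuration>
EOF
}
```

For this testbed the call would be configure_pxf_server /usr/local/pxf-gp6 hdpserver <local-dir-with-site-files>, where the last argument is a placeholder.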
Sync and start the PXF cluster (pxf cluster sync, then pxf cluster start)
In psql:
a. CREATE EXTENSION pxf;
b. CREATE EXTERNAL TABLE hdp_csv_tbl (firstname text, lastname text, gender text, country text, age text, date text, eid text) LOCATION ('pxf://data/pxftest1/sample_db.csv?PROFILE=hdfs:csv&SERVER=hdpserver') FORMAT 'CSV';
c. SELECT * FROM hdp_csv_tbl WHERE "firstname" LIKE 'Angel%' LIMIT 10;
Issue Observed
ERROR: PXF server error : java.lang.IllegalArgumentException:Compression codec com.hadoop.compression.lzo.LzoCodec not found. (seg0 slice1 xx.xx.xx.xx:4000 pid=12345)
HINT: Check the PXF logs located in the '/usr/local/pxf-gp6/logs' directory on host 'sdw1_ipv4' or 'set client_min_messages=LOG' for additional details.
CONTEXT: External table pxf_hdp_csv_tbl, line 1 of file pxf://data/pxf_csv_db/sample_db.csv?PROFILE=hdfs:csv&COMPRESSION_CODEC=uncompressed&SERVER=hdpserver
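Following the HINT, the PXF logs on the host named in the error can be searched for the codec message. A minimal sketch, assuming the logs live directly under $PXF_BASE/logs (the /usr/local/pxf-gp6/logs directory in this testbed); the function name find_lzo_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PXF log files that contain the missing-codec error.
# Defaults to $PXF_BASE/logs, falling back to this testbed's /usr/local/pxf-gp6.
find_lzo_errors() {
  local log_dir="${1:-${PXF_BASE:-/usr/local/pxf-gp6}/logs}"
  # -l: print file names only; -F: fixed-string match; errors are silenced
  grep -lF 'com.hadoop.compression.lzo.LzoCodec not found' "$log_dir"/* 2>/dev/null
}
```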
Remediation Steps
From the Hadoop master node, copy /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar to the PXF lib directory /usr/local/pxf-gp6/lib
Sync and restart the PXF cluster (pxf cluster sync, then pxf cluster restart)
From psql, run the SQL query again. This time no error is raised and the query returns the expected results.
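The remediation steps above can be sketched as a single function, run as gpadmin on the Greenplum coordinator so that pxf cluster sync distributes the jar to the segment hosts. Assumptions, all environment-specific: the function name install_lzo_jar is hypothetical, the Hadoop master is reachable over SSH as user hadoop (the usual login on EMR nodes, but check your setup), and the jar sits at /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar as in this testbed.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the remediation, run as gpadmin on the coordinator.
# Assumes SSH access to the Hadoop master as user "hadoop" and this testbed's jar path.
install_lzo_jar() {
  local hadoop_master="$1"
  local pxf_base="${2:-${PXF_BASE:-/usr/local/pxf-gp6}}"
  local jar="/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar"

  # Copy the codec library into the PXF lib directory on this host
  scp "hadoop@${hadoop_master}:${jar}" "$pxf_base/lib/" || return 1
  # Distribute it to all PXF hosts, then restart so the PXF JVMs load it
  pxf cluster sync || return 1
  pxf cluster restart
}
```

Usage would be install_lzo_jar <hadoop-master-host>, with the host name as a placeholder; the function stops before syncing if the copy fails.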
Update to Documentation
Troubleshooting PXF (https://gpdb.docs.pivotal.io/pxf/6-2/using/troubleshooting_pxf.html): this page should describe the remedial measures to take if the LZO codec error described in this note is encountered.
Hadoop cluster: AWS EMR release emr-6.4.0 (Hadoop 3.2.1, Hive 3.1.2, HBase 2.4.4), 1x Master (m5.xlarge), 1x Core (m5.xlarge)
General Steps
Install and configure PXF as described in the PXF documentation at https://gpdb.docs.pivotal.io/pxf/6-2/release/installing_pxf.html:
a. Install Java
b. Install PXF
c. Add JAVA_HOME to pxf-env.sh
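For step c above, here is a sketch that sets JAVA_HOME in pxf-env.sh idempotently. Assumptions: the file is $PXF_BASE/conf/pxf-env.sh (PXF 6 layout), GNU sed is available (true on CentOS 7), and set_pxf_java_home is a hypothetical name. Only an uncommented export JAVA_HOME= line is replaced; commented examples in the file are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set JAVA_HOME in pxf-env.sh, replacing an existing
# uncommented setting or appending one if none exists. Requires GNU sed (-i).
set_pxf_java_home() {
  local env_file="$1" java_home="$2"
  if grep -q '^export JAVA_HOME=' "$env_file" 2>/dev/null; then
    # Rewrite the active setting in place; commented lines do not match ^export
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
  else
    echo "export JAVA_HOME=${java_home}" >> "$env_file"
  fi
}
```

One way to derive the path is set_pxf_java_home "$PXF_BASE/conf/pxf-env.sh" "$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")", which takes the directory two levels above the resolved java binary (the jre directory for the OpenJDK 8 package used in this testbed).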
The remaining steps are the same as described above: set up the Hadoop DB, configure PXF (including the impersonation and service user settings in pxf-site.xml), sync and start the PXF cluster, and run the SQL statements in psql. The observed error and the remediation steps are also as described above.