rbheemana / Cobol-to-Hive

Serde for Cobol Layout to Hive table
Apache License 2.0

Cobol to Hive Serde blog #3

Open rbheemana opened 8 years ago

josephlim75 commented 8 years ago

Hi Ram,

Thank you very much. Your new changes fixed the cache problem. You were right, and removing all the static declarations nailed the root cause. I have tested it, and it now works without restarting the session. You rock!!!

datawarlock commented 8 years ago

I concur completely. Great job Ram! I have been using this Serde for the last month or so and it has been very useful.

Ram - I did notice something that I had to modify the serde for that you may want to take a look at. When the serde outputs string fields and there is a character like a newline or carriage return in the string, it splits the row at that character, so one row becomes two rows split at the improper position. To get around this I added a replaceAll call to change newlines and carriage returns to spaces.

The way I did it is a little bit of a hack, because it does not address the other characters that split lines (form feed, vertical tab, etc.). It just handles newlines and carriage returns.

In CobolStringField.java:

    case STRING:
        return s1;
    case VARCHAR:
        return new HiveVarchar(s1, this.length);

I changed it to this:

    case STRING:
        s1 = s1.replaceAll("\n", " ");
        s1 = s1.replaceAll("\r", " ");
        return s1;
    case VARCHAR:
        s1 = s1.replaceAll("\n", " ");
        s1 = s1.replaceAll("\r", " ");
        return new HiveVarchar(s1, this.length);
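
As an aside for readers of this thread: the same idea can be extended to the other separators mentioned above (form feed, vertical tab) with a single character class. This is only a sketch against the same switch shown above, not code from the project:

    case STRING:
        // Sketch: collapse newline, carriage return, form feed and vertical tab
        // into spaces in one pass.
        return s1.replaceAll("[\\n\\r\\f\\x0B]", " ");
    case VARCHAR:
        return new HiveVarchar(s1.replaceAll("[\\n\\r\\f\\x0B]", " "), this.length);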

rbheemana commented 8 years ago

@datawarlock I have opened an issue https://github.com/rbheemana/Cobol-to-Hive/issues/13 with your suggestion. I am thinking in terms of specifying option to user on how to replace carriage return and new line like we do when we import through sqoop. Please feel free to comment on the issue if you have any other better approaches .

josephlim75 commented 8 years ago

Hi Ram,

I believe I have found an offset issue. I am not sure if anyone else has the problem. Assume the following data field:

    01 RECORD.
       10 APPID PIC S9(12) USAGE COMP-3.

The byte size for the field above should be 7 bytes, but the code calculates 6. As far as I understand, the formula for a COMP-3 field's byte length is (n + 1) / 2, rounded up. In this case it is (12 + 1) / 2 = 6.5, which rounds up to 7.

Does anyone else have this problem?

I have modified the code in CobolNumberField.java:

    if (this.compType == 3)
        this.length = (int) Math.ceil((double) (this.length + 1) / divideFactor);
    else
        this.length = (int) Math.ceil((double) this.length / divideFactor);
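
For anyone following along, that sizing rule can be checked in isolation. A minimal illustrative helper (comp3Bytes is a hypothetical name, not from the project):

    // A packed-decimal (COMP-3) field stores two digits per byte plus a sign
    // nibble, so n digits occupy ceil((n + 1) / 2) bytes.
    static int comp3Bytes(int digits) {
        // (digits + 2) / 2 in integer arithmetic equals ceil((digits + 1) / 2).
        return (digits + 2) / 2;
    }
    // comp3Bytes(12) == 7, matching the S9(12) COMP-3 example above.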

WimGof commented 7 years ago

Hi Ram, I notice that many issues have been fixed recently. Is it your plan to post an updated version of CobolSerde.jar file, as currently referenced here: https://github.com/rbheemana/Cobol-to-Hive/tree/gh-pages/target ? Thanks in advance, WimGof

ankravch commented 7 years ago

@WimGof you can generate CobolSerde.jar yourself with something like:

Cobol-to-Hive-master\src>set CLASSPATH=%HADOOP_HOME%\*;%HIVE_HOME%\lib\*;
Cobol-to-Hive-master\src>javac com\savy3\hadoop\hive\serde2\cobol\*.java
Cobol-to-Hive-master\src>javac com\savy3\hadoop\hive\serde3\cobol\*.java
Cobol-to-Hive-master\src>javac com\savy3\mapred\*.java
Cobol-to-Hive-master\src>javac com\savy3\mapreduce\*.java
Cobol-to-Hive-master\src>jar cf CobolSerde_v030817.jar com
Cobol-to-Hive-master\src>jar tf CobolSerde_v030817.jar

for example, https://github.com/ankravch/Cobol-to-Hive/blob/gh-pages/target/CobolSerde_v030817.jar
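
A later comment in this thread also builds the jar with Maven (Cobol-to-Hive-1.1.0.jar), so assuming the project's pom.xml is present, a standard Maven build from the project root should produce an equivalent jar:

    Cobol-to-Hive-master>mvn clean package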

WimGof commented 7 years ago

@ankravch thanks for showing me the procedure. I downloaded the Cobol-to-Hive-Issue-18 branch, as this contains exactly the fix I am after. I was successful in compiling savy3/mapred, savy3/mapreduce, serde2/cobol. But in my last compilation step, serde3/cobol, I have the error below. Would this have to do with the versions of jars I am accessing, or would this be a code issue? Thanks again.

$ javac -Xlint:deprecation com/savy3/hadoop/hive/serde3/cobol/*.java
com/savy3/hadoop/hive/serde3/cobol/TestCobolFieldFactory.java:16: error: cannot find symbol
        public void testGetCobolField() throws CobolSerdeException {
                                               ^
  symbol:   class CobolSerdeException
  location: class TestCobolFieldFactory
com/savy3/hadoop/hive/serde3/cobol/CobolSerDe.java:31: warning: [deprecation] initialize(Configuration,Properties) in AbstractSerDe has been deprecated
        public void initialize(final Configuration conf, final Properties tbl)
                    ^
1 error
1 warning

rbheemana commented 7 years ago

@WimGof I have added the import statement in the code. Please download the source again from branch Issue-18 and retry now.

WimGof commented 7 years ago

@rbheemana Yes, this change solved the last error. I was able to compile and install the SerDe. I tested reading the cobol file that had issues before and this latest code in Cobol-to-Hive-Issue-18 seems to have solved the problem. My first tests only show good data. Thanks!

RCGEnableBigDataDeveloper commented 7 years ago

@rbheemana I keep getting a duplicate column name error in Hive. I don't have any duplicate columns in the copybook. Is there anything I can do to help debug this?

rbheemana commented 7 years ago

@RCGEnableBigDataDeveloper Please post your copybook and error.

sandeep-tandon commented 7 years ago

Hi Ram ,

I saw your code, and I am working with a fixed-length data file. I have created a copybook with only one 01 level in it. My data is in a text file. I need help with two things:

  1. How do we calculate the fb.length property?
  2. My data is in a text file. Do I need to convert it into a binary format like EBCDIC? If not, can I define the data.format property as TEXT in the TBLPROPERTIES?

I am trying something like this:

    CREATE EXTERNAL TABLE CobolHive
    ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe'
    STORED AS INPUTFORMAT "org.apache.hadoop.mapred.FixedLengthInputFormat"
    OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"
    LOCATION '/user/sandeep/input'
    TBLPROPERTIES ('cobol.layout.url'='/user/sandeep/copybook','fb.length'='450','data.format'='TEXT')

Please help me .

rbheemana commented 7 years ago

This serde is built for EBCDIC-format files.

sandeep-tandon commented 7 years ago

Thanks Ram. Can you please let me know what the fb.length property is? I am working on fixed-length files. How should I calculate it?

Regards, Sandeep

rbheemana commented 7 years ago

If you are using mainframe files, you can check the length by using the "I" (dataset information) option on the mainframe.

It is the length of each record; for example, for a fixed-block dataset with 450-byte records you would set 'fb.length'='450'.

Krishna7-2 commented 7 years ago

Hi @rbheemana,

My mainframe (EBCDIC) file has packed decimals as well. Could you please guide me on how those can be handled? Please let me know if the serde you provided takes care of packed decimals too.

Sanjeev sanjeevkrishna51@gmail.com

Nandakishore43 commented 6 years ago

Hi @rbheemana, may I know whether all COMP issues (even and odd digit counts) are resolved in the latest code? Also, what are the things it cannot handle?

Nandakishore43 commented 6 years ago

Hi @rbheemana ,

We have a REDEFINES in the copybook like the one below, and I am getting a duplicate column exception. How can I enhance the code to handle this?

    30 A-TOTAL-FEE PIC S9(15)V99 COMP-3.
    30 A-TOTAL-INT REDEFINES A-TOTAL-FEE PIC S9(12)V9(05) COMP-3.
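
No fix is posted in the thread for this, but as a generic sketch of one possible workaround (assuming the exception comes from duplicate generated column names; ColumnNameDeduper is hypothetical, not project code), the generated Hive column names could be de-duplicated with a numeric suffix before the schema is built:

    import java.util.*;

    class ColumnNameDeduper {
        // Make generated Hive column names unique by suffixing repeats,
        // e.g. [a, b, a] -> [a, b, a_1].
        static List<String> dedupe(List<String> names) {
            Map<String, Integer> seen = new HashMap<>();
            List<String> out = new ArrayList<>();
            for (String name : names) {
                int count = seen.merge(name, 1, Integer::sum);
                out.add(count == 1 ? name : name + "_" + (count - 1));
            }
            return out;
        }
    }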

rajasekhargade commented 6 years ago

Hello All, I am working on EBCDIC-to-ASCII conversion using copybooks in Hadoop, and this post caught my attention. I have downloaded the code and started to look into it. I see there are two folders, serde2 and serde3, and serde3 only imports SerdeException and utils from serde2, so:

  1. Can I assume all the parsing and conversion is being done by the code in serde3?
  2. Does this code run utilizing Hadoop's capabilities?

Thanks, ..Raja

manichinnari555 commented 6 years ago

Yes to both.

rajasekhargade commented 6 years ago

@manichinnari555 Thank you! Please drop a test mail to rajasekhar.gnv@gmail.com; I would like to talk to you and understand how to use this code.

Thanks, ..Raja

VijethaSenigaram commented 6 years ago

Hi everyone, I have a similar requirement dealing with mainframe-to-Hadoop data conversion. Below are the details. Any suggestions or information is greatly appreciated. Thank you in advance!

(details were attached as a screenshot, not captured here)

harshitgoyal commented 5 years ago

Hi rbheemana,

I have created the layout file:

    01 WS-DESCRIPTION.
       10 DPN-TABLE-NAME X(8).
       10 DPN-EMP-NBR S9(9) COMP.
       10 DPN-CARR-CODE XXX.
       10 DPN-ACVY-STRT-DATE X(10).
       10 DPN-CREW-ACVY-CODE XXX.
       10 DPN-PRNG-NBR X(5).
       10 DPN-PRNG-ORIG-DATE X(10).
       10 DPN-ASNT-DAYS-CNT S9(4).
       10 DPN-RVEW-DATE X(10).
       10 DPN-CONJ-ACVY-CODE XXX.
       10 DPN-NOTE-RCVD-IND X.
       10 DPN-DROP-OFF-DATE X(10).
       10 DPN-DPND-CMNT-TEXT X(65)

Create table statement:

    CREATE TABLE test_mainframe
    ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe'
    STORED AS INPUTFORMAT 'com.savy3.mapred.MainframeVBInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION '/user/cloudera/test_mainframe'
    TBLPROPERTIES ('cobol.layout.url'='/user/cloudera/layout/vbcopy.txt');

It is failing with this error:

    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table

I checked with both jars:

  1. The jar located at https://github.com/rbheemana/Cobol-to-Hive/tree/gh-pages/target
  2. The latest code jar built using Maven: Cobol-to-Hive-1.1.0.jar

Could you please help me in this?

Regards, Harshit

thenervousgeek commented 5 years ago

Hi rbheemana, does this method work with mainframe data that is already in .txt format (both the data and the copybook are .txt files)? Can anyone else who knows the answer please reply as well. Thanks

manichinnari555 commented 5 years ago

Yes, it will.

thenervousgeek commented 5 years ago

Hi, I tried to create a table as below, but I get an error. Can anyone please help? Some notes:

  1. The jar file (CobolSerde.jar) is in HDFS.
  2. datafile.txt is a text file which is in HDFS as well.
  3. COPYBOOK.TXT is the copybook, which is a text file residing in HDFS as well.

    ADD JAR /user/cloudera/copybooks/CobolSerde.jar;
    CREATE EXTERNAL TABLE cobol2hive
    ROW FORMAT SERDE 'com.savy3.cobolserde.CobolSerde'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLenghtInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION '/user/hive/warehouse/datafile.txt'
    TBLPROPERTIES ('cobol.layout.url'='/user/cloudera/copybooks/COPYBOOK.TXT','fb.length'='130');

Error: AnalysisException: Syntax error in line 3:undefined: ROW FORMAT SERDE 'com.savy3.cobolserde.CobolSerde' ^ Encountered: IDENTIFIER Expected: DELIMITED CAUSED BY: Exception: Syntax error

thenervousgeek commented 5 years ago

Running this command:

    ADD JAR /user/cloudera/copybooks/CobolSerde.jar;

also delivers the following error:

AnalysisException: Syntax error in line 1:undefined: ADD JAR /user/cloudera/copybooks/CobolSerde.jar ^ Encountered: ADD Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH CAUSED BY: Exception: Syntax error

manichinnari555 commented 4 years ago

That is weird; I will try to replicate the scenario at my end. To get you going: before creating the new table, try the command below, and then create the table. Let me know if the issue remains even after doing that.

Hello Ram,

We are having exactly the same issue again, can you please help fix the issue?

When I create a new table and do a DESCRIBE on it, it shows the table layout of the previous table. Also, when I do set cobol.hive.mapping, it shows the old layout (created using the previous COBOL copybook). Can you please point out why it is doing this and what we should do so that it does not?

rbheemana commented 4 years ago

Could you please share the create table statements, the sequence of steps, and their command-line output?

rbheemana commented 4 years ago

> Running this command: ADD JAR /user/cloudera/copybooks/CobolSerde.jar; also delivers the following error: AnalysisException: Syntax error ...

Please check https://community.cloudera.com/t5/Support-Questions/Adding-hive-auxiliary-jar-files/td-p/120245
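
For reference, the approach described at that link is to register the serde as a Hive auxiliary jar in hive-site.xml rather than via ADD JAR; a sketch, with a placeholder path:

    <property>
      <name>hive.aux.jars.path</name>
      <value>file:///usr/local/hive/auxlib/CobolSerde.jar</value>
    </property>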

rbheemana commented 4 years ago

Also, try placing the jar file in an HDFS location instead of a local location.

bsoliveiram commented 4 years ago

@rbheemana Hello, I am trying to load a file in EBCDIC format using the library you made available, but I am having problems with the columns defined as PIC S9(n) COMP-3. For fields that are not COMP I can see the data normally; for the field with COMP-3, all values are null. Below are the copybook used and the parameters for creating the table in Hive.

File fugtbc25.cbl:

    01 WS-FUGTBC25.
       03 WS-NU-FUGC25 PIC S9(018) COMP-3.
       03 WS-CO-TIPO-CONTA-FUGC07 PIC 9(004).

Parameters for creating the table:

    ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION 'hdfs:///user/fugtbc25'
    TBLPROPERTIES ('cobol.layout.url' = 'hdfs:///user/fugtbc25.cbl','fb.length'='14');

Would you help me?
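
As a sanity check on the setup above (using the COMP-3 sizing rule discussed earlier in this thread), the declared fb.length does match the copybook:

    S9(018) COMP-3 : ceil((18 + 1) / 2) = 10 bytes
    9(004) DISPLAY :                       4 bytes
    record total   :                      14 bytes = fb.length

So the record length looks consistent, and the nulls are more likely about the packed-decimal decoding itself than about field offsets.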