rbheemana opened this issue 8 years ago
I concur completely. Great job Ram! I have been using this Serde for the last month or so and it has been very useful.
Ram - I did notice something that I had to modify the serde for that you may want to take a look at. When the serde outputs string fields and there is a character like a newline or carriage return in the string, it splits the row at that character, so one row becomes two, split at the wrong position. To get around this I added a replaceAll call to change newlines and carriage returns to spaces.
It is a bit of a hack the way I did it, because it does not address other characters that split lines (form feed, vertical tab, etc.); it only handles newlines and carriage returns.
CobolStringField.java:

case STRING:
    return s1;
case VARCHAR:
    return new HiveVarchar(s1, this.length);

I changed to this:

case STRING:
    s1 = s1.replaceAll("\n", " ");
    s1 = s1.replaceAll("\r", " ");
    return s1;
case VARCHAR:
    s1 = s1.replaceAll("\n", " ");
    s1 = s1.replaceAll("\r", " ");
    return new HiveVarchar(s1, this.length);
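To also cover the form-feed and vertical-tab cases mentioned above, the two replaceAll calls could be collapsed into a single character class. This is an illustrative sketch, not the SerDe's actual code; the class and method names are hypothetical:

```java
// Hypothetical helper: replaces any line-splitting control character
// (newline, carriage return, form feed, vertical tab) with a space.
public class FieldSanitizer {
    public static String sanitize(String s) {
        if (s == null) {
            return null;
        }
        // \u000B is vertical tab, \f is form feed
        return s.replaceAll("[\\n\\r\\f\\u000B]", " ");
    }
}
```

One regex pass avoids re-scanning the string once per character type.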
@datawarlock I have opened an issue https://github.com/rbheemana/Cobol-to-Hive/issues/13 with your suggestion. I am thinking of giving the user an option for how to replace carriage returns and newlines, like we do when importing through Sqoop. Please feel free to comment on the issue if you have any better approaches.
Hi Ram,
I believe I have found an offset issue. I am not sure if anyone else has this problem. Assuming the following data field:
01 RECORD
10 APPID PIC S9(12) USAGE COMP-3
The byte size for the field above should be 7 bytes. It seems the calculation in the code gives 6. As far as I understand, the formula for a COMP-3 field is (n+1) / 2, rounded up. In this case that is (12+1) / 2 = 6.5, rounding up to 7.
Does anyone else have this problem?
I have modified the code in CobolNumberField.java:
if (this.compType == 3)
this.length = (int) Math.ceil((double) (this.length+1) / divideFactor);
else
this.length = (int) Math.ceil((double) this.length / divideFactor);
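The corrected COMP-3 rule can be stated on its own: n digits plus a sign nibble occupy ceil((n + 1) / 2) bytes. A minimal sketch (the class name is hypothetical, not part of the SerDe):

```java
// Byte length of a COMP-3 (packed decimal) field: each byte holds two
// digit nibbles, and one nibble is reserved for the sign.
public class Comp3Length {
    public static int byteLength(int digits) {
        return (int) Math.ceil((digits + 1) / 2.0);
    }
}
```

For S9(12) this gives 7 bytes, matching the hand calculation above.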
Hi Ram, I notice that many issues have been fixed recently. Is it your plan to post an updated version of CobolSerde.jar file, as currently referenced here: https://github.com/rbheemana/Cobol-to-Hive/tree/gh-pages/target ? Thanks in advance, WimGof
@WimGof you can generate CobolSerde.jar yourself with something like
Cobol-to-Hive-master\src>set classpath=%HADOOP_HOME%\*;%HIVE_HOME%\lib\*
Cobol-to-Hive-master\src>javac com\savy3\hadoop\hive\serde2\cobol\*.java
Cobol-to-Hive-master\src>javac com\savy3\hadoop\hive\serde3\cobol\*.java
Cobol-to-Hive-master\src>javac com\savy3\mapred\*.java
Cobol-to-Hive-master\src>javac com\savy3\mapreduce\*.java
Cobol-to-Hive-master\src>jar cf CobolSerde_v030817.jar com
Cobol-to-Hive-master\src>jar tf CobolSerde_v030817.jar
for example, https://github.com/ankravch/Cobol-to-Hive/blob/gh-pages/target/CobolSerde_v030817.jar
@ankravch thanks for showing me the procedure. I downloaded the Cobol-to-Hive-Issue-18 branch, as this contains exactly the fix I am after. I was successful in compiling savy3/mapred, savy3/mapreduce, serde2/cobol. But in my last compilation step, serde3/cobol, I have the error below. Would this have to do with the versions of jars I am accessing, or would this be a code issue? Thanks again.
$ javac -Xlint:deprecation com/savy3/hadoop/hive/serde3/cobol/*.java
com/savy3/hadoop/hive/serde3/cobol/TestCobolFieldFactory.java:16: error: cannot find symbol
public void testGetCobolField() throws CobolSerdeException {
^
symbol: class CobolSerdeException
location: class TestCobolFieldFactory
com/savy3/hadoop/hive/serde3/cobol/CobolSerDe.java:31: warning: [deprecation] initialize(Configuration,Properties) in AbstractSerDe has
been deprecated
public void initialize(final Configuration conf, final Properties tbl)
^
1 error
1 warning
@WimGof I have added the import statement in the code. Please download the source again from branch Issue-18 and retry now.
@rbheemana Yes, this change solved the last error. I was able to compile and install the SerDe. I tested reading the cobol file that had issues before and this latest code in Cobol-to-Hive-Issue-18 seems to have solved the problem. My first tests only show good data. Thanks!
@rbheemana I keep getting a duplicate column name error in Hive. I don't have any duplicate columns in the copybook. Anything I can do to help debug this?
@RCGEnableBigDataDeveloper Please post your copybook and error.
Hi Ram ,
I saw your code and I am working with a fixed-length data file. I have also created a copybook with only one 01 level in it. My data is in a text file. I need help with 2 things:
how do we calculate the fb.length property, and
my data is in a text file - do I need to convert it into a binary format like EBCDIC? If not, can I define the data.format property as TEXT in the TBLPROPERTIES?
I am trying something like this:
CREATE EXTERNAL TABLE CobolHive ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT "org.apache.hadoop.mapred.FixedLengthInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat" LOCATION '/user/sandeep/input' TBLPROPERTIES ('cobol.layout.url'='/user/sandeep/copybook','fb.length'='450','data.format'='TEXT')
Please help me .
This serde is built for EBCDIC format files.
On Jul 12, 2017 at 3:10 PM, <sandeep-tandon (mailto:notifications@github.com)> wrote:
Hi Ram,
I saw your code and I am working with a fixed-length data file. I have also created a copybook with only one 01 level in it. My data is in a text file. I need help with 2 things:
how do we calculate the fb.length property, and
my data is in a text file - do I need to convert it into a binary format like EBCDIC? If not, can I define the data.format property as TEXT in the TBLPROPERTIES?
I am trying something like this:
CREATE EXTERNAL TABLE CobolHive ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT "org.apache.hadoop.mapred.FixedLengthInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat" LOCATION '/user/sandeep/input' TBLPROPERTIES ('cobol.layout.url'='/user/sandeep/copybook','fb.length'='450','data.format'='TEXT')
Please help me .
Thanks Ram. Can you please let me know what this fb.length property is? I am working on fixed-length files. How should I calculate it?
Regards, Sandeep
If you are using mainframe files, you can check the length by using the "I" option on the mainframe.
It is the length of each record.
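In other words, fb.length is the total byte width of one record, i.e. the sum of each field's byte size. A rough illustrative sketch (not part of the SerDe; the helper names are hypothetical) using common COBOL storage rules:

```java
// Illustrative helpers for computing fb.length as the sum of per-field
// byte sizes. Assumed rules: PIC X(n) / display numeric takes n bytes;
// COMP-3 takes ceil((digits + 1) / 2) bytes.
public class RecordLength {
    public static int displayBytes(int n) {
        return n;
    }

    public static int comp3Bytes(int digits) {
        // integer form of ceil((digits + 1) / 2)
        return (digits + 2) / 2;
    }

    public static int sum(int... fieldBytes) {
        int total = 0;
        for (int b : fieldBytes) {
            total += b;
        }
        return total;
    }
}
```

For example, a record with one S9(18) COMP-3 field and one PIC 9(4) display field would give 10 + 4 = 14 bytes.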
Hi @rbheemana,
My mainframe (EBCDIC) file has packed decimals as well; could you please guide me on how they can be handled? Please let me know if the serde you provided takes care of packed decimals too.
Sanjeev sanjeevkrishna51@gmail.com
Hi @rbheemana, may I know whether all COMP issues (even and odd digit counts) are resolved in the latest code? And also, what are the things it cannot handle?
Hi @rbheemana ,
we have a REDEFINES in our copybook like below, and I am getting a duplicate column exception. How can I enhance the code to handle this?
30 A-TOTAL-FEE PIC S9(15)V99 COMP-3.
30 A-TOTAL-INT REDEFINES A-TOTAL-FEE PIC S9(12)V9(05) COMP-3.
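One generic workaround for duplicate-column collisions (a sketch only, not the SerDe's actual behavior; the class name is hypothetical) is to suffix repeated names before handing them to Hive:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: make generated Hive column names unique by
// appending a numeric suffix to any repeated name.
public class ColumnDeduper {
    public static List<String> dedupe(List<String> names) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            Integer count = seen.get(name);
            if (count == null) {
                seen.put(name, 1);
                out.add(name);
            } else {
                seen.put(name, count + 1);
                out.add(name + "_" + count);
            }
        }
        return out;
    }
}
```

This preserves the first occurrence unchanged, so existing queries against the original name keep working.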
Hello All, I am working on EBCDIC to ASCII conversion using copybooks in Hadoop, and this post caught my attention. I have downloaded the code and started to look into it. I see there are two folders, serde2 and serde3, and serde3 only imports SerdeException and utils from serde2.
Thanks, ..Raja
Yes for both
@manichinnari555 Thank you! Please drop a test mail to rajasekhar.gnv@gmail.com. I would like to talk to you and understand how to use this code.
Thanks, ..Raja
Hi everyone, I have a similar requirement dealing with mainframe-to-Hadoop data conversion. Below are the details. Any suggestions / information is greatly appreciated. Thank you in advance!
Hi rbheemana,
I have created the layout file:-
01 WS-DESCRIPTION.
   10 DPN-TABLE-NAME X(8).
   10 DPN-EMP-NBR S9(9) COMP.
   10 DPN-CARR-CODE XXX.
   10 DPN-ACVY-STRT-DATE X(10).
   10 DPN-CREW-ACVY-CODE XXX.
   10 DPN-PRNG-NBR X(5).
   10 DPN-PRNG-ORIG-DATE X(10).
   10 DPN-ASNT-DAYS-CNT S9(4).
   10 DPN-RVEW-DATE X(10).
   10 DPN-CONJ-ACVY-CODE XXX.
   10 DPN-NOTE-RCVD-IND X.
   10 DPN-DROP-OFF-DATE X(10).
   10 DPN-DPND-CMNT-TEXT X(65)
Create table statement:- CREATE TABLE test_mainframe ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'com.savy3.mapred.MainframeVBInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/cloudera/test_mainframe' TBLPROPERTIES ('cobol.layout.url'='/user/cloudera/layout/vbcopy.txt');
It is failing with error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table
Checked with both jars: the one located at https://github.com/rbheemana/Cobol-to-Hive/tree/gh-pages/target, and the latest code jar built using Maven: Cobol-to-Hive-1.1.0.jar.
Could you please help me in this?
Regards, Harshit
Hi rbheemana, does this method work with mainframe data that is already in .txt file format (both the data and the copybook are .txt)? Can anyone else who knows the answer please reply as well. Thanks
Yes, it will.
Hi, I tried to create a table as below, but I get an error; can anyone please help? Some notes:
ADD JAR /user/cloudera/copybooks/CobolSerde.jar; CREATE EXTERNAL TABLE cobol2hive ROW FORMAT SERDE 'com.savy3.cobolserde.CobolSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/user/hive/warehouse/datafile.txt' TBLPROPERTIES ('cobol.layout.url'='/user/cloudera/copybooks/COPYBOOK.TXT','fb.length'='130');
Error: AnalysisException: Syntax error in line 3:undefined: ROW FORMAT SERDE 'com.savy3.cobolserde.CobolSerde' ^ Encountered: IDENTIFIER Expected: DELIMITED CAUSED BY: Exception: Syntax error
Running these command: ADD JAR /user/cloudera/copybooks/CobolSerde.jar;
also delivers the following error:
AnalysisException: Syntax error in line 1:undefined: ADD JAR /user/cloudera/copybooks/CobolSerde.jar ^ Encountered: ADD Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH CAUSED BY: Exception: Syntax error
That is weird; I will try to replicate the scenario at my end. To get you going: before creating the new table, try the below command, and then create the table. Let me know if it is still an issue even after doing that.
Hello Ram,
We are having exactly the same issue again; can you please help fix it?
When I create a new table and do a DESCRIBE on it, it shows the table layout of the previous table. And when I do set cobol.hive.mapping, this also shows the old layout (created using the previous COBOL copybook). Can you please point out why it is doing this and what we should do so it does not show the old layout?
Could you please share the create table statements, the sequence of steps, and their command-line output?
Running these command: ADD JAR /user/cloudera/copybooks/CobolSerde.jar;
also delivers the following error:
AnalysisException: Syntax error in line 1:undefined: ADD JAR /user/cloudera/copybooks/CobolSerde.jar ^ Encountered: ADD Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH CAUSED BY: Exception: Syntax error
Please check https://community.cloudera.com/t5/Support-Questions/Adding-hive-auxiliary-jar-files/td-p/120245
Also try to place the jar file on hdfs location instead of local location
@rbheemana Hello, I am trying to load a file in EBCDIC format using the library you made available, but I am having problems with the columns defined as PIC S9(n) COMP-3. For fields that are not COMP I can see the data normally; for the field with COMP-3, all values are null. Below are the copybook used and the parameters for creating the table in Hive.
File fugtbc25.cbl:

01 WS-FUGTBC25.
   03 WS-NU-FUGC25 PIC S9(018) COMP-3.
   03 WS-CO-TIPO-CONTA-FUGC07 PIC 9(004).
Table creation parameters:
ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 'hdfs:///user/fugtbc25' TBLPROPERTIES ('cobol.layout.url' = 'hdfs:///user/fugtbc25.cbl','fb.length'='14');
Would you help me?
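For context on what the SerDe has to do with a COMP-3 field like WS-NU-FUGC25, here is a minimal decoding sketch. It is an assumption-laden illustration, not the SerDe's actual code: it assumes valid packed-decimal input, treats a final nibble of 0xD as negative and anything else as positive, and ignores the implied decimal scale:

```java
// Minimal packed-decimal (COMP-3) decoder sketch. Each byte carries two
// nibbles; every nibble is a digit except the last, which is the sign.
public class PackedDecimal {
    public static long decode(byte[] bytes) {
        long value = 0;
        for (int i = 0; i < bytes.length; i++) {
            int high = (bytes[i] >> 4) & 0x0F;
            int low = bytes[i] & 0x0F;
            value = value * 10 + high;
            if (i < bytes.length - 1) {
                value = value * 10 + low;
            } else if (low == 0x0D) {
                value = -value; // final nibble 0xD means negative
            }
        }
        return value;
    }
}
```

If the bytes reaching this logic are misaligned (for example, because of the odd/even length issue discussed earlier in this thread), the digit nibbles shift and the decoded values come out wrong or null.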
Hi Ram,
Thank you very much. Your new changes did fix the cache problem. You were right, and you nailed the root cause by removing all static declarations. I have tested it and it works now without restarting the session. You rock!!!