rbheemana opened this issue 9 years ago
Hey, please try it; it works, and we are using it in our system. Here is the exact line of code that does the decoding in CobolDeserializer: byte[] temp = transcodeField(fieldBytes, Charset.forName("ebcdic-cp-us"), Charset.forName("ascii"), columnName);
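For illustration, here is a minimal sketch of what that decoding step amounts to: decode the raw field bytes with the EBCDIC charset and re-encode them as ASCII. The class and sample bytes below are hypothetical, and it assumes the "ebcdic-cp-us" charset name resolves on your JVM as it does in the line above (the real transcodeField also takes the column name for error reporting).

import java.nio.charset.Charset;

public class TranscodeSketch {
    // Decode the raw field bytes with the source (EBCDIC) charset,
    // then re-encode the resulting string with the target (ASCII) charset.
    static byte[] transcodeField(byte[] source, Charset from, Charset to) {
        return new String(source, from).getBytes(to);
    }

    public static void main(String[] args) {
        // 0xC2 0xC9 0xD3 0xD3 is "BILL" in EBCDIC
        byte[] ebcdic = {(byte) 0xC2, (byte) 0xC9, (byte) 0xD3, (byte) 0xD3};
        byte[] ascii = transcodeField(ebcdic,
                Charset.forName("ebcdic-cp-us"), Charset.forName("ascii"));
        System.out.println(new String(ascii)); // prints BILL
    }
}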
Thanks. I did not look into the details. I am able to create the Hive table from the copybook; however, select * from table returns empty. Would you mind adding a pom.xml to the project and removing/commenting out the failing test cases? Or share a working jar so I can test whether the problem is in the data or the code.
Hello Willddy,
I am testing it, but I am getting the following error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.savy3.hadoop.hive.serde2.cobol.CobolSerde. Can you please explain how you managed to use the SerDe? I have performed the following steps, but the SerDe issue still hasn't been resolved.
My email is willddy@gmail.com; feel free to contact me. I need to see your copybook. As far as I know, there are a couple of limitations right now.
rbheemana, does it support multiple schemas in the copybook? For example, where the data and trailer records have different schemas but are in the same data file?
If there are multiple 01 levels, it does not. However, if you use a REDEFINES clause for the header and trailer within the same 01-level group, it works fine.
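For example, a single 01-level layout along these lines (hypothetical field names) can carry both detail and trailer records: 01 WS-REC. 05 WS-DETAIL. 10 WS-ACCT PIC X(10). 10 WS-AMOUNT PIC 9(8). 05 WS-TRAILER REDEFINES WS-DETAIL. 10 WS-REC-COUNT PIC 9(9). 10 FILLER PIC X(9).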
Hi rbheemana, we have tried this SerDe on a mainframe VB file, but it is throwing the error below.
2016-05-18 15:03:40.797, Split = 32%, Split No: 3 start Pos: -49805 splitsize: 49865 Records in split: 1 Failed with exception java.io.IOException:java.io.IOException: Partial record(length = -49805) found at the end of file hdfs://localhost:8020/user/hive/warehouse/ptab28/combdatdpd.txt
copybook: 01 WS-VAR. 05 WS-NAME PIC X(12). 05 WS-MARKS-LENGTH PIC 9(2). 05 WS-marks OCCURS 0 to 6 TIMES DEPENDING ON WS-MARKS-LENGTH. 10 WS-MARK PIC 999. 05 WS-NICKNAME PIC X(6).
inputdata(EBCDIC binary): ÂÉÓÓ@âÔÉã@@@@öø÷ùø÷ùø÷ùø÷ùø÷ùø÷ùÂÉÓÓ
Corresponding ASCII text: BILL SMIT 6879879879879879879BILL
commands used:
CREATE TABLE ptab28 ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'com.savy3.mapred.MainframeVBInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' TBLPROPERTIES ('cobol.layout.url'='/user/input/vbcopy.txt');
load data local inpath 'vbtext.txt' into table ptab38;
Could you please help as quickly as possible? Please reach me at sekhar.pgs@gmail.com.
Hi, while transferring a VB file from the mainframe, please make sure you enable the RDW (Record Descriptor Word). Based on the error, I believe the file is missing the RDW bytes. Thanks, Ram
Hi Ram,
I am pulling a VB file from the mainframe to Linux and have enabled the RDW using the command "quote site RDW", but when I try to run the query I get the following error:
Hive Runtime Error while processing writable 83 3f 00 00 00 00 00 0f 00 00 00 42 71 6c 40 40 40 00 0c 04 27 16 11 8c 00 1f 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:505) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 8 more Caused by: java.lang.IllegalArgumentException: 81 > 64 at java.util.Arrays.copyOfRange(Arrays.java:2549) at com.savy3.hadoop.hive.serde3.cobol.CobolField.getBytes(CobolField.java:101) at com.savy3.hadoop.hive.serde3.cobol.CobolStringField.deserialize(CobolStringField.java:57) at com.savy3.hadoop.hive.serde3.cobol.CobolGroupField.deserialize(CobolGroupField.java:236) at com.savy3.hadoop.hive.serde3.cobol.CobolGroupField.deserialize(CobolGroupField.java:241) at com.savy3.hadoop.hive.serde3.cobol.CobolGroupField.deserialize(CobolGroupField.java:241) at com.savy3.hadoop.hive.serde3.cobol.CobolGroupField.deserialize(CobolGroupField.java:241) at com.savy3.hadoop.hive.serde3.cobol.CobolToHive.deserialize(CobolToHive.java:28) at com.savy3.hadoop.hive.serde3.cobol.CobolSerDe.deserialize(CobolSerDe.java:62) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:136) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:100) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:496)
Thanks
Please verify the cobol.hive.mapping property immediately after creating the table. It lists the COBOL fields with their offsets and lengths, which should help resolve copybook layout problems. Once that is sorted out, please let me know if you are still facing the issue. Thanks, Ram
Hi Ram,
Thanks for your response. I am trying to find the log for cobol.hive.mapping but couldn't find it. Can you please suggest where I can find cobol.hive.mapping?
Thanks!
It is a configuration variable that is set while creating the table. So after creating the table, use the 'set cobol.hive.mapping;' command at the Hive prompt to list its value. Thanks, Ram
Hi Ram,
Thanks! I followed the steps and am still getting this error:
Failed with exception java.io.IOException:java.io.IOException: Partial record(length = -14835) found at the end of file hdfs://quickstart.cloudera:8020/user/cloudera/source
Thanks
Hello,
May I know how you are uploading/transferring the binary file to HDFS?
Regards, Venkat
Hi manichinnari555, have you solved your last issue (FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.savy3.hadoop.hive.serde2.cobol.CobolSerde)? If so, can you share your approach, please? I have been having the same problem! PS: I am not using the Cloudera stack, but Hortonworks (HDP 2.4). Cheers, Rafael
Hi Venkat,
I tried multiple options:
1) ftp user pwd binary quote site rdw
2) ftp user pwd binary type e quote site rdw
3) ftp user pwd quote mode b type e quote site rdw
etc..
Thanks
Hello rafaelpiresm,
This is related to an access issue; you would need to log in/work with an application ID rather than your personal ID. If you are working in a sandbox, it shouldn't be a problem. Thanks.
Hello njagadam,
Then I think you would need to check the copybook layout against the file's record length; please make sure the two match.
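For example, with a layout like the WS-VAR copybook posted earlier in this thread, a record should be 12 + 2 + 3 × WS-MARKS-LENGTH + 6 bytes (between 20 and 38 bytes), and a VB file carries a 4-byte RDW in front of each record; if the length computed from the copybook does not line up with the records actually in the file, the SerDe will read past the end of a record.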
Regards, Venkat
Hi Ram,
When I create a new table and describe it, it shows the layout of the previous table. When I run set cobol.hive.mapping, it also shows the old layout (created using the previous COBOL copybook). Can you please point out why this is happening and what we should do to prevent it?
Thanks
That is weird; I will try to replicate the scenario at my end. To get you going, before creating the new table try the command below: set cobol.hive.mapping="";
I tried the command set cobol.hive.mapping="" before creating the table, but it is still showing the previous table's DDL. Strangely, it shows the previous table's DDL even after I dropped the previous table.
I dropped all the tables and ran set cobol.hive.mapping, and it is still showing the layout from the first table creation.
Is this saved in a buffer that keeps being reused without getting overwritten, or am I missing something?
Thanks
I will try to replicate this issue in the evening or tomorrow. I can't tell anything without seeing it.
Hi,
Can I know how you are creating the table: with Hue or Beeline? I can't tell what the issue is exactly; I faced it too. But try creating the table in the Beeline CLI or Hive CLI; that should solve the issue.
Regards, Venkat
Hi Venkat,
I created the table using the Hive CLI.
Thanks
Hello,
Yes, I am also getting a similar issue, and sometimes it works; I am not sure what the issue is.
Regards, Venkat
Yes,
It looks like it is storing the DDL in a buffer and reusing it. If you exit the CLI and log back in, it seems to work; please let me know if that works. Thanks
Hi, guys! Thank you, manichinnari555. The jar that I had downloaded was corrupted. Now it works, but I am getting an error message when I try to create a table that looks like the sample provided in the documentation:
"Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table (state=08S01,code=1)"
My command is: "CREATE TABLE Cobol2Hive ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'com.savy3.mapred.MainframeVBInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 'hdfs:///path/to/file.bin' TBLPROPERTIES ('cobol.layout.url'='hdfs:///path/to/file.cbl');"
I am using Beeline and HDP 2.3.
Cheers, Rafael
Hello,
I got the same error while executing the script in the Beeline CLI and noticed that it seems to occur if Beeline has been open for a long time. I would suggest logging out, closing the window, and reopening it; most probably the issue will be resolved. Let me know how it goes. Thanks.
But how did you resolve the earlier DDL issue?
I logged out of Hive and logged back in. Now I am getting the error below when I try to select * from table;
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 56 > 54
The file is variable block (VB) and has COMP-3 values in it.
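For reference (a general COBOL note, not specific to this SerDe): a COMP-3 field with n digits occupies floor(n/2) + 1 bytes, so for example PIC S9(7) COMP-3 takes 4 bytes and PIC S9(7)V99 COMP-3 takes 5 bytes. An error like "56 > 54" usually means the record length computed from the copybook is longer than the bytes actually read for that record.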
Thanks!
Hello Ram,
Did you get a chance to look into this issue?
Hi,
When I drop and recreate the table in the Hive CLI, it shows the new DDL, but in Hue it still shows the old DDL for the table. Any suggestions on how to resolve this issue? Thanks!
Thanks
Select * from table is fetching zero records from the Hive table. Does anyone know how to fix this, or what the possible issue could be?
Hi rbheemana,
I have tried your tool and everything works; creating a table and reading data is no problem. Unfortunately, I have run into a very strange issue. When I create external table A pointing to copybook A, that works fine. Then, when I create another external table B pointing to copybook B in a different location, the original table A structure is replaced with table B's. Now tables A and B have the same structure. Is this a problem with Hive or with the CobolSerde jar itself?
I have seen that the same problem above was faced by njagadam. manichinnari555 recommended exiting the CLI and logging back on, or setting cobol.hive.mapping=""; neither approach seems to work. Any clue? Your help is very much appreciated. You can email me at josephlim.intl@yahoo.com if you happen to know the problem.
Please post your Hive SQL so we can reproduce it.
Below is what I used to create the table tatrxn. When I tried to create another table, ctptrxn, with a totally different structure, it just overwrote tatrxn.
CREATE EXTERNAL TABLE tatrxn ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/raw/tatrxn/data' TBLPROPERTIES ('cobol.layout.url'='/raw/tatrxn/copybook/tatrxn.cbl','fb.length'='87');
I have tried this, also no luck.
CREATE EXTERNAL TABLE tatrxn ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/raw/tatrxn/data' TBLPROPERTIES ('cobol.layout.url'='/raw/tatrxn/copybook/tatrxn.cbl','cobol.hive.mapping'='','fb.length'='87');
The ctptrxn DDL is as follows
CREATE EXTERNAL TABLE tatrxn ROW FORMAT SERDE 'com.savy3.hadoop.hive.serde3.cobol.CobolSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/raw/ctptrxn/data' TBLPROPERTIES ('cobol.layout.url'='/raw/ctptrxn/copybook/ctptrxn.cbl','fb.length'='180');
I have tried issuing set cobol.hive.mapping="" before calling CREATE EXTERNAL TABLE; also no luck.
Sample output from my PuTTY session:
hive> desc ctptrxn;
OK
firstname varchar(19) from deserializer
lastname varchar(19) from deserializer
age varchar(3) from deserializer
cardnumber varchar(25) from deserializer
trxntype varchar(3) from deserializer
trxnamount varchar(10) from deserializer
trxndate varchar(8) from deserializer
Time taken: 0.522 seconds, Fetched: 7 row(s)
hive> desc tatrxn;
OK
firstname varchar(19) from deserializer
lastname varchar(19) from deserializer
age varchar(3) from deserializer
cardnumber varchar(25) from deserializer
trxntype varchar(3) from deserializer
trxnamount varchar(10) from deserializer
trxndate varchar(8) from deserializer
Time taken: 0.382 seconds, Fetched: 7 row(s)
I am running the HDP 2.4 sandbox, which comes with Hive 1.2 if I am not mistaken.
In case you need my copybook, here is tatrxn.cbl
01 TRXN-RECORD. 05 FIRSTNAME PIC X(19). 05 LASTNAME PIC X(19). 05 AGE PIC X(3). 05 CARDNUMBER PIC X(25). 05 TRXNTYPE PIC X(3). 05 TRXNAMOUNT PIC X(10). 05 TRXNDATE PIC X(8).
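As a quick sanity check, the field lengths in this layout sum to 19 + 19 + 3 + 25 + 3 + 10 + 8 = 87 bytes, which matches the fb.length='87' setting in the tatrxn DDL above.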
You can use any other copybook to simulate this and see whether it overwrites the TATRXN structure.
Hello,
Try exiting and re-logging in and see.
Hi Mani,
I have already tried that; I even went to the extent of restarting my Hive service before creating the 2nd table. The strange thing is that when you issue the 2nd table's DDL and describe the table, it still shows the 1st table's DDL. When you restart the Hive service, both the 1st and 2nd tables use the latest DDL. I know it looks strange, but this is what I am experiencing. Does anyone else have this problem at all?
Yes, I had similar issue.
Are you guys using a metastore on Derby, or on a MySQL-like database?
Hello,
Are you using Hue or the CLI to create the tables?
@Will: I am not sure, but I don't see a MySQL service running on the HDP VM, so I assume it is Derby.
@Mani: As I responded earlier with my PuTTY output, it is the Hive CLI, not Hue or Beeline. Since you have stated that you have the same problem, were you able to resolve it by using Hue for the table creation?
I've worked on Postgres hosting for the last 6 years though, so maybe that's the issue.
@Will: Just to confirm again, you are using Postgres as your Hive metastore, and you no longer have this problem? Are you using the Hive CLI or Hue to create the tables? Before I go ahead and configure the metastore to use Postgres, I just want to test whether Hue makes a difference.
PostgreSQL should be good
Thanks, Dayong
I am getting the below error when I am trying to run this code:
Logging initialized using configuration in jar:file:/C:/hadoop/hive-0.12.0.2.0.8.0-1639/lib/hive-common-0.12.0.2.0.8.0-1639.jar!/hive-log4j.properties
Added C:\Users\HDIinyvikg\AppData\Local\Temp\6\a6d1b94f-5402-4804-8784-f85c453a48dd_resources\CobolSerde.jar to class path
Added resource: C:\Users\HDIinyvikg\AppData\Local\Temp\6\a6d1b94f-5402-4804-8784-f85c453a48dd_resources\CobolSerde.jar
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Class not found: org.apache.hadoop.mapred.FixedLengthInputFormat
Can you please help us resolve this issue? I am reachable at g_vickie84@yahoo.co.in.
Hi Guys,
I have tested with Postgres; no luck, it seems. I am not sure what I am missing. I connected to the Hive metastore and looked at the COLUMNS_V2 table to confirm whether the columns for the table are generated as duplicates, and they are indeed duplicated.
By the way, my previous metastore was MySQL, not Derby; it is the default in the Hortonworks Sandbox. I really do not know whether this has something to do with my environment or with the jar itself.
@Will: Do you have a sample of a working CREATE TABLE DDL that you can share? You can send it to me at josephlim.intl@yahoo.com.
Hi All,
@Mani: It works the way you suggested, which is to exit the CLI and enter the Hive CLI again. I think the CLI somehow reinitializes the Hive session every time it starts, but this reinitialization only happens in the Hive CLI world.
The problem is more with using Hue, Zeppelin, Ambari Hive Views, and similar tools. When you first start using any of these tools and, say, query table A, that layout somehow stays cached. Therefore, when I describe table B, it still gives me table A's structure. When I restart the Hive service and then describe table B first, it shows me table B's structure; but the moment I describe table A, it shows table B's structure.
Is there a way, from the tools or via configuration, to reinitialize or reset the cache? Is there a way I could clear the cache before issuing a query? Example:
set clear;
desc TableA;
or
desc TableB
set clear;
select * from TableA;
I have tested with the two tables below. They both work fine. Note that the ADD JAR statement only applies to the current Hive session; to use the jar globally, you need to either add it manually in each session where you query the table or register it globally. Refer here: https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cm_mc_hive_udf.html
ADD JAR /home/vagrant/mainframe-serde-0.0.1-SNAPSHOT.jar; CREATE EXTERNAL TABLE test1 ROW FORMAT SERDE 'your serde class' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/data/test1' TBLPROPERTIES ('cobol.layout.url'='/cpbk/test1/CTRL_DATA.COPYBOOK.TXT','fb.length'='68');
ADD JAR /home/vagrant/mainframe-serde-0.0.1-SNAPSHOT.jar; CREATE EXTERNAL TABLE test2 ROW FORMAT SERDE 'your serde class' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.FixedLengthInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION '/data/test2' TBLPROPERTIES ('cobol.layout.url'='/cpbk/test2/CTRL_DATA.COPYBOOK.TXT','fb.length'='68');
Hi all, first things first: sorry for the delay in responding; I got busy with my bread-winning work :-) I read all the issues and comments and replicated the problem myself. I have tried to summarize the issue under https://github.com/rbheemana/Cobol-to-Hive/issues/12.
Please have a look and feel free to suggest details to update the description.
After a sleepless night, I finally figured out the issue and updated the code. I have also updated the jar; please try it.
I have also updated the column comments with the corresponding copybook field, its offset, and its length.
Special thanks to @willddy , @josephlim75, @manichinnari555 and @njagadam for identifying and trying out the alternate approaches to resolve. It really helped in the analysis.
Thank you all for the patience. Cheers!!
It seems the code is not complete. There is no part that decodes from EBCDIC to ASCII.