ismailhammounou / db2ixf

db2ixf is a python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
https://ismailhammounou.github.io/db2ixf/
GNU Affero General Public License v3.0

KeyError IXFTCCNT #65

Closed bisoldi closed 9 months ago

bisoldi commented 9 months ago

I'm getting a KeyError: IXFTCCNT on line 150 of ixf.py when I call parser.parse_columns(). I'm not sure why there is a line number difference from the main branch. I'm using v0.12.1, which I believe is the latest?

Also, when I call parser.parse_table(), the IXFTCCNT key maps to b' ' and I'm not sure why.

Any ideas?

ismailhammounou commented 9 months ago

Hi @bisoldi ,

Try looking at one of the examples. You can also check the source code for one of the output formats like parquet, deltalake or json; it will help you get an idea of how IXF is parsed. Yes, v0.12.1 is the latest release, and the main branch is the dev branch.

Which format do you use? Can you provide a stack trace so I can help?

Parsing of IXF files is in this order:

  1. Parse the header
  2. Parse the table
  3. Parse the column descriptors
  4. Parse the data records

If you don't respect this order, your parsing will not complete and will raise errors. If you respect this order and you still face an error, then please send me the stack trace.
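The order above can be sketched as a small helper. This is only an illustration: parse_descriptors is a hypothetical name, and it relies solely on the parse_header(), parse_table() and parse_columns() methods mentioned in this thread. The commented usage assumes the parser is built with IXFParser from an IXF file opened in binary mode, which is not shown in this issue.

```python
def parse_descriptors(parser):
    """Run the descriptor-parsing steps in the required order.

    `parser` is any object exposing parse_header(), parse_table() and
    parse_columns(), as named in this thread. parse_descriptors itself
    is a hypothetical helper, not part of db2ixf.
    """
    header = parser.parse_header()    # 1. header record
    table = parser.parse_table()      # 2. table record (provides IXFTCCNT)
    columns = parser.parse_columns()  # 3. column descriptors (reads IXFTCCNT)
    return header, table, columns


# Assumed usage (import path and constructor are assumptions):
#
# from db2ixf import IXFParser
# with open("data.ixf", mode="rb") as f:
#     parser = IXFParser(f)
#     header, table, columns = parse_descriptors(parser)
```

Calling parse_columns() first skips step 2, which would explain a KeyError on IXFTCCNT: the table record that carries the column count has not been parsed yet.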

Most of the time you don't need to call parser.parse_table() or parser.parse_columns(). Use only the methods that convert to a format or parse to Python objects, like parser.to_json().

We can use them in case we need to convert to a new unsupported format like XML. For now, we only support json, jsonline, csv, parquet and deltalake. If you need a new format then please create an issue and I will try to help.

Thanks for your feedback, BR, Ismail

bisoldi commented 9 months ago

Hi, I started with parser.to_csv() and it worked, but did not print the column headers, so that's when I started trying to parse manually. The parser.to_json() function works just fine and obviously includes the column headers as the JSON object keys/fields, but I want it in CSV and was trying to avoid using JSON as an intermediary step.

Unfortunately, I'm not able to copy/paste the stack trace, but the only error I have at the moment is the one I provided... KeyError: IXFTCCNT on line 150 of ixf.py. That line is specifically for _ in range(0, int(self.table_info["IXFTCCNT"])):

I just tried it and it turns out, if I call parse_header(), then parse_table(), then parse_columns() does not error out, so your suggestion worked, though I still don't know why the to_csv function did not print the headers.

Finally, in order to get the column headers, it seems I need to iterate over the list of dict returned by parse_columns() and pull out the value associated with the "IXFCNAME" key? Is there a built in function to get just the column headers back?

EDIT 1: Also, the value associated with the "IXFCNAME" key seems to be padded significantly with a lot of spaces and 2 newlines. Is that coming from the IXF file itself or do I need to do something different?

Thanks!

ismailhammounou commented 9 months ago

Hi @bisoldi ,

Thank you for the feedback. You are right, I've just had a look at the csv output of a test and it does not seem to work; I don't see headers either.

In the new release 0.13.0, I added a get_names function to get the names of the columns: from db2ixf.helpers import get_names, then pass the parse_columns() result to get_names. But I don't think it will work, I need to check. I think the padding is coming from the source, so maybe I need to strip the names with column_name.strip() and see.
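As a rough illustration of the stripping idea, here is a sketch that pulls the "IXFCNAME" value out of each column descriptor and removes the surrounding spaces and newlines. column_names is a hypothetical helper, not the library's get_names, and it assumes the descriptors are a list of dicts whose values may be bytes or str.

```python
def column_names(columns, encoding="utf-8"):
    """Return stripped column names from a list of column descriptors.

    Hypothetical helper: assumes each descriptor is a dict with an
    "IXFCNAME" key whose value is bytes or str, padded with whitespace.
    """
    names = []
    for column in columns:
        raw = column["IXFCNAME"]
        if isinstance(raw, bytes):
            # Encoding is an assumption; the real code page may differ.
            raw = raw.decode(encoding)
        names.append(raw.strip())  # drop padding spaces and newlines
    return names
```

With the result of parse_columns() passed in, this would give a plain list of names that could be written as a CSV header row.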

Thank you again for the feedback. I will try to check and see what I can do for you.

Edit: I know how to fix it, it is an error I made. Please wait for the 0.13.1.

BR, Ismail

ismailhammounou commented 9 months ago

Hi @bisoldi ,

I think I fixed the problem (check out the release 0.13.1); you can now convert to csv without any issues. You don't need to use parse_columns or parse_table, you can use parser.to_csv() and it should now work. In case you still have issues, please tell me and give more details about the error.

Thanks for the feedback again. BR, Ismail