simulationcraft / simc

Simulationcraft engine/GUI
GNU General Public License v3.0

[dbc_extract] Array of Strings in db2 file causes IndexError: list index out of range #4602

Closed albuch closed 5 years ago

albuch commented 5 years ago

I'm trying to extract data from a db2 file (BattlePetEffectProperties.db2) that has an array of strings as one column. See https://github.com/wowdev/WoWDBDefs/blob/master/definitions/BattlePetEffectProperties.dbd for table definition.

When extracting the data with dbc_extract.py as CSV, the following error is shown:

[2018-11-24 05:59:11] DEBUG: Opened format file formats\8.0.1.28153.json
[2018-11-24 05:59:11] DEBUG: WDB file found at F:\Development\WOW\gamedata\8.0.1.28153\DBFilesClient\BattlePetEffectProperties.db2
[2018-11-24 05:59:11] DEBUG: Opened format file formats\8.0.1.28153.json
[2018-11-24 05:59:11] DEBUG: Parsed basic column info
[2018-11-24 05:59:11] DEBUG: Computed offsets, column_infos=120, column_datas=192, sparse_datas=296
[2018-11-24 05:59:11] DEBUG: Computed offsets for section=0, records=296, offset_map=0, strings=2144, ids=3008, clones=3272, keys=0
[2018-11-24 05:59:11] DEBUG: Parsed extended column info block
[2018-11-24 05:59:11] DEBUG: Parsed id block for section 0 (66 records)
[2018-11-24 05:59:11] DEBUG: Parsed clone block for section 0 (52 clones)
[2018-11-24 05:59:11] DEBUG: BattlePetEffectProperties.db2 column data for Field1 : byte_offset=24  type=index  (int16)    bit_offset=192 packed_bit_offset=0   block_size=32      at base offset 192
[2018-11-24 05:59:11] DEBUG: BattlePetEffectProperties.db2 column data for Field2 : byte_offset=24  type=array  (int8)     bit_offset=196 packed_bit_offset=4   elements=6  block_size=72      at base offset 224
[2018-11-24 05:59:11] DEBUG: Unpacking plan for BattlePetEffectProperties.db2: string32, int16, int8
[2018-11-24 05:59:11] DEBUG: Opened format file formats\8.0.1.28153.json
[2018-11-24 05:59:11] DEBUG: BattlePetEffectProperties.db2 { magic=WDC2, byte_size=3688, records=66 (118), fields=3 (3), sz_record=28, sz_string_block=864, table_hash=0x63b4c4ba, layout_hash=0xa2d4adf5, first_id=22, last_id=359, locale=0xffffffff, flags=0x0014, id_index=0, total_fields=3, ptr_record_packed_data=24, wdc2_unk1=0, sz_column_info_block=72, sz_sparse_block=0, sz_column_data_block=104, sections=1 }
Field0 : byte_offset=0   type=bytes  (string32) bit_offset=0
Field1 : byte_offset=24  type=index  (int16)    bit_offset=192 packed_bit_offset=0   block_size=32
Field2 : byte_offset=24  type=array  (int8)     bit_offset=196 packed_bit_offset=4   elements=6  block_size=72
Section0: key_id=0x0000000000000000, ptr_records=296, total_records=66, sz_string_block=864, sz_clone_block=416, ptr_offset_map=0, sz_id_block=264, sz_key_block=0
Traceback (most recent call last):
  File "dbc_extract.py", line 308, in <module>
    print('{}'.format(record.csv(options.delim, first)))
  File "F:\Development\WOW\simc\dbc_extract3\dbc\data.py", line 357, in csv
    s += '"%s"%c' % (self._dbcp.get_string(self._d[i], self._id, i).replace('"', '\\"'), delim)
  File "F:\Development\WOW\simc\dbc_extract3\dbc\wdc2.py", line 374, in get_string
    start_offset = self.get_string_offset(raw_offset, dbc_id, field_index)
  File "F:\Development\WOW\simc\dbc_extract3\dbc\wdc2.py", line 354, in get_string_offset
    column = self.column(field_index)
  File "F:\Development\WOW\simc\dbc_extract3\dbc\wdc1.py", line 892, in column
    return self.column_info[idx]
IndexError: list index out of range

As you can see from the Field0 line in the output above, the elements property from the column format (see below for the column definition) is missing.

I'd be happy to contribute a PR, but I failed to find where the root cause lies, as I found the program hard to follow. Any hint on where to look would be appreciated.

Expected behavior

dbc_extract.py doesn't throw an exception and extracts the data.

To Reproduce

  1. Download latest dbc files with casc
  2. Add Column definition to formats file:
    "BattlePetEffectProperties": [
      { "data_type": "S", "field": "ParamLabel", "elements": 6},
      { "data_type": "h", "field": "BattlePetVisualID"},
      { "data_type": "b", "field": "ParamTypeEnum", "elements": 6}
    ],
  3. Run dbc_extract.py -b ${WOW_VERSION} -t csv -p ${PATH_TO_DBFILESCLIENT} BattlePetEffectProperties > BattlePetEffectProperties.csv

Additional information

navv1234 commented 5 years ago

That's because the system cannot properly determine that the first field is an array of strings. The header is essentially saying that it is a 24-byte field.

albuch commented 5 years ago

Which is correct, as it's the number of elements * 4 bytes (one 4-byte location in the file per string, so 6 * 4 = 24), no?

navv1234 commented 5 years ago

Yes
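The arithmetic confirmed here (6 elements * 4 bytes = 24 bytes) can be checked with a short sketch. It assumes each string reference is stored as a little-endian unsigned 32-bit value; `split_string_offsets` is a hypothetical helper for illustration and is not part of dbc_extract.

```python
import struct


def split_string_offsets(field_bytes, elements):
    """Split a raw string-array field into its per-element 32-bit values.

    Assumption: every element is a little-endian unsigned 32-bit integer.
    """
    expected = elements * 4
    if len(field_bytes) != expected:
        raise ValueError(
            "expected %d bytes for %d elements, got %d"
            % (expected, elements, len(field_bytes))
        )
    return list(struct.unpack("<%dI" % elements, field_bytes))


# Made-up sample values, only to show the shape of the data.
raw = struct.pack("<6I", 0, 10, 20, 30, 40, 50)
offsets = split_string_offsets(raw, 6)
```

A 24-byte field therefore yields exactly six values, one per entry of the `"elements": 6` column in the format definition.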

albuch commented 5 years ago

Please let me know if I get this right:

For a WDC2 file, the field_size for the offset of a regular string is 32 bits (= 4 bytes), and in this case it's 192 bits (= 24 bytes). So the parser could work out that it's an array of offsets rather than a single string offset, but it currently doesn't. I wouldn't expect the extended column data (I guess that's what you are referring to as the header) to be any different from what it is now, as all other entries define other value types (indexed, arrays with fixed-length types, etc.).
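The inference described in this comment can be sketched as follows. This is only an illustration of the reasoning, not how dbc_extract decides it; `infer_string_elements` is a hypothetical name, and the 32-bit width of a single string offset is taken from the comment itself.

```python
def infer_string_elements(field_bits, offset_bits=32):
    """Guess how many string offsets fit in a field of the given bit width."""
    if field_bits <= 0 or field_bits % offset_bits != 0:
        raise ValueError(
            "field width %d is not a positive multiple of %d"
            % (field_bits, offset_bits)
        )
    return field_bits // offset_bits


# Field0 starts at bit_offset=0 and Field1 at bit_offset=192 in the log,
# so Field0 spans 192 bits.
field0_bits = 192 - 0
element_count = infer_string_elements(field0_bits)
```

With the values from the debug output this gives six elements, while a regular 32-bit string field gives one.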

I'm not sure about how inline strings from hotfixes would work though as I don't have an example at hand to check the extended column info.

navv1234 commented 5 years ago

The parser understands it is an array, and the data is parsed correctly. The fact that the debug output does not see it as an array is just a display issue on the debug output.

The issue preventing it from working is that WDC2 requires a new kind of string offset handling (compared to WDC1), and that process does not currently take into account arrayed string fields.
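One plausible reading of the traceback is that the CSV writer walks the flattened record values (6 + 1 + 6 = 13 of them for this table) and passes that flat position as `field_index`, while `column_info` only has 3 entries. The sketch below shows one way to map a flat position back to a column and an element within it. It is an assumption-laden illustration, not the code from the actual fix; `locate` and `element_byte_offset` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(flat_index, element_counts):
    """Map a flat value position to a (column, element) pair."""
    total = sum(element_counts)
    if not 0 <= flat_index < total:
        raise IndexError("flat index %d out of range" % flat_index)
    # starts[k] is the flat position of the first element of column k.
    starts = [0] + list(accumulate(element_counts))[:-1]
    column = bisect_right(starts, flat_index) - 1
    return column, flat_index - starts[column]


def element_byte_offset(column_byte_offset, element, element_size=4):
    """Byte position of one array element inside the record.

    Assumption: elements are packed back to back, 4 bytes each.
    """
    return column_byte_offset + element * element_size


# Element counts taken from the format definition in this issue:
# ParamLabel (6), BattlePetVisualID (1), ParamTypeEnum (6).
counts = [6, 1, 6]
column, element = locate(3, counts)
```

Here flat position 3 resolves to column 0, element 3, which sits 12 bytes into the record if Field0 starts at byte offset 0 as the log shows.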

I have a fix locally; I'll likely push it later today or tomorrow.

navv1234 commented 5 years ago

Should be fixed in 53a956884003c3236a57e21eba068ca3d9bfed7f

albuch commented 5 years ago

I can confirm that the fix solves the issue. Thanks!