aldanor closed 10 months ago
Actually, it looks like this is partially my misunderstanding: it seems you have to scan through the entire data first to build the dictionary array, and then do a second pass to actually write it.
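To make the two-pass idea concrete, here is a minimal sketch in plain Rust using only the standard library. It is not arrow2 code: `build_dictionary` and `encode_chunk` are hypothetical helper names, and string values with `u32` keys are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Pass 1 (hypothetical helper): scan every chunk and assign each distinct
/// value a key in first-seen order. Returns the dictionary values and the
/// value-to-key lookup map.
fn build_dictionary(chunks: &[Vec<&str>]) -> (Vec<String>, HashMap<String, u32>) {
    let mut values = Vec::new();
    let mut map = HashMap::new();
    for chunk in chunks {
        for &item in chunk {
            if !map.contains_key(item) {
                // The next free key is the current dictionary length.
                map.insert(item.to_string(), values.len() as u32);
                values.push(item.to_string());
            }
        }
    }
    (values, map)
}

/// Pass 2 (hypothetical helper): encode one chunk as keys into the already
/// complete dictionary. Returns `None` if any value is missing from the map.
fn encode_chunk(chunk: &[&str], map: &HashMap<String, u32>) -> Option<Vec<u32>> {
    chunk.iter().map(|item| map.get(*item).copied()).collect()
}
```

Because the dictionary is complete before any chunk is encoded, every chunk can be written against the same set of values. The cost is that the data has to be read twice.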
But then, again, the same question remains:
`MutableDictionaryArray` (or `DictionaryArray`) with the full key map which is guaranteed to cover all of the values.
If you have a `MutableDictionaryArray` which you populate and flush once in a while, and then create a new one for the next chunks, on the second chunk you try to write you will get:

The problem is:
invariant: `keys.len() <= values.len()`; with this invariant, how do you solve the above problem? If you were to fix it and allow sharing key maps, you would inevitably end up with an empty array but a non-empty key list. Does this invariant actually break anything anywhere if it's violated?

Am I missing something, or is there a way to do it? (I will be glad to open a PR if there are any suggestions on the proper way to fix it.)
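For illustration only, here is a rough sketch in plain Rust of what sharing a key map across chunks could look like. None of this is arrow2 API: `SharedDict` and `ChunkBuilder` are hypothetical types, and string values with `u32` keys are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary that outlives individual chunks (not an arrow2 type).
#[derive(Default)]
struct SharedDict {
    values: Vec<String>,
    map: HashMap<String, u32>,
}

impl SharedDict {
    /// Return the key for `value`, inserting it into the dictionary if unseen.
    fn key_for(&mut self, value: &str) -> u32 {
        if let Some(&k) = self.map.get(value) {
            return k;
        }
        let k = self.values.len() as u32;
        self.values.push(value.to_string());
        self.map.insert(value.to_string(), k);
        k
    }
}

/// Hypothetical per-chunk builder: holds only the keys of the current chunk.
#[derive(Default)]
struct ChunkBuilder {
    keys: Vec<u32>,
}

impl ChunkBuilder {
    /// Append one row, looking up (or assigning) its key in the shared dictionary.
    fn push(&mut self, dict: &mut SharedDict, value: &str) {
        self.keys.push(dict.key_for(value));
    }

    /// Hand back the accumulated keys and reset the builder for the next chunk.
    fn flush(&mut self) -> Vec<u32> {
        std::mem::take(&mut self.keys)
    }
}
```

Right after a `flush`, the builder in this sketch holds zero rows while the shared dictionary is already populated. That is the state described above, so any length check relating the two would have to tolerate it.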
Might be somewhat related: https://github.com/jorgecarleitao/arrow2/issues/1485