heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0

bigint-count #637

Open MarcusGDaniels opened 3 years ago

MarcusGDaniels commented 3 years ago

Confused by the meaning of this:

https://docs.omnisci.com/installation-and-configuration/config-parameters

bigint-count [=arg] Use 64-bit count. Disabled by default because 64-bit integer atomics are slow on GPUs. Enable this setting if you see negative values for a count, indicating overflow. In addition, if your data set has more than 4 billion records, you likely need to enable this setting.

The docs say the bigint type is 8 bytes. Does this mean the storage format is actually 4 bytes unless this option is set? Do I need to reload my data with it set?

Marcus

alexbaden commented 3 years ago

No need to reload. bigint-count just makes count(*) and similar aggregates use a 64-bit integer. The default is to use a 32-bit integer in the output slot, as most use cases will not exceed 32 bits and the 32-bit atomics are much faster. But if you see overflows, you need to flip bigint-count on.
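To illustrate why an overflowing 32-bit count can show up as a negative number, here is a minimal Python sketch of signed 32-bit wraparound. The helper `as_int32` is hypothetical and only mimics two's-complement truncation. It is not HeavyDB code, and it assumes the output slot is read as a signed value, which matches the negative counts the documentation describes.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit slot can hold


def as_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


print(as_int32(INT32_MAX))      # 2147483647, still fits
print(as_int32(INT32_MAX + 1))  # -2147483648, wrapped around
print(as_int32(3_000_000_000))  # -1294967296, wrapped around
```

Under that assumption, a count wraps as soon as it passes 2,147,483,647, so a negative count in a result is the symptom to watch for.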

Doing this automatically likely wouldn't be too difficult, either by detecting the overflow and starting over (and storing that outcome as a heuristic) or by using an upfront heuristic to estimate group size based on the number of groups.
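A rough sketch of the two ideas, in Python for illustration only. Both function names are hypothetical and nothing here reflects how HeavyDB is implemented. The upfront check uses the simplest possible bound (no group's count can exceed the total row count) instead of a real group-size estimate.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit count can hold


def needs_64bit_count(total_rows: int) -> bool:
    """Upfront check: a per-group count can never exceed the total row count,
    so a 32-bit slot is safe whenever the whole table fits in 32 bits."""
    return total_rows > INT32_MAX


def count_overflowed(count_32bit: int) -> bool:
    """After-the-fact check: a negative count signals 32-bit wraparound.
    Note this misses counts that wrap all the way back to a positive value."""
    return count_32bit < 0


print(needs_64bit_count(1_000_000))      # False
print(needs_64bit_count(5_000_000_000))  # True
print(count_overflowed(-1294967296))     # True
```

The upfront bound is conservative: it would switch to 64-bit counts for any large table, even when every individual group is small, which is where a real group-size estimate would help.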