owensgroup / BGHT

BGHT: High-performance static GPU hash tables.
https://owensgroup.github.io/BGHT/
Apache License 2.0

Insertion with duplicate keys #7

Open mfbalin opened 2 years ago

mfbalin commented 2 years ago

I am wondering what kind of behavior this hash-table has when duplicate keys exist in the inserted set of pairs. The kind of behavior I am looking for is to sum up the values of the duplicate keys after insertion.

maawad commented 2 years ago

Hi @mfbalin, the current insert and find implementations (and tests) of the three bucketed hash tables (cuckoo, power-of-two, and iceberg) assume that the keys are unique. If you insert duplicate keys, the behavior will be like std::multimap's: all the duplicate keys will be inserted (no reduction), and find will return the first matching key we encounter (which could be any of the duplicates).

The reduction behavior you are looking for would be inefficient to implement during insertion with the cuckoo hashing probing scheme; instead, perhaps we can add a find-all with a reduction operator (e.g., sum). Reduction during find would be similar to std::multimap::equal_range. Reduction during insertion is possible for other probing schemes.

If you are interested in any of these solutions, let me know and I'll create another issue for it. You are also welcome to implement any addition yourself and open a PR.

mfbalin commented 2 years ago

Hi @maawad, thanks for your reply. Is there a way to create a reduced hash table after the insertion with duplicate keys is done? It could be implemented with a pass over the whole hash table that inserts the unique (reduced) keys into either a new table or the current one. For my use case, if each query behaves like an equal_range query, I am afraid it will slow down a lot.

maawad commented 2 years ago

If we have already inserted the keys with duplicates and then try to perform the reduction, we would be using something like equal_range to uniquify the keys. It won't be exactly the same; instead, I suggest reducing in place during find. The API will look something like this:

mapped_type find_all_and_reduce(key_type const& key, tile_type const& tile, BinaryOperation op = std::plus<>());

This would be an efficient solution for specific workloads. If you would like to share some information about the workload (e.g., key and value types, duplicate keys ratio expected, how many times we will perform finds, space requirements, etc.), then I will be able to suggest more suitable ideas.

mfbalin commented 2 years ago

My workload involves inserting a lot of key-value pairs, where a pair can be <int, float> or <int, int>; after insertion the hash table stays static and only queries for the reduced values are made. My current approach uses sorting and compacting, but I believe a hash-table approach will be much faster, especially for queries. Binary search is quite slow for the queries in my use case.

mfbalin commented 2 years ago

Or I could insert the pairs after the compaction has been done using sorting, but I think an efficiently implemented find_all_and_reduce should be much faster. I believe a very efficient SpMV kernel could be written with such functionality, for the case where the vertex IDs are not necessarily contiguous.

maawad commented 2 years ago

Yeah, I think if you are not wasting that much space on duplicates, then storing them should be okay. Any optimization will depend on the workload :)

The performance of the find-and-reduce will be the same as that of queries for keys that do not exist in the hash table: the cost is at most three bucket reads per find. The implementation will look like this:

template <class Key,
          class T,
          class Hash,
          class KeyEqual,
          cuda::thread_scope Scope,
          class Allocator,
          int B>
template <typename tile_type, typename BinaryOperation>
__device__ bght::bcht<Key, T, Hash, KeyEqual, Scope, Allocator, B>::mapped_type
bght::bcht<Key, T, Hash, KeyEqual, Scope, Allocator, B>::find_all_and_reduce(key_type const& key,
                                                              tile_type const& tile,
                                                              T init,
                                                              BinaryOperation op) {
  // Probe the (up to) three buckets chosen by the three hash functions,
  // accumulating every match into `result`.
  const int num_hfs = 3;
  auto bucket_id = hf0_(key) % num_buckets_;
  value_type sentinel_pair{sentinel_key_, sentinel_value_};
  using bucket_type = detail::bucket<atomic_pair_type, value_type, tile_type>;
  mapped_type result = init;
  for (int hf = 0; hf < num_hfs; hf++) {
    bucket_type cur_bucket(&d_table_[bucket_id * bucket_size], tile);
    cur_bucket.load(cuda::memory_order_relaxed);
    int key_location = cur_bucket.find_key_location(key, key_equal{});
    if (key_location != -1) {
      auto found_value = cur_bucket.get_value_from_lane(key_location);
      result = op(found_value, result);
    } else if (cur_bucket.compute_load(sentinel_pair) < bucket_size) {
      // The bucket has an empty slot and the key is absent here, so no
      // later bucket can hold a copy: probing can stop early.
      return result;
    }
    // Advance to the next candidate bucket even after a match, since
    // duplicates may live in a different bucket.
    bucket_id = hf == 0 ? hf1_(key) % num_buckets_ : hf2_(key) % num_buckets_;
  }
  return result;
}