Thanks for taking this on! The part that had always stymied me before is: how does one store the empty key? But I guess we don't need to; we can just store a bitmap of used buckets.
I'm confused about how you do that, actually (it may just be hard for me to tell because I'm trying to read the raw patch). When reading a hashtable back in, how do you tell which elements are empty and which aren't?
It looks like you took your read32_or_64 routine from sparse_hash_map? I appreciate the code sharing, but since dense_hash_map doesn't have sparse_hash_map's 32-bit legacy problem, we don't need the fancy encoding in this case. We can just always write 64 bits.
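For illustration, here is a minimal sketch of what "always write 64 bits" could mean; the helper names are hypothetical and this is not the sparsehash internals, just a fixed-width little-endian size field in place of the variable 32/64-bit encoding:

#include <cstdint>
#include <cstdio>

// Hypothetical helpers: write/read a size field as a fixed 8-byte
// little-endian value, so the on-disk format is host-independent and
// never needs the 32-vs-64-bit special casing.
static bool write_64(FILE* fp, uint64_t value) {
  for (int i = 0; i < 8; ++i) {
    if (fputc(static_cast<unsigned char>(value >> (8 * i)), fp) == EOF)
      return false;
  }
  return true;
}

static bool read_64(FILE* fp, uint64_t* value) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) {
    const int byte = fgetc(fp);
    if (byte == EOF) return false;
    v |= static_cast<uint64_t>(byte) << (8 * i);
  }
  *value = v;
  return true;
}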
Finally, you need to update the unittest framework so we test this new functionality. It shouldn't be that difficult -- the test is probably in hashtable_test.cc. If I remember right, the code that tests reading/writing already exists and has a special case that skips it for dense_hash_*, which can just be removed.
Sadly, I just made a new sparsehash release, but this functionality will make it into the next one.
Original comment by csilv...@gmail.com on 31 Jan 2011 at 10:14
The code you wrote sorta-works, provided the user always writes nonpointer data and always uses the same empty key when reading the hashtable as when writing it. The latter is probably reasonable to require, though I went to some effort not to require it for sparse_hash_map; but I don't think we can commit this until the former is addressed. How do people read data from disk into the hashtable after calling read_metadata() if their data isn't suitable for read_nonpointer_data()? They need to be able to distinguish empty from non-empty buckets, and there's no way to do that based on what read_metadata() does.
The only solution is to have write_metadata() write a bitmap of empty buckets, and have read_metadata() read it back. But then we'd need to store that bitmap somewhere for nonempty_begin/end to get at it, which bulks up the dense_hashtable data structure unnecessarily (we've been trying to keep the size down). It also either slows down normal iterators, which would need to check this bitmap on the off chance they're executing after a read_metadata(), or else forces us to introduce a whole new style of iterator just for reading after the metadata. Neither option is ideal.
I've been unhappy with the write_metadata/write_nonpointer_data approach for a while. I'm thinking of just creating a new API for reading/writing data that takes a functor for writing/reading individual elements. That way we don't have to separate the metadata from the actual data, and we avoid the bitmap-storage problem entirely.
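Roughly the shape such a functor could take (a sketch under assumptions; the names and signatures here are illustrative, not a committed interface): the table calls a user-supplied serializer once per element when writing, and once per slot when reading, so the element format is entirely in the user's hands and no metadata/data split is needed:

#include <cstdio>
#include <utility>

// Illustrative element serializer for a map from int to int. The writing
// overload is called once per stored (key, value) pair; the reading
// overload is called once per element to reconstruct it from the stream.
struct IntPairSerializer {
  bool operator()(FILE* fp, const std::pair<const int, int>& value) const {
    return fwrite(&value.first, sizeof(value.first), 1, fp) == 1 &&
           fwrite(&value.second, sizeof(value.second), 1, fp) == 1;
  }
  bool operator()(FILE* fp, std::pair<const int, int>* value) const {
    // The key is const inside the stored pair, so cast it away to fill
    // the freshly created slot in place.
    return fread(const_cast<int*>(&value->first), sizeof(int), 1, fp) == 1 &&
           fread(&value->second, sizeof(int), 1, fp) == 1;
  }
};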
Original comment by csilv...@gmail.com on 16 Feb 2011 at 4:33
This is now in SVN. We went with a new, functor-based API. I'll be adding
that API for the sparsehash case shortly.
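For readers landing here later, usage looks roughly like the following. This assumes the serialize()/unserialize() members and FILE*-based streams as documented in later sparsehash releases (check your release's headers for the exact names and header path), and reuses the IntPairSerializer sketched above:

#include <cstdio>
#include <google/dense_hash_map>  // <sparsehash/dense_hash_map> in newer releases

int main() {
  google::dense_hash_map<int, int> ht;
  ht.set_empty_key(-1);
  ht[1] = 2;
  ht[3] = 4;

  // Write the whole table through the element serializer...
  FILE* fp = std::fopen("/tmp/ht.dump", "wb");
  bool ok = ht.serialize(IntPairSerializer(), fp);
  std::fclose(fp);

  // ...then read it back into a fresh table with the same serializer.
  google::dense_hash_map<int, int> ht2;
  ht2.set_empty_key(-1);
  fp = std::fopen("/tmp/ht.dump", "rb");
  ok = ok && ht2.unserialize(IntPairSerializer(), fp);
  std::fclose(fp);

  return ok ? 0 : 1;
}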
Original comment by csilv...@gmail.com on 7 Dec 2011 at 8:16
This is done in sparsehash 1.12, just released.
Original comment by csilv...@gmail.com on 21 Dec 2011 at 5:27
Original issue reported on code.google.com by dimensio...@gmail.com on 29 Jan 2011 at 9:14
Attachments: