pip010 / sparsehash

Automatically exported from code.google.com/p/sparsehash
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Implementation of input/output routines for dense_hash_map #65

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a dense_hash_map
2. Write the content on a file with write_metadata & write_nopointer_data
3. Try to read from the file (with read_metadata & read_nopointer_data)

What is the expected output? What do you see instead?
The input/output routines for dense_hash_map have not yet been implemented, so 
the dense_hash_map has no elements.

Please use labels and text to provide additional information.
I implemented the input/output routines (based on sparsetable) and now all the 
routines are working properly.

I attached the patch (for google/sparsehash/densehashtable.h)

:)

Original issue reported on code.google.com by dimensio...@gmail.com on 29 Jan 2011 at 9:14

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for taking this on!  The part that had always stymied me before is, how 
does one store the empty key?  But I guess we don't need to; we can just store 
a bitmap of used buckets.

I'm confused how you do that actually (it may be just hard for me to tell 
because I'm trying to read the raw patch).  When reading a hashtable back in, 
how do you tell which elements are empty and which aren't?

It looks like you took your read32_or_64 routine from sparse_hash_map?  I 
appreciate the code sharing, but since dense_hash_map doesn't have the legacy 
problem of 32 bits, we don't need to use the fancy encoding in this case.  We 
can just always write 64 bits.

Finally, you need to update the unittest framework so we test this new 
functionality.  It shouldn't be that difficult -- probably the test is in 
hashtable_test.cc.  If I remember right, the code that tests reading/writing 
already exists, and has a special case to say to skip it for dense_hash_*, 
which can just be removed.

I just made a new sparsehash release, sadly, but this functionality will make 
it into the next release.

Original comment by csilv...@gmail.com on 31 Jan 2011 at 10:14

GoogleCodeExporter commented 9 years ago
The code you wrote sorta-works, if the user always writes nonpointer data and 
always uses the same empty-key when reading the hashtable as when writing it.  
The latter is probably pretty reasonable to require, though I went through 
efforts not to require it for sparse_hash_map, but I don't think we can commit 
this until the former is addressed.  How do people read data from disk into the 
hashtable after calling read_metadata(), if their data isn't suitable for 
read_nonpointer_data()?  They need to be able to distinguish empty from 
non-empty buckets, and there's no way to do that based on what read_metadata() 
does.

The only solution is to have write_metadata() write a bitmap of empty buckets, 
and read_metadata() to read it.  But then we'll need to store that data 
somewhere for the nonempty_begin/end to get to it, which bulks up the 
dense_hashtable data structure unnecessarily (we've been trying to keep the 
size down).  It also either slows down normal iterators, who need to check this 
bitmap on the off chance they're executing after a read_metadata(), or else we 
need to introduce a whole new style of iterators just for reading after the 
metadata.  Neither solution is ideal.

I've been unhappy with the write_metadata/write_nonpointer_data approach for a 
while.  I'm thinking of just creating a new API for reading/writing data, that 
takes a functor for writing/reading individual elements.  That way, we don't 
have to separate out the metadata from actual data, and thus avoid the 
bitmap-storing problem.

Original comment by csilv...@gmail.com on 16 Feb 2011 at 4:33

GoogleCodeExporter commented 9 years ago
This is now in SVN.  We went with a new, functor-based API.  I'll be adding 
that API for the sparsehash case shortly.

Original comment by csilv...@gmail.com on 7 Dec 2011 at 8:16

GoogleCodeExporter commented 9 years ago
This is done in sparsehash 1.12, just released.

Original comment by csilv...@gmail.com on 21 Dec 2011 at 5:27