Yes, you are right: iterating through a sparsetable is not technically constant
time per increment, but linear in the number of buckets (more specifically, the
number of sparse-groups) that have to be skipped over. When there are no
deleted items in a sparsetable, the number of buckets is proportional to the
number of items, so iterating through an entire table is amortized constant
time per increment. When there are many deleted items, you don't even get
amortized constant time.
The suggestion I normally give is to resize the hashtable after doing a lot of
deletions (or to insert something new, which will automatically resize for
you) -- then iterating is constant time again.
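For concreteness, a minimal sketch of that suggestion (the type, key, and
function names here are made up for illustration, and it assumes
set_deleted_key() was already called on the table so that erase() is legal):

#include <google/sparse_hash_map>
#include <vector>

typedef google::sparse_hash_map<int, int> Table;

// Illustrative only: erase a batch of keys, then resize so the deleted
// buckets are reclaimed and a later full iteration is cheap again.
void EraseBatchAndCompact(Table* table, const std::vector<int>& doomed) {
  for (size_t i = 0; i < doomed.size(); ++i)
    table->erase(doomed[i]);
  // A size hint of 0 lets the table pick a capacity based on the number of
  // remaining (non-deleted) items.
  table->resize(0);
}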
But I admit this depends on a model where someone iterates through an entire
map, rather than just calling begin() repeatedly. That is an idiom I haven't
seen before. Why do you keep calling begin() instead of keeping track of the
old iterator and starting the search again from there? Is it because you're
worried that erase() invalidates iterators?
That's not true for sparse_hash_map, so you can write a naive erase-loop and
have it work fine. That should avoid the problems you're seeing.
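For example, a hedged sketch of such an erase-loop (the condition and names
are illustrative, and again set_deleted_key() must have been called
beforehand):

#include <google/sparse_hash_map>

typedef google::sparse_hash_map<int, int> Table;

// Illustrative only: keep a single iterator and step it past each erasure,
// instead of calling begin() again after every erase().
void EraseNonPositiveValues(Table* table) {
  for (Table::iterator it = table->begin(); it != table->end(); ) {
    if (it->second <= 0)
      table->erase(it++);  // ok: erase() does not invalidate other iterators
    else
      ++it;
  }
}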
Fixing the code to be faster on the idiom you show here would slow down the
code in the general case. There may be some tweaks I could make, but really I
think the better approach would be to write what you're trying to do in a
different way.
Original comment by csilv...@gmail.com
on 27 Oct 2010 at 7:09
Any more word on this? I'm looking to make a new release very soon, and don't
currently plan on taking any action on this issue before then. Hopefully
slight changes to the application-level code can work around any problems here.
Original comment by csilv...@gmail.com
on 21 Jan 2011 at 12:37
Indeed, I managed to work around this issue, so it can be considered
low-priority.
Original comment by jfost...@intec.ugent.be
on 10 May 2011 at 12:10
Thanks, I'll keep the bug report open but lower the priority.
Original comment by csilv...@gmail.com
on 10 May 2011 at 7:22
I've been thinking more about this. If I wanted iterating over a hash_map to
stay amortized constant time, the only way I can think of is to resize the
hashtable at a call to begin(). (Or, more likely, to resize only if the number
of deleted elements is X% of the number of non-deleted elements.)
This complicates the simple rule, "We only resize via resize() or insert(),"
but probably that's ok. (The rule's not so simple anyway; we also resize()
when saving to disk, for instance.)
The bigger problem is that begin() is (or can be) a const method, so it's not
reasonable to resize there. Even if people used the non-const begin(), they
would probably naively expect it to be safe to call from multiple threads
without locking. So, while appealing, I don't think resizing at begin() is a
workable idea.
The other option is to store extra state in the sparsetable to make it faster
to ignore deleted elements when iterating. That's probably too expensive to do
in general -- at least, I can't think of a good way -- but perhaps not too
expensive to do just for the special case of begin(). It would,
however, increase the size of sparse_hash_map, since we'd need to store at
least one word of data for this new state. Since some applications I know of
use hundreds or thousands of sparse_hash_maps, I hate to increase the class
size without a demonstrated benefit.
Overall, I think this idiom of repeatedly calling begin() is a pretty rare one;
at least, I've never seen it before. So I've decided not to make any code
changes to speed that case up.
I wish there were some way to tell the programmer to put in a call to resize()
after doing lots of erase()s, if they don't plan to follow them up with an
insert(). That would probably speed things up more than any begin()-hacks I
could do, in any case.
Original comment by csilv...@gmail.com
on 26 Aug 2011 at 1:07
Original issue reported on code.google.com by
jan.fost...@gmail.com
on 27 Oct 2010 at 9:16