raisedragon / pircbotx

Automatically exported from code.google.com/p/pircbotx

Snapshots in response to PART and QUIT make PircBotX CPU time quadratic #232

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Both PART and QUIT commands trigger UserChannelDao.createSnapshot(), which 
copies the entire user map, which takes time linear in the number of users on 
the server.

If you assume that the rate of PART and QUIT commands is proportional to the 
number of users on a server, then the CPU time spent on creating snapshots is 
quadratic in the number of users on the server.

I think that this is already a problem, but to make things worse, there is a 
lot of CPU time consumed by User.createSnapshot(). A rough test on my desktop 
shows that UUID.randomUUID() takes about a third of a microsecond. This means 
that there cannot be more than (around) three million (quit+part)*users per 
second. Additionally, an insane number of objects are allocated in this project 
and never used.

I was able to solve this for my use case by extending InputParser and skipping 
UserChannelDao.createSnapshot(). However, I think this problem should be 
addressed.

These snapshots are at the very least questionable. I don't understand their 
purpose, since the set* methods, which unconditionally throw RuntimeException 
in the snapshot version, are only called from InputParser.

If there is in fact a purpose for these snapshot objects, then the snapshot of 
a User object only has to be created once. I think that the getters of the 
snapshot object should forward to the parent object instead of copying the 
information.
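
A rough sketch of what I mean, with hypothetical names (the real UserSnapshot 
constructor and fields are different), assuming the UUID getter on User is the 
one generated for the userId field mentioned below:

    import java.util.UUID;
    import org.pircbotx.User;

    // Hypothetical sketch: a snapshot that forwards reads to the User it was
    // created from instead of copying every field up front.
    public class ForwardingUserSnapshot {
        private final User generatedFrom;

        public ForwardingUserSnapshot(User generatedFrom) {
            this.generatedFrom = generatedFrom;
        }

        public String getNick() {
            return generatedFrom.getNick();   // forward instead of storing a copy
        }

        public UUID getUserId() {
            return generatedFrom.getUserId(); // reuse the original UUID
        }
    }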

Furthermore, wouldn't a UserSnapshot have to have the same UUID as the original 
User object? What's the purpose of the UUID if it changes?

You can also create the UUID lazily like this:
@Getter(lazy=true) private final UUID userId = UUID.randomUUID();
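
For reference, a rough plain-Java sketch of what that Lombok annotation expands 
to (Lombok actually generates an AtomicReference-based double-checked 
initialization; simplified here):

    import java.util.UUID;
    import java.util.concurrent.atomic.AtomicReference;

    public class LazyUuidSketch {
        private final AtomicReference<UUID> userId = new AtomicReference<UUID>();

        public UUID getUserId() {
            UUID value = userId.get();
            if (value == null) {
                synchronized (userId) {
                    value = userId.get();
                    if (value == null) {
                        // The random UUID is only generated on first access
                        value = UUID.randomUUID();
                        userId.set(value);
                    }
                }
            }
            return value;
        }
    }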

By the way, User.hashCode() and User.equals(Object) are inconsistent as they 
currently are in the repository :)

Original issue reported on code.google.com by tillmann...@gmail.com on 11 Feb 2015 at 1:07

GoogleCodeExporter commented 9 years ago
Snapshots were created because when a user parts or quits, or the bot parts or 
quits, PircBotX removes Channel or User objects from the DAO. I implemented 
them after I got several bug reports about QuitEvent, PartEvent, and 
DisconnectEvent being useless since there was very little remaining 
information.
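
For example, a listener like this (a minimal sketch; exact signatures differ 
between PircBotX versions) only has something useful to report because the 
event carries snapshotted data:

    import org.pircbotx.hooks.ListenerAdapter;
    import org.pircbotx.hooks.events.QuitEvent;

    public class QuitLogger extends ListenerAdapter {
        @Override
        public void onQuit(QuitEvent event) throws Exception {
            // Without a snapshot the quitting user would already be gone from the DAO
            System.out.println(event.getUser().getNick() + " quit; shared "
                    + event.getUser().getChannels().size() + " channels with the bot");
        }
    }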

The latest build has some bugfixes in createSnapshot that were affecting 
performance, and it also introduces configBuilder.setSnapshotsEnabled() for rare 
situations when performance is absolutely critical and snapshots aren't needed.
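
For anyone finding this later, a minimal sketch of how that could be wired up 
(setSnapshotsEnabled is the builder method named above; the other builder calls 
are placeholders and may be named differently depending on the PircBotX 
version, e.g. older builds use setServerHostname(...) instead of 
addServer(...)):

    import org.pircbotx.Configuration;
    import org.pircbotx.PircBotX;

    public class SnapshotsDisabledExample {
        public static void main(String[] args) throws Exception {
            Configuration config = new Configuration.Builder()
                    .setName("MyBot")                // placeholder nick
                    .addServer("irc.example.com")    // placeholder server
                    .addAutoJoinChannel("#aChannel")
                    .setSnapshotsEnabled(false)      // skip DAO snapshots on PART/QUIT/KICK
                    .buildConfiguration();
            new PircBotX(config).startBot();
        }
    }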

However, you're overestimating the impact of this. Some unofficial, quick 
benchmarks with snapshots enabled still peak at around 2000 lines per second, 
and that's with 375,000 parts, quits, and kicks included in the ~3,000,000 line 
benchmark. That's an insanely high throughput of quits and parts that nobody is 
going to see. 

Original comment by Lord.Qua...@gmail.com on 11 Feb 2015 at 10:37

GoogleCodeExporter commented 9 years ago
Very nice that they can be disabled through the configuration in the future, 
thank you!

I still think that snapshots are overkill for part and quit (when a user other 
than the bot leaves). The leaving user's information could be given in a 
separate data structure. The information about the other users in the snapshot 
will always be potentially outdated anyway if you handle events in different 
threads, so there's IMO no need to create a snapshot.

I don't think I'm overestimating the impact of this. The point here is that the 
CPU time grows *quadratically* with the number of users on the server, so it's 
not noticeable unless there are a lot of users. I attached a simple test, which 
I made in InputParserTest. This test puts 10000 users in a channel and then 
repeats one guy joining and parting from the channel 1000 times.
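
Roughly, the attached test does something like this (a sketch only: it assumes 
the inputParser fixture from InputParserTest, and the raw lines approximate the 
IRC protocol):

    // Fill #aChannel with 10000 users, then have one extra user join and part
    // 1000 times, timing only the join/part loop.
    for (int i = 0; i < 10000; i++) {
        inputParser.handleLine(":user" + i + "!~user" + i + "@host.test JOIN :#aChannel");
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1000; i++) {
        inputParser.handleLine(":parter!~parter@host.test JOIN :#aChannel");
        inputParser.handleLine(":parter!~parter@host.test PART #aChannel");
    }
    System.out.println("join/part loop took " + (System.currentTimeMillis() - start) + " ms");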

If you disable adding things to the eventQueue of EventQueueListener (otherwise 
every snapshot stays in memory), the second part of the test takes 18 seconds 
on my computer (with about 0.1 seconds for garbage collection), so that's 18 
milliseconds per part event. That is huge.

In my case, there are more than 10000 users in a channel, so this is not a 
hypothetical test.

To draw the parallel to your benchmark: an eighth of your events are 
parts/quits, so with 10000 users, you can handle around 56 of these per second. 
Suddenly you drop to 448 lines per second.

Again, I want to stress that a lot of the CPU time here is consumed by 
UUID.randomUUID(). Simply creating a random UUID via new 
UUID(Double.doubleToLongBits(Math.random()), 
Double.doubleToLongBits(Math.random())) is about seven times faster. 
UUID.randomUUID() uses a cryptographically strong random source, which in this 
case is a 
pure waste of time. I'm sure that there are even faster ways to generate a 
random UUID. Having userId as a constructor argument to User (and reusing the 
UUID in the snapshot, which as I stated earlier seems like the correct way) 
brings the test down to under 14 seconds.
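
As one example of a faster alternative (my suggestion, not anything that exists 
in PircBotX): ThreadLocalRandom avoids the SecureRandom cost, and you can still 
set the version and variant bits so the result looks like a normal version-4 
UUID:

    import java.util.UUID;
    import java.util.concurrent.ThreadLocalRandom;

    public final class FastUuid {
        private FastUuid() {}

        // Non-cryptographic random UUID: fine for internal object identity,
        // not for anything security sensitive.
        public static UUID next() {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            long msb = (rnd.nextLong() & 0xffffffffffff0fffL) | 0x0000000000004000L; // version 4
            long lsb = (rnd.nextLong() & 0x3fffffffffffffffL) | 0x8000000000000000L; // IETF variant
            return new UUID(msb, lsb);
        }
    }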

Original comment by tillmann...@gmail.com on 13 Feb 2015 at 10:36

Attachments:

GoogleCodeExporter commented 9 years ago
I thought about what you said overnight; regenerating UUIDs for snapshots is 
actually a bug. Fixed that in Revision 72d3ed696351. I also used HashMap when 
ImmutableMap wasn't needed in Revision 0798c7ccae58.

New results (improvements measured as reduction in run time):
My benchmark with 2,850,000 lines - Before 950 lines/second, After 1550 
lines/second, 38% improvement
Your benchmark - Before 32.64 seconds, After 11.96 seconds, 63% improvement 
(!!!)

So thank you! I really didn't know UUID was that slow.

For fun I tried benchmarking without snapshots enabled. Mine did ~128,000 
lines/second. Yours ran in 63.41 milliseconds(!), or 15,770 lines/second.

If you have any other optimizations I'd like to hear them. Moving to 
per-channel and per-user collections would significantly increase the number of 
collections that need to be copied. Only snapshotting the relevant user or 
channel would break most of the API (event.getChannel().getUsers(), 
event.getUser().getChannels(), etc.), which is confusing and unacceptable. I 
haven't found anything else I can tweak.

Understand though, more than 100 lines/second is definitely not typical. 1000 
users parting is not typical. Arguably 10,000+ users is not typical. I'm 
sacrificing a small amount of performance to give the 98% of users, who only 
see a few thousand IRC users and regular levels of throughput, a nice feature. 
For the remaining percentage, disabling snapshots is now an available option.

Original comment by Lord.Qua...@gmail.com on 14 Feb 2015 at 4:16

GoogleCodeExporter commented 9 years ago
I think a fairly elegant solution would be to create the snapshotted objects 
lazily and only eagerly add the user who quit (or the channel that the user 
parted from) to the snapshot. I attached LazyUserChannelDaoSnapshot, which 
demonstrates what I mean. I implemented a couple of methods, although I didn't 
get into the "normal" user and level things. UserChannelDaoInterface is the 
canonical interface extracted from UserChannelDao.
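
To give the core idea in a few lines (a hypothetical sketch, not the attached 
class; generics and the real DAO types are elided, and liveUsers stands in for 
dao.getAllUsers()):

    import java.util.Collection;
    import com.google.common.base.Supplier;
    import com.google.common.base.Suppliers;
    import com.google.common.collect.ImmutableSet;
    import org.pircbotx.User;

    // Keep the departing user eagerly (it is about to be removed from the DAO),
    // defer copying everything else until somebody actually asks for it.
    public class LazyDaoSnapshotSketch {
        private final User departedUser;
        private final Supplier<ImmutableSet<User>> allUsers;

        public LazyDaoSnapshotSketch(final Collection<User> liveUsers, User departedUser) {
            this.departedUser = departedUser;
            this.allUsers = Suppliers.memoize(new Supplier<ImmutableSet<User>>() {
                @Override
                public ImmutableSet<User> get() {
                    return ImmutableSet.copyOf(liveUsers); // copied at most once, on demand
                }
            });
        }

        public User getDepartedUser() {
            return departedUser;
        }

        public ImmutableSet<User> getAllUsers() {
            return allUsers.get();
        }
    }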

By the way, in UserChannelDao, you're synchronizing every method including the 
call to getAllUsers. This has the potential to cost a lot of multithreading 
performance and, more importantly, block the input thread. I attached yet 
another test (joinBeatDown.txt, which goes in InputParserTest), which joins 
10000 users and calls Channel.getUsers() in the JoinEvent. This takes almost 30 
seconds.

I'd recommend using concurrent data structures instead. ConcurrentSkipListMap 
is an excellent SortedMap implementation that allows you to modify the map 
while iterating over it; its iterators are weakly consistent and never throw 
ConcurrentModificationException. You can use ConcurrentSkipListSet as the 
value collection in Multimaps.
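
Something along these lines (a sketch with illustrative names; note that the 
wrapper Guava returns does no synchronization of its own, it only delegates to 
the concurrent backing structures):

    import java.util.Collection;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.ConcurrentSkipListSet;
    import com.google.common.base.Supplier;
    import com.google.common.collect.Multimaps;
    import com.google.common.collect.SortedSetMultimap;

    public class ConcurrentMultimapSketch {
        // A sorted multimap whose backing map and value sets are the JDK's
        // concurrent skip-list structures, so iteration is weakly consistent
        // instead of throwing ConcurrentModificationException.
        final SortedSetMultimap<String, String> channelToNicks =
                Multimaps.newSortedSetMultimap(
                        new ConcurrentSkipListMap<String, Collection<String>>(),
                        new Supplier<ConcurrentSkipListSet<String>>() {
                            @Override
                            public ConcurrentSkipListSet<String> get() {
                                return new ConcurrentSkipListSet<String>();
                            }
                        });
    }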

I would then not return copies of the data structures in UserChannelDao, but 
instead wrap the live collections with Collections.unmodifiableCollection, and 
document that the structure is sorted and can change over time. The user can 
create a copy of the collection if they need to consistently iterate over it 
several times.
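
i.e. something along these lines (a sketch, not the actual UserChannelDao 
code):

    import java.util.Collection;
    import java.util.Collections;
    import java.util.concurrent.ConcurrentSkipListSet;

    public class LiveViewSketch {
        private final ConcurrentSkipListSet<String> nicks =
                new ConcurrentSkipListSet<String>();

        // Return a read-only *view* of the live, sorted set instead of a copy.
        // Callers that need a stable iteration should copy it themselves.
        public Collection<String> getNicks() {
            return Collections.unmodifiableCollection(nicks);
        }
    }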

You're correct, it is important to stay on task with your library, but in my 
opinion, a library should never have any hidden costs (like copying data 
structures) unless the user specifies extraordinary requirements or the task 
obviously requires it. Sure, if your library is the topmost application in the 
stack, it'll seem fast in most cases, but add a couple of layers and soon 
enough somebody is raging because their program freezes when they open a 
certain drop-down list ^^

Original comment by tillmann...@gmail.com on 14 Feb 2015 at 10:25

Attachments:

GoogleCodeExporter commented 9 years ago
Lazy loading the snapshot isn't going to work since the information in 
UserChannelDao was already deleted. E.g. the bot leaves a channel and calls 
channel.getUsers() to know who to remove from the GUI; that calls 
dao.getUsers(Channel), which fails since the users and the channel no longer 
exist. That's why there are snapshots: to save that information before it 
happens.

With synchronizing and concurrent modifications, that is a very tricky beast. 
Unfortunately there is a lot of non-atomic, internal state in UserChannelDao. 
UserChannelMap (a many-to-many implementation) uses 2 Guava HashMultimaps, 
which are fast but will throw ConcurrentModificationExceptions. mainMap and the 
5 UserLevel maps are powered by that. There's also the cached channel-name and 
user-nick maps and the privateUser map, which are thankfully plain Maps.

If you remove all the synchronizing, all those maps will fall out of sync. 
Guava's Multimaps are also not thread safe. The UserChannelDao methods could 
fail in very non-obvious ways. I'm really afraid of replacing a sacrifice in 
performance with NPEs and other exceptions. Even switching to 
ConcurrentSkipListMap wouldn't solve that problem.

I've thought about replacing the Multimaps in UserChannelMap with a single 
List of User-Channel pairs, but that still doesn't solve the above issue. 
Regardless, Multimaps are very heavily optimized, and with a flat list an 
operation like UserChannelMap.getUsers(Channel) would require iterating the 
entire list.
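
That is, with a flat pair list the lookup degenerates into a linear scan, 
roughly like this (a sketch; UserChannelPair is hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import org.pircbotx.Channel;
    import org.pircbotx.User;

    public class PairListSketch {
        static class UserChannelPair {
            final User user;
            final Channel channel;
            UserChannelPair(User user, Channel channel) {
                this.user = user;
                this.channel = channel;
            }
        }

        private final List<UserChannelPair> pairs = new ArrayList<UserChannelPair>();

        // O(n): every pair in the DAO is visited for a single channel lookup
        public List<User> getUsers(Channel channel) {
            List<User> result = new ArrayList<User>();
            for (UserChannelPair pair : pairs) {
                if (pair.channel.equals(channel)) {
                    result.add(pair.user);
                }
            }
            return result;
        }
    }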

If you have a patch that can pass the test suite and multi-threaded benchmarks 
and be faster I would be willing to merge it. 

BTW, in your benchmark your hostmask is missing the "!". This will get you the 
same result:
inputParser.handleLine(":user" + i + "!~user" + i + "@host.test JOIN :#aChannel");

Original comment by Lord.Qua...@gmail.com on 15 Feb 2015 at 12:00