mochi-hpc / mochi-ssg

Scalable Service Groups (SSG), a group membership service for Mochi
Other
1 stars 1 forks source link

faster load - [merged] #40

Closed shanedsnyder closed 3 years ago

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Oct 26, 2020, 13:52

Merges dev-faster-load -> master

in mpi programs i'm seeing a high cost in a routine that calls ssg_group_id_load and ssg_group_id_serialize (https://xgitlab.cels.anl.gov/sds/benvolio/blob/master/src/client/bv-client.cc#L592)

I thought maybe the stat call was a culprit, so here's a small modification that doesn't call stat. Couple problems here prompting the 'WIP' tag

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Oct 26, 2020, 14:45

added 8 commits

Compare with previous version

shanedsnyder commented 3 years ago

In GitLab by @carns on Oct 26, 2020, 15:12

Neat. For a longer term solution maybe just have a loop that calls realloc() if it gets to the end of the buffer without hitting EOF?

shanedsnyder commented 3 years ago

In GitLab by @shanedsnyder on Oct 27, 2020, 13:52

Yeah, I hadn't considered how expensive collective stat might be at larger scales, but it does make sense to avoid it. I think Phil's realloc suggestion is sufficient, so that you can resize your fixed buffer depending on what all the group file contains.

You need to make sure you are passing a complete buffer to the deserialize routine -- I just looked briefly, but it looks like that code is going to quietly bomb if you give it a partial buffer.

There is an embedded count of the number of member addresses contained in the group file that is used to help deserialize. I don't think any magic footers are needed, because the only state we keep per-member is their address string which is null-terminated.

As a little bit of an aside, the load/deserialize APIs both allow you to extract a subset of the address contained in the group file, but this is done as part of deserialize step (i.e., it extracts a subset from a buffer, not from a file).

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 3, 2020, 17:07

added 1 commit

Compare with previous version

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 3, 2020, 17:12

Something like this? I expect a too-small buffer will be uncommon and yet i am going to end up calling realloc every time.

shanedsnyder commented 3 years ago

In GitLab by @carns on Nov 5, 2020, 14:43

Commented on src/ssg.c line 1639

this should be a read() instead of a pread(), right? or else increment the offset off of 0?

shanedsnyder commented 3 years ago

In GitLab by @carns on Nov 5, 2020, 14:45

Commented on src/ssg.c line 1640

something is a little odd about bytes_read, I think it is supposed to accumulate for this logic and the deserialization below to work? Might be good to do a test run with the initial size set to something deliberately way to small (2 bytes or something) to make sure it processes right when it hits the realloc() case.

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 5, 2020, 15:46

added 1 commit

Compare with previous version

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 5, 2020, 16:05

you're right Phil that first attempt was not right, but I was trying to avoid maintaining state -- just keep re-reading the file with a larger and larger buffer until it worked (pread with offset of 0 to avoid a seek).

I pushed a better version that does not do the extraneous malloc, the repeated re-reading, and it's been tested on itty bitty files and great big files.

Did you know the ideal growth ratio is 1.5? It has something to do with the Fibonacci sequence and the golden ratio... and is not at all relevant to this discussion but I had to share the results of that wormhole I fell into with someone.

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 5, 2020, 16:08

Commented on src/ssg.c line 1639

changed this line in version 5 of the diff

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 5, 2020, 16:08

added 1 commit

Compare with previous version

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 5, 2020, 16:10

added 1 commit

Compare with previous version

shanedsnyder commented 3 years ago

In GitLab by @roblatham00 on Nov 5, 2020, 16:17

I also want to add that in the end the "routine that called ssg_load" ended up spending its time somewhere else, once I added more timing calls, so this optimization is not going to make a huge difference

shanedsnyder commented 3 years ago

In GitLab by @carns on Nov 6, 2020, 07:54

Oh, duh, it didn't occur to me that it was re-reading. It looked like a bug because I was thinking of it continuing at the last offset.

Anyhow, this looks good to me now, but I'll defer to Shane for the merge.

shanedsnyder commented 3 years ago

In GitLab by @shanedsnyder on Nov 6, 2020, 12:36

New version looks good to me, too. I'll go ahead and merge it in. Thanks @roblatham00!

shanedsnyder commented 3 years ago

In GitLab by @shanedsnyder on Nov 6, 2020, 12:37

unmarked as a Work In Progress

shanedsnyder commented 3 years ago

In GitLab by @shanedsnyder on Nov 6, 2020, 12:37

merged

shanedsnyder commented 3 years ago

In GitLab by @shanedsnyder on Nov 6, 2020, 12:37

mentioned in commit c4c00a44dbb1468621be67a25de1129fdf95aaf4