Closed shanedsnyder closed 3 years ago
In GitLab by @roblatham00 on Oct 26, 2020, 14:45
added 8 commits
sds:master
In GitLab by @carns on Oct 26, 2020, 15:12
Neat. For a longer term solution maybe just have a loop that calls realloc() if it gets to the end of the buffer without hitting EOF?
In GitLab by @shanedsnyder on Oct 27, 2020, 13:52
Yeah, I hadn't considered how expensive collective stat might be at larger scales, but it does make sense to avoid it. I think Phil's realloc suggestion is sufficient, so that you can resize your fixed buffer depending on what all the group file contains.
You need to make sure you are passing a complete buffer to the deserialize routine -- I just looked briefly, but it looks like that code is going to quietly bomb if you give it a partial buffer.
There is an embedded count of the number of member addresses contained in the group file that is used to help deserialize. I don't think any magic footers are needed, because the only state we keep per-member is their address string which is null-terminated.
As a little bit of an aside, the load/deserialize APIs both allow you to extract a subset of the address contained in the group file, but this is done as part of deserialize step (i.e., it extracts a subset from a buffer, not from a file).
In GitLab by @roblatham00 on Nov 3, 2020, 17:07
added 1 commit
In GitLab by @roblatham00 on Nov 3, 2020, 17:12
Something like this? I expect a too-small buffer will be uncommon and yet i am going to end up calling realloc every time.
In GitLab by @carns on Nov 5, 2020, 14:43
Commented on src/ssg.c line 1639
this should be a read() instead of a pread(), right? or else increment the offset off of 0?
In GitLab by @carns on Nov 5, 2020, 14:45
Commented on src/ssg.c line 1640
something is a little odd about bytes_read, I think it is supposed to accumulate for this logic and the deserialization below to work? Might be good to do a test run with the initial size set to something deliberately way to small (2 bytes or something) to make sure it processes right when it hits the realloc() case.
In GitLab by @roblatham00 on Nov 5, 2020, 15:46
added 1 commit
In GitLab by @roblatham00 on Nov 5, 2020, 16:05
you're right Phil that first attempt was not right, but I was trying to avoid maintaining state -- just keep re-reading the file with a larger and larger buffer until it worked (pread with offset of 0 to avoid a seek).
I pushed a better version that does not do the extraneous malloc, the repeated re-reading, and it's been tested on itty bitty files and great big files.
Did you know the ideal growth ratio is 1.5? It has something to do with the Fibonacci sequence and the golden ratio... and is not at all relevant to this discussion but I had to share the results of that wormhole I fell into with someone.
In GitLab by @roblatham00 on Nov 5, 2020, 16:08
Commented on src/ssg.c line 1639
changed this line in version 5 of the diff
In GitLab by @roblatham00 on Nov 5, 2020, 16:08
added 1 commit
In GitLab by @roblatham00 on Nov 5, 2020, 16:10
added 1 commit
In GitLab by @roblatham00 on Nov 5, 2020, 16:17
I also want to add that in the end the "routine that called ssg_load" ended up spending its time somewhere else, once I added more timing calls, so this optimization is not going to make a huge difference
In GitLab by @carns on Nov 6, 2020, 07:54
Oh, duh, it didn't occur to me that it was re-reading. It looked like a bug because I was thinking of it continuing at the last offset.
Anyhow, this looks good to me now, but I'll defer to Shane for the merge.
In GitLab by @shanedsnyder on Nov 6, 2020, 12:36
New version looks good to me, too. I'll go ahead and merge it in. Thanks @roblatham00!
In GitLab by @shanedsnyder on Nov 6, 2020, 12:37
unmarked as a Work In Progress
In GitLab by @shanedsnyder on Nov 6, 2020, 12:37
merged
In GitLab by @shanedsnyder on Nov 6, 2020, 12:37
mentioned in commit c4c00a44dbb1468621be67a25de1129fdf95aaf4
In GitLab by @roblatham00 on Oct 26, 2020, 13:52
Merges dev-faster-load -> master
in mpi programs i'm seeing a high cost in a routine that calls
ssg_group_id_load
andssg_group_id_serialize
(https://xgitlab.cels.anl.gov/sds/benvolio/blob/master/src/client/bv-client.cc#L592)I thought maybe the stat call was a culprit, so here's a small modification that doesn't call stat. Couple problems here prompting the 'WIP' tag