philip-davis / dataspaces

Git Home of the RDI2 DataSpaces Project
BSD 2-Clause "Simplified" License

Issue with large values in dims in dataspaces.conf #5

Closed · mfatihaktas closed this issue 4 years ago

mfatihaktas commented 8 years ago

init_sspace (called from dsg_alloc in src/ds_gspaces.c) hangs when relatively large values are entered for dims, e.g. ndim = 3, dims = 999,999,999.
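For reference, the relevant part of the dataspaces.conf I was using looks roughly like this (only the ndim and dims lines are shown here; the other fields are omitted):

```
ndim = 3
dims = 999,999,999
```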

It took me a while to figure out that this was the problem, and I think for a DataSpaces user it would be a nightmare to debug. We might make init_sspace return an error if large dims values are not acceptable -- though I doubt that is the case, and of course I do not know for sure.

tongjin commented 8 years ago

dims = 999,999,999 is not a problem. Our common tests set dims to 1024, 1024, 1024 and work well, and in other real cases we have even set one of the dimensions to 65536. I don't think this is the issue.

mfatihaktas commented 8 years ago

I ran it again and apparently it does not hang; init_sspace just takes a long time to finish. What I actually observe is that the larger the dims values, the longer init_sspace takes, e.g. for dims = 99999, 99999, 99999 it takes so long that I just killed it. I had an auto-check in my code that complains if DataSpaces does not initialize within the expected time, and the delay in init_sspace tripped it, but initialization does seem to complete eventually.

So overall, large dims values seem to delay startup. If anyone else can confirm this delay, I think it is still an issue worth mentioning in the README so that users do not run into it.

tongjin commented 8 years ago

Large dims do cause a delay -- that is true. It is caused by the SFC (space-filling curve) calculation: the SFC algorithm needs to calculate and linearize the whole domain, so larger dims values take more time.

Yes, I agree it would be good to notify users about this.
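For intuition, here is a minimal sketch of what "linearize" means (row-major order for simplicity; the actual code uses a Hilbert SFC, which preserves locality better but costs more to set up):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustration only: map a 3-D cell coordinate to a 1-D index in
   row-major order. DataSpaces actually uses a Hilbert space-filling
   curve; its setup cost grows with the size of the domain to map. */
static uint64_t linearize(uint64_t x, uint64_t y, uint64_t z,
                          uint64_t nx, uint64_t ny)
{
    return x + nx * (y + ny * z);
}

int main(void)
{
    /* index of cell (1, 2, 3) in a 1024 x 1024 x 1024 domain */
    printf("%llu\n", (unsigned long long)linearize(1, 2, 3, 1024, 1024));
    return 0;
}
```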

melrom commented 8 years ago

Okay, the SFC algorithm needs to round up to the next power of 2 of whatever dims you specify; this is just mathematically how the Hilbert SFC works. So when you put 99999, it is going to use 2^17, which is 131072 -- so really, your dims will be 131072, 131072, 131072. Yes, this takes a while to initialize the global domain: you are linearizing a pretty big structure in this case. There are also other limiting factors, such as the fact that you may run out of memory on some smaller systems, in which case DS will just crash.

You can try using Fan's hashing method to see how it compares, but Fan's method was designed more for lopsided domains. Because of the caching, the cost should only hit the first iteration, as @qiansun has noted in previous DataSpaces mailing list correspondence. I will forward you a few emails about SFCs that may have been exchanged before you were officially on the list.
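A quick sketch of the rounding and the resulting domain size (the next_pow2 helper here is just for illustration; the actual rounding happens inside the third-party SFC library):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: round n up to the next power of two. */
static uint64_t next_pow2(uint64_t n)
{
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

int main(void)
{
    uint64_t dim = 99999;          /* requested per-dimension size */
    uint64_t eff = next_pow2(dim); /* 131072 == 2^17 */

    /* For ndim = 3 the linearized domain covers eff^3 cells, which
       is why initialization slows down so much for large dims. */
    printf("requested %llu -> effective %llu per dimension\n",
           (unsigned long long)dim, (unsigned long long)eff);
    printf("cells to cover: %llu^3 = %llu\n",
           (unsigned long long)eff,
           (unsigned long long)(eff * eff * eff));
    return 0;
}
```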

I only mention this because I am extremely interested in this area. However, Prof has noted that it is not so much research as implementation, and right now we do "good enough" for the applications we work with. Although it is something I would like to explore in my free time (is that a thing in graduate school? not even sure... lol), it has been on the back burner, since my research work is more important. I believe this is something we would have the full-time developer do once we get someone to respond to the job listing. :)

mfatihaktas commented 8 years ago

Thank you, Melissa, for forwarding the previous discussion to me. If you agree, I can put a note about this in the README so users can configure dataspaces.conf to better suit their application -- honestly, I did not think much about the dims values when I set them, and it was frustrating to track down the cause :)

melrom commented 8 years ago

Sure, go ahead. I just wanted to clarify in case my last comment was confusing: the rounding up to 2^k dimensions happens automatically inside the SFC library, so it won't be faster to specify 131072 yourself (except for saving maybe one log2 operation?). The SFC library is third-party, and it is also relatively old -- hence my interest in seeing whether the operation can be accelerated, possibly even threaded. :+1:

Anyway, in general, scientists are not working with global domains that large right now. Oftentimes it is sufficient to build something like 4096x4096x4096 and then just store multiple variables -- for example, one might be temperature, one might be position, etc. In this case the smaller global domain is built (which is faster), but multiple variables can be mapped onto that same global domain.
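As a hedged sketch (only the ndim/dims lines shown; other dataspaces.conf fields omitted), something like this initializes much faster, and temperature, position, etc. can still be stored as separately named variables within that one global domain:

```
ndim = 3
dims = 4096,4096,4096
```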

That being said, I am all for robust and efficient software, especially if we want DS to be relevant in the exascale era! If you are also interested in this topic, let's chat when I get back. We may be able to take a page out of the burst buffer book for the fastest way to store this incoming data, but it would require some thought. :laughing:

For now, feel free to check the warning into our README. If you do not mind, can you also update the README's Stampede section with the compile info I sent you? I just figured you will have it checked out anyway; if you are too busy, I will look into it this weekend. Most people view that file in a terminal, so I would use traditional ASCII-style dividers or artwork rather than markdown.
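Something like this, for instance:

```
=============================================================
 NOTE: large values in dims slow down init_sspace (SFC setup)
=============================================================
```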

melrom commented 8 years ago

PS thank you for using the ticketing system!

mfatihaktas commented 8 years ago

I would definitely be interested in following up on this with you. I am currently working on prefetching for "wide-area dataspaces" for two types of data sharing models: (1) time-series and (2) spatial. I tried SFC-based prefetching for multi-dimensional data (by searching for locality in the single-dimensional index and then converting back to multi-d). Given that the SFC takes so long to initialize for large global domains, I no longer think prefetching for multi-d data is a use case in the wide-area setting, since, as you explained, applications currently use smaller-dimension but multiple-chunk data, which is roughly equivalent to time-series data. I am glad I did not make much progress in that direction and instead mostly focused on prefetching for time-series data. Thanks for all this information, I really appreciate it. Let's keep discussing this whenever we have time, both for the possible research work we might get out of it and for DataSpaces' benefit.

I will write the note in the README and update the compilation section as well. I will keep your suggestion about formatting in mind. Thanks.