lofar-astron / factor

Facet calibration for LOFAR
http://www.astron.nl/citt/facet-doc
GNU General Public License v2.0

ram-disk size? #160

Closed: wndywllms closed this issue 7 years ago

wndywllms commented 7 years ago

This is more of a question than an actual issue: how big does the ram-disk need to be in order to use it as dir_local (or dir_local_selfcal)? Is there any way to know this (or an upper limit) beforehand? Or is there a practical limit based on what others are using?
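One practical way to answer this empirically is to watch the ram-disk while a run is in progress. A minimal sketch, assuming the ram disk is mounted at /dev/shm and dir_local points somewhere under it (the scratch path below is hypothetical):

# report total size, used space, and free space of the ram disk
df -h /dev/shm

# track how much the Factor scratch directory actually occupies (hypothetical path)
du -sh /dev/shm/factor_scratch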

AHorneffer commented 7 years ago

Well, on the system we are running here, /dev/shm has no fixed size...

But, somewhat unsurprisingly, I noticed that in runs where memory becomes an issue in certain steps (a single NDPPP process needing 6.5% of the memory on a system with 40 virtual cores), I have to reduce the ncpu value even further to keep things running, so I don't actually gain from the ram-disk. The other run (with only 80 SBs instead of 320) used only a couple of GBytes when I checked.

A related issue: #156
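For reference, on most Linux systems /dev/shm is a tmpfs that defaults to half of physical RAM, and it can be given an explicit cap at runtime. A minimal sketch (the 64G figure is just an example):

# cap the ram disk at 64 GB; requires root and takes effect immediately
sudo mount -o remount,size=64G /dev/shm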

darafferty commented 7 years ago

Generally, I've found that 64 GB total (for us it is spread over 4 nodes, each of which has 16 GB) is enough for the full bandwidth when using compression. However, if the calibrator region is very large (and thus the averaging is less aggressive), then 64 GB might not be enough. Perhaps we could look into using more compression (we use the most conservative settings at the moment), but one issue is that WSClean fills the MODEL_DATA column of the files on the ram disk without compression, increasing their size significantly (@aroffringa, is there a way to use Dysco compression with WSClean predict?).
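One way to see how much the uncompressed MODEL_DATA inflates things is to compare the measurement set's footprint before and after WSClean writes the column. A minimal sketch, assuming the MS lives on the ram disk (the path is hypothetical):

# size of the MS before WSClean writes MODEL_DATA
du -sh /dev/shm/band1.ms
# ... run wsclean -predict ...
# size afterwards; the difference is the uncompressed MODEL_DATA column
du -sh /dev/shm/band1.ms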

aroffringa commented 7 years ago

@darafferty If you make sure a compressed MODEL_DATA column exists before WSClean runs, WSClean will use the compressed column. This can be done, for example, by adding the column with NDPPP:

dppp msin=bla.ms msout=. msout.datacolumn=MODEL_DATA steps=[] msout.storagemanager=dysco

(@tammojan recently pointed out this handy one-liner command for adding a column :) ).

You should be a bit careful with compressing MODEL_DATA, though, for two reasons. First, it is used within the imaging iterations, and the small compression errors might build up somewhat (the model data is recalculated every iteration, so the errors don't grow linearly, but it might make the cleaning a bit less stable). Second, the MODEL_DATA column has no noise and therefore behaves a bit differently under compression. So I wouldn't compress to fewer than 12, or maybe even 16, bits, and I recommend doing some test runs to verify that the image quality doesn't degrade because of it. I have superficially tested WSClean with a compressed MODEL_DATA column, though, and that worked accurately.
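Following that advice, the bit depth can be set when the column is created. A minimal sketch extending the one-liner above, assuming the msout.storagemanager.databitrate option of NDPPP (worth checking against your DPPP version's documentation):

# same one-liner as above, but with an explicit 16-bit Dysco data bitrate
dppp msin=bla.ms msout=. msout.datacolumn=MODEL_DATA steps=[] msout.storagemanager=dysco msout.storagemanager.databitrate=16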

Are all 4 polarizations still stored in the MS? Would it be possible to (temporarily?) go back to storing only Stokes I to decrease the data volume further?

darafferty commented 7 years ago

OK, thanks: I will do some tests with compressing the MODEL_DATA to see how selfcal is affected.

We do store all 4 polarizations (but only consider Stokes I for calibration and imaging, at least for now), so it should indeed be possible to store only one polarization, at least in theory. I'm not sure how DPPP would handle this, though. (Somewhat related: I've asked @tammojan to see whether we can speed up predict in DPPP by predicting only Stokes I, and he's looking into it.)

darafferty commented 7 years ago

Well, I ran some selfcal cycles on a 5 Jy source using a compressed (16-bit) MODEL_DATA column, and the images are noticeably noisier, so I don't think we can do this, unfortunately.

wndywllms commented 7 years ago

Back to the original question: it seems our 32 GB was fine.

Closing this now, since the side issue about compressing the MODEL_DATA is also concluded.