Ulfgard opened this issue 8 years ago
log_dim is an undocumented option. Is this equivalent to "all" or is it a typo for "low_dim"? Is it safe for the workshop to set log_decision_variables to low_dim?
This is an unfortunate typo. It should say (and it will, in a minute) low_dim. To answer your question: yes, you can set it to low_dim for the workshop, which should help with the huge amounts of data you are getting now.
I doubt that the decision variables of low dim contribute much to the storage cost.
The low_dim option means that the decision variables will be archived only for low dimensions. In @Ulfgard's example this would save him approximately 3 GB of space for the first function only.
I guess it is 3 GB with x-values and nondominated points; without x-values it might be 1 GB, i.e., a 2 GB saving from the original 7 GB, which is only some 30% and does not change the storage cost much.
It's 4 GB now. That is still less than a 50% improvement, but better than nothing...
For which graphs are the fronts relevant? I used log_nondominated: "none" and still got the same plots from the post-processing. I think the fronts are only relevant for plotting them, but this is not included in the post-processing (at least I did not find a switch). Also, for which plots should the x-values be relevant?
The fronts are not needed for the graphs, but for providing complete information about the performance of the algorithm that can/will be used in the future to compute different performance indicators or to recompute this one with a different reference set (note that because we are recording performance only when targets are hit, we cannot simply shift the indicator values from the dat and tdat files in this last case).
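To make the reference-set point concrete, here is a minimal sketch (not the actual COCO post-processing) that recomputes the 2-D hypervolume of an archived front for an arbitrary reference point. It assumes a 3-column format (evaluation count, f1, f2) with '%' comment lines, a mutually non-dominated point set, and minimization of both objectives; the file name and reference point are placeholders:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { double f1, f2; } point_t;

static int by_f1(const void *a, const void *b) {
  double d = ((const point_t *) a)->f1 - ((const point_t *) b)->f1;
  return (d > 0) - (d < 0);
}

/* 2-D hypervolume of a mutually non-dominated set w.r.t. reference (r1, r2) */
static double hypervolume_2d(point_t *p, size_t n, double r1, double r2) {
  double hv = 0.0;
  size_t i;
  qsort(p, n, sizeof(point_t), by_f1);
  for (i = 0; i < n; i++) {
    double width = (i + 1 < n ? p[i + 1].f1 : r1) - p[i].f1;
    hv += width * (r2 - p[i].f2);
  }
  return hv;
}

int main(int argc, char **argv) {
  char line[1024];
  point_t *points = NULL;
  size_t n = 0, capacity = 0;
  long evals;
  double f1, f2;
  FILE *in = fopen(argc > 1 ? argv[1] : "example_nondom.adat", "r");
  if (!in)
    return 1;
  while (fgets(line, sizeof(line), in)) {
    if (line[0] == '%')  /* skip comment/header lines */
      continue;
    if (sscanf(line, "%ld %lf %lf", &evals, &f1, &f2) == 3) {
      if (n == capacity) {
        point_t *tmp;
        capacity = capacity ? 2 * capacity : 1024;
        tmp = realloc(points, capacity * sizeof(point_t));
        if (!tmp) { free(points); fclose(in); return 1; }
        points = tmp;
      }
      points[n].f1 = f1;
      points[n].f2 = f2;
      n++;
    }
  }
  fclose(in);
  /* change the reference point here to recompute the indicator */
  printf("hypervolume = %f\n", hypervolume_2d(points, n, 11.0, 11.0));
  free(points);
  return 0;
}
```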
But you make a good point here - we should advertise this more. Namely, you can (and probably should) turn the archiving off (using log_nondominated: none) when you do some preliminary experiments and tune your algorithm, but you need to use archiving (log_nondominated: low_dim) when you are doing the final experiments to be submitted for the workshop.
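As a concrete illustration of this advice, here is a minimal sketch along the lines of the usual COCO example experiment; the coco_observer()/coco_observer_free() calls, the "bbob-biobj" observer name and the result_folder option are assumed to match that example, while the log_nondominated and log_decision_variables settings are the ones discussed in this thread:

```c
#include "coco.h"

/* Sketch: archiving off while tuning, archiving on (with x-values only
 * in low dimensions) for the final workshop runs. */
int main(void) {
  /* preliminary/tuning experiments: no archiving of non-dominated points */
  const char *tuning_options =
      "result_folder: tuning log_nondominated: none";
  /* final experiments: archive fronts, x-values only for low dimensions */
  const char *final_options =
      "result_folder: final log_nondominated: low_dim "
      "log_decision_variables: low_dim";

  coco_observer_t *observer = coco_observer("bbob-biobj", tuning_options);
  /* ... benchmark loop over the suite goes here ... */
  coco_observer_free(observer);

  (void) final_options; /* switch to this options string for the final runs */
  return 0;
}
```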
@nikohansen suggested checking the remaining disk space and stopping the logging when less than 10 GB (or something similar) is left. Another option could be to start thinning out the data written after a certain amount of time (in #funevals), which might be (much) easier than option 1 if we want to be truly OS independent.
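A sketch of the first option, assuming a POSIX system (it uses statvfs(); a Windows build would need GetDiskFreeSpaceEx instead, which is exactly the OS-dependence issue mentioned above):

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Return 1 if at least min_bytes of disk space are free at path. */
static int enough_disk_space_left(const char *path,
                                  unsigned long long min_bytes) {
  struct statvfs vfs;
  if (statvfs(path, &vfs) != 0)
    return 1; /* if the check itself fails, keep logging rather than abort */
  return (unsigned long long) vfs.f_bavail * vfs.f_frsize >= min_bytes;
}

int main(void) {
  const unsigned long long ten_gb = 10ULL * 1024 * 1024 * 1024;
  if (!enough_disk_space_left(".", ten_gb))
    printf("less than 10 GB left, the logger should stop writing\n");
  else
    printf("enough disk space left, logging can continue\n");
  return 0;
}
```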
I worked on my algorithm a bit, so that now more nondominated points are generated.
These are now the disk space requirements for problem 1 with low_dim (verified: the 10- and 20-dimensional files only store 3 columns):
796M bbob-biobj_f01_d02_nondom_all.adat
1.4G bbob-biobj_f01_d03_nondom_all.adat
2.8G bbob-biobj_f01_d05_nondom_all.adat
2.2G bbob-biobj_f01_d10_nondom_all.adat
4.2G bbob-biobj_f01_d20_nondom_all.adat
This makes it hard for me to submit, as I do not have that much web space available (let alone space on my computing machine).
FYI: number of stored points:
9855005 bbob-biobj_f01_d02_nondom_all.adat
13912673 bbob-biobj_f01_d03_nondom_all.adat
22304334 bbob-biobj_f01_d05_nondom_all.adat
43018678 bbob-biobj_f01_d10_nondom_all.adat
82640702 bbob-biobj_f01_d20_nondom_all.adat
Thinning will, by the way, make everything worse, because thinning will change the graphs. I think the proven convergence rate w.r.t. hypervolume is O(1/numberOfPointsInFront), and thus thinning by a factor of 10 would be equivalent to running the algorithm for a factor of 10 fewer evaluations (assuming that all relevant parts of the front are covered).
A solution would be to change the logging of the indicator from "when a target is hit" to "after specific iterations" (or do that additionally). In this case one could apply an exponential grid, e.g., log at iterations 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000, etc. This way one could not figure out exactly when a target is hit, but the error would be small on a log scale and thus probably not visible in the graph (given a finer grid than the one I gave here).
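A small sketch of such an exponential evaluation grid (a hypothetical helper, not part of the logger): an evaluation count is logged when it is a multiple of the largest power of 10 not exceeding it, which yields exactly the sequence 1, 2, ..., 10, 20, ..., 100, 200, ..., 1000, and so on.

```c
#include <stdio.h>

/* Return 1 if the evaluation count lies on the exponential grid
 * 1, 2, ..., 10, 20, ..., 100, 200, ..., i.e. if it is a multiple of
 * the largest power of 10 not exceeding it. */
static int on_exponential_grid(unsigned long evaluations) {
  unsigned long step = 1;
  if (evaluations == 0)
    return 0;
  while (step * 10 <= evaluations)
    step *= 10;
  return evaluations % step == 0;
}

int main(void) {
  unsigned long e;
  for (e = 1; e <= 1000; e++)
    if (on_exponential_grid(e))
      printf("%lu ", e); /* 1 2 ... 10 20 ... 100 200 ... 1000 */
  printf("\n");
  return 0;
}
```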
Targets are not an issue; even if you define 100k of them, uniformly and/or log-uniformly, they take up relatively no space. If I understood correctly, the main motivation for storing non-dominated points is to use alternative/non-hypervolume metrics afterwards. As I already said, I am not sure non-dominated points will be enough for alternative metrics, as they may also need some dominated points to be computed properly.
BTW, we are also going to submit results with 1M times D evaluations and will also have to send a few hundred GB.
Would it be possible to zip the files into archives during the run, for instance after each dimension has been processed?
The compression rate was pretty low for me, 2 or 3; better than nothing, but I believe we will generate 0.5-1 TB of raw data.
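For reference, a minimal sketch of the "compress during the run" idea using zlib (an assumption; the COCO loggers do not currently do this): after a dimension is finished, its .adat file is copied into a .gz archive and the raw file is removed. The file name below is hypothetical and only follows the pattern reported in this thread.

```c
#include <stdio.h>
#include <zlib.h> /* link with -lz */

/* Copy src into a gzip-compressed dst and delete the raw file. */
static int gzip_file(const char *src, const char *dst) {
  char buffer[1 << 16];
  size_t n;
  FILE *in = fopen(src, "rb");
  gzFile out = gzopen(dst, "wb");
  if (!in || !out) {
    if (in) fclose(in);
    if (out) gzclose(out);
    return -1;
  }
  while ((n = fread(buffer, 1, sizeof(buffer), in)) > 0)
    gzwrite(out, buffer, (unsigned) n);
  fclose(in);
  gzclose(out);
  return remove(src); /* free the disk space taken by the raw archive */
}

int main(void) {
  /* hypothetical file name, following the pattern reported in this thread */
  return gzip_file("bbob-biobj_f01_d02_nondom_all.adat",
                   "bbob-biobj_f01_d02_nondom_all.adat.gz");
}
```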
We are considering thinning the data using the following method:
However, right now you already have all these runs done, so the idea is to write a script that you could run on your existing archive files and thin them "post hoc". In order to be able to do this efficiently, we are asking for your help. Could you, @Ulfgard and @loshchil, each provide a small set of large "representative" archives (we would need at least one archive per dimension) so that we can test the thinning efficiency?
We would of course also need to eventually implement thinning in the logger.
I would be okay with rerunning the algorithm; I am still in the tuning phase. But I can send you a tar archive of f01; it's 4.4 GB in size. Do you have a place where I can send it?
Can you upload it using https://mega.nz or a similar site?
[link deleted]
I have the link, thanks!
You may use the archives I sent you some time ago; I don't have any new ones.
[link deleted] eh ;-)
This link http://stackoverflow.com/questions/238603/how-can-i-get-a-files-size-in-c should help to at least limit the size of the written archive file.
Conclusion of Feb 2: it seems easier to check the size of the already written files and stop writing the archive once it exceeds a given maximum size.
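A sketch of that conclusion, using the portable stat() approach from the linked StackOverflow answer; the maximum size and file name below are placeholders:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the current size of a file in bytes (0 if it does not exist yet). */
static long long file_size_in_bytes(const char *path) {
  struct stat st;
  if (stat(path, &st) != 0)
    return 0;
  return (long long) st.st_size;
}

/* Stop appending to the archive once it exceeds a given maximum size. */
static int archive_write_allowed(const char *path, long long maxsize) {
  return file_size_in_bytes(path) < maxsize;
}

int main(void) {
  const char *archive = "bbob-biobj_f01_d20_nondom_all.adat"; /* placeholder */
  const long long maxsize = 2LL * 1024 * 1024 * 1024;         /* e.g. 2 GB */
  if (archive_write_allowed(archive, maxsize))
    printf("below the limit, keep writing to %s\n", archive);
  else
    printf("size limit reached, stop writing to %s\n", archive);
  return 0;
}
```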
I tried to run the benchmark for dimensions 2, 5, 10, 20 with a budget of 10^6 * dim, but the program got killed by the operating system after exdata/algorithm/ grew to about 340 GB and filled up all available disk space. Almost all of this is located in the archive subfolder.
Any help?
Looking at the archive, the front files are the issue (e.g., du -h):
So one function needs roughly 7 GB in this setup, i.e., ~385 GB in total.
//edit The problem I am having is with this option for the observer:
log_dim is an undocumented option. Is this equivalent to "all" or is it a typo for "low_dim"? Is it safe for the workshop to set log_decision_variables to low_dim?