Open Qazalbash opened 1 week ago
@kazewong We are using flowMC for our project, and it gives us results quickly. However, we need to learn how to choose the values of the following parameters to ensure we are getting optimal results. In ML we can look at training and validation loss plots for decision-making, but with flowMC each parameter combination changes the results. Is there a way to gain confidence in our results, for instance plots or reference values to look at?
```python
N_CHAINS = 20
### Local Sampler settings
STEP_SIZE = 1e-2
N_LAYERS = 5
HIDDEN_SIZE = [512, 512]
NUM_BINS = 10
### Training settings
N_LOOP_TRAINING = 20
N_LOOP_PRODUCTION = 20
N_LOCAL_STEPS = 100
N_GLOBAL_STEPS = 100
NUM_EPOCHS = 50
LEARNING_RATE = 0.001
MOMENTUM = 0.9
BATCH_SIZE = 1 << 12
NUM_SAMPLES = 1 << 13
```
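For orientation, here is a back-of-envelope accounting of what these settings imply, assuming the usual flowMC loop structure (each loop runs the local sampler for `N_LOCAL_STEPS`, then the global sampler for `N_GLOBAL_STEPS`). This is a rough sketch of the arithmetic, not an exact description of flowMC internals:

```python
# Rough sample counts implied by the settings above; the loop
# structure assumed here (local then global steps per loop) is a
# sketch, not an exact accounting of flowMC internals.
N_CHAINS = 20
N_LOOP_PRODUCTION = 20
N_LOCAL_STEPS = 100
N_GLOBAL_STEPS = 100
BATCH_SIZE = 1 << 12   # bit shift: 4096
NUM_SAMPLES = 1 << 13  # bit shift: 8192

steps_per_loop = N_LOCAL_STEPS + N_GLOBAL_STEPS
production_steps = N_LOOP_PRODUCTION * steps_per_loop * N_CHAINS
print(BATCH_SIZE, NUM_SAMPLES)  # 4096 8192
print(production_steps)         # 80000 production samples across all chains
```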
We would appreciate it if you could explain or point us to resources to read about them before making the decision about parameter selection. Different parameter choices give us different solutions, which causes uncertainty about the results.
@Qazalbash
@zeeshan5885
Without a general sense of the geometry of the target space, it is quite difficult to give detailed advice on how to tune flowMC. Different hyperparameters should give you different results; in the end, that's why we have hyperparameters. That said, there are a couple of numbers one can tune:
@kazewong, by unlabeled data I meant the actual GWTC data. We are running our pipelines on synthetic data, and they are working well. We wanted a systematic way/threshold of any quantity with which we can gather the confidence in the analysis that has been done.
Also, I wanted to raise one issue: on a single GPU we cannot perform the analysis with large parameter values. On a GPU with 16 GB of VRAM we can set at most 60 global steps; otherwise the program crashes. You mentioned in this comment that flowMC does not utilize multiple GPUs. We bump into this problem most of the time, and there is nothing we can do about it. I wanted to know whether scaling flowMC to utilize multiple GPUs is a priority for you.
Effective sample size could be a useful measure: https://python.arviz.org/en/stable/examples/plot_ess_evolution.html. Another one could be the global acceptance rate, as mentioned above.
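To make the ESS idea concrete, here is a minimal numpy-only sketch of an effective-sample-size estimate via the autocorrelation time (ArviZ's `arviz.ess` is the more careful tool for real diagnostics; this toy version just illustrates why a correlated chain carries fewer "effective" samples than an i.i.d. one):

```python
import numpy as np

def ess_1d(x):
    """Crude effective sample size for one chain: n / tau, where tau is
    the integrated autocorrelation time summed until the autocorrelation
    first goes negative. A toy sketch; prefer arviz.ess in practice."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    # normalized autocorrelation at lags 0..n-1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    tau = 1.0
    for rho in acf[1:]:
        if rho < 0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(0)
iid = rng.normal(size=4000)
# an AR(1) chain with coefficient 0.95 is strongly autocorrelated
ar = np.empty(4000)
ar[0] = 0.0
for t in range(1, 4000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()

print(ess_1d(iid) > ess_1d(ar))  # True: the correlated chain has far lower ESS
```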
Note that all the diagnostics I know of can only diagnose when there is an issue, but none of them will give you a sense of how "correct" you are, i.e. how biased the result is.
On the note of multiple GPUs: if you are reaching your memory limit with the parameters mentioned above, it may be worth looking into how to compress the memory footprint before going to multiple GPUs, for example by downsampling the number of samples for each event. Jax does have distributed computing capability, but I think it is not the highest priority for flowMC right now. I also doubt scaling to multiple GPUs is going to be an efficient solution to your problem, especially since you may have to deal with inter-GPU communication optimization.
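As a sketch of the downsampling suggestion: if each event carries a large posterior-sample array, subsampling it without replacement shrinks the footprint proportionally. The shapes and names below are purely illustrative, not part of flowMC's API:

```python
import numpy as np

# Hypothetical: an event with 100k posterior samples over 11 parameters,
# downsampled to 5k to cut memory 20x. Illustrative numbers only.
rng = np.random.default_rng(42)
event_samples = rng.normal(size=(100_000, 11))  # (n_samples, n_params)

def downsample(samples, k, rng):
    """Keep a random subset of k rows, without replacement."""
    idx = rng.choice(samples.shape[0], size=k, replace=False)
    return samples[idx]

small = downsample(event_samples, 5_000, rng)
print(small.shape)                          # (5000, 11)
print(small.nbytes / event_samples.nbytes)  # 0.05, i.e. 20x smaller
```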
> it may be worth looking into how to compress the memory footprint before going to multiple GPUs
I have thought of the same thing! Do update us if you get to it.
> Jax does have distributed computing capability
I was talking about the multi-device parallelism that is explained in a tutorial on SPMD multi-device parallelism with shard_map.
> I also doubt scaling to multiple GPUs is going to be an efficient solution to your problem, especially since you may have to deal with inter-GPU communication optimization
I tried to write custom kernels in CUDA and Pallas, and it was overcomplicating a simple problem.
Custom kernels in CUDA and Pallas probably won't solve the problem. The goal of a custom kernel is usually to reduce round trips to/from the GPU memory, but ultimately if your data is larger than the memory the GPU has, it won't solve your issue.
I am aware of Jax's capability in distributed computing, both within a node and across nodes. The main reason I suggest you compress the memory footprint is that using multiple GPUs is not without cost. Communication between GPUs takes time, and it is usually the biggest bottleneck in almost every application. If your GPU has 16 GB of RAM and you are bottlenecked by memory, having, say, 4 GPUs only gets you 4 times as far. Unless you have a large GPU cluster, investing time in distributed computation here may not solve your problem.
That said, it might be good for flowMC to be able to handle distributed workflows at some point. This should probably come only after the API has really stabilized.
An observation I wanted to share, though I don't know the reason behind this behaviour: the memory footprint increases drastically (by several GB) when we slightly increase the number of global steps. This does not happen with the number of local steps.
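One plausible contributor (a guess about the mechanism, not a confirmed flowMC internal): a global step evaluates the normalizing flow (here, layers with [512, 512] hidden units) for every chain, so its per-step activation footprint can dwarf that of a simple local kernel that only touches the chain positions. A rough, hypothetical accounting:

```python
# Illustrative back-of-envelope only; all numbers are assumptions,
# not measured flowMC allocations.
n_chains, n_dim = 20, 11
f32 = 4  # bytes per float32

# a local step mostly touches the chain positions themselves:
per_step_positions = n_chains * n_dim * f32
print(per_step_positions)  # 880 bytes per step

# a global step's flow pass stores hidden activations per chain
# (5 layers of [512, 512] units in the settings above):
per_step_flow = n_chains * (512 + 512) * 5 * f32
print(per_step_flow)  # 409600 bytes per step, ~500x the local cost
```

If JAX unrolls or batches many global steps together, that per-step cost multiplies, which would match the observation that global steps drive memory while local steps do not.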
In general, the local sampler should have an acceptance rate of around 0.2-0.8, and the global acceptance rate times the number of global steps should be at least 1 per loop.
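The two checks can be computed directly from acceptance records; the second one amounts to requiring that each chain accepts at least one global (flow) move per loop on average, so the global sampler actually contributes. The array names below are illustrative stand-ins for whatever your sampler returns, not flowMC's actual output fields:

```python
import numpy as np

# Hypothetical boolean acceptance records of shape (n_chains, n_steps),
# simulated here with fixed acceptance probabilities for illustration.
rng = np.random.default_rng(1)
n_chains, n_global_steps = 20, 100
local_accs = rng.random((n_chains, 500)) < 0.45   # ~45% local acceptance
global_accs = rng.random((n_chains, n_global_steps)) < 0.03  # ~3% global

local_rate = local_accs.mean()
global_rate = global_accs.mean()

print(0.2 <= local_rate <= 0.8)      # first rule of thumb holds
# expected accepted global moves per chain per loop; should be >= 1:
print(global_rate * n_global_steps)  # roughly 3 here
```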
I understand that the local acceptance rate should be around 0.2-0.8, but could you please explain the second check?
Hi there, I'm running some analysis on unlabeled data and I'm wondering how I can make sure the results are accurate. Since there's no ground truth or labeled information, it's tricky to evaluate the performance of the analysis.