kazewong / flowMC

Normalizing-flow enhanced sampling package for probabilistic inference in Jax
https://flowmc.readthedocs.io/en/main/
MIT License

Validating flowMC Results on Unlabeled Data #177

Open Qazalbash opened 1 week ago

Qazalbash commented 1 week ago

Hi there, I'm running some analysis on unlabeled data and I'm wondering how I can make sure the results are accurate. Since there's no ground truth or labeled information, it's tricky to evaluate the performance of the analysis.

zeeshan5885 commented 1 week ago

@kazewong We are using flowMC for our project. It gives us results quickly. However, we need to learn how to choose the values of the following parameters to ensure that we are getting optimal results. In ML, we can look at training and validation loss plots for decision-making, but in the case of flowMC, each parameter combination changes the results. Is there a way to gain confidence in our results, for instance through plots or reference values to look at?

N_CHAINS = 20

### Local Sampler settings

STEP_SIZE = 1e-2

### NF model settings

N_LAYERS = 5
HIDDEN_SIZE = [512, 512]
NUM_BINS = 10

### Training settings

N_LOOP_TRAINING = 20
N_LOOP_PRODUCTION = 20
N_LOCAL_STEPS = 100
N_GLOBAL_STEPS = 100

NUM_EPOCHS = 50

LEARNING_RATE = 0.001
MOMENTUM = 0.9
BATCH_SIZE = 1 << 12  # 4096
NUM_SAMPLES = 1 << 13  # 8192

We would appreciate it if you could explain these parameters or point us to resources to read before we decide on a parameter selection. Different parameter choices give us different solutions, which causes uncertainty about the results.

kazewong commented 5 days ago

@Qazalbash

  1. In general one cannot validate the result of an MCMC run, since that would require knowing the answer to the inference problem to begin with. I am not quite sure what you mean by unlabeled data here. What one can do is create some synthetic data for which you know the true parameters, and check whether the pipeline can recover them (an "injection-recovery test"; see the sketch after this list).
  2. See below.
  3. I think the latest version of flowMC is deterministic: given the same data and the same random key, it should give you the same result. If you observe any difference, can you post a minimal example where flowMC gives you different results?
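
A minimal sketch of such an injection-recovery test, assuming hypothetical `simulate_data` and `run_pipeline` functions that stand in for your forward model and your flowMC pipeline (neither is flowMC API):

```python
import numpy as np

def simulate_data(true_params, rng):
    # Hypothetical forward model: observed data = truth + Gaussian noise.
    return true_params + rng.normal(scale=0.1, size=true_params.shape)

def run_pipeline(data, rng, n_samples=5000):
    # Placeholder posterior centered on the data; in practice this is your
    # flowMC run returning production-stage samples of shape (n_samples, n_dim).
    return data + rng.normal(scale=0.1, size=(n_samples, data.shape[0]))

rng = np.random.default_rng(0)
true_params = rng.uniform(-1.0, 1.0, size=15)   # known "injected" parameters
data = simulate_data(true_params, rng)
posterior = run_pipeline(data, rng)

# Check that the injected values fall inside the central 90% credible interval.
lo, hi = np.quantile(posterior, [0.05, 0.95], axis=0)
recovered = (true_params >= lo) & (true_params <= hi)
print(f"{recovered.sum()}/{recovered.size} parameters recovered at 90% credibility")
```

Repeating this over many injections and checking the coverage (a P-P test) is a stronger check than a single recovery.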

@zeeshan5885

Without a general sense of the geometry of the target space, it is quite difficult to give detailed advice on how to tune flowMC. Different hyperparameters should give you different results; in the end, that is why we have hyperparameters. That said, there are a couple of numbers one can tune:

  1. Increase the number of chains to what your machine can afford without a performance dip. For single-event gravitational-wave parameter estimation in a 15-dimensional space, it is usually a couple hundred chains on a typical cluster-class GPU like an A100.
  2. Usually the batch size and number of samples should be on the order of 10,000. Please be mindful about the choice of keywords and make sure the configuration is passed in correctly.
  3. There are some diagnostics you can check to see whether it is a sampler issue or something else (a minimal sketch of check 3.1 follows after this list):
     3.1 Check the likelihood values to make sure you don't have NaNs. On top of that, check your gradients to make sure the gradients of your likelihood also do not have NaNs.
     3.2 Check the acceptance rate for both the local and global sampler. In general the local sampler should have an acceptance rate around 0.2-0.8, and the global acceptance rate times the number of global steps should be at least 1 per loop (for example, a global acceptance rate of 0.02 with 100 global steps gives 0.02 × 100 = 2 accepted global moves per chain per loop, which passes).
     3.3 Check the number of effective samples.
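
A minimal sketch of check 3.1, assuming a hypothetical JAX `log_likelihood(params, data)` standing in for your model (not part of the flowMC API):

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for your model; replace with your own log-likelihood.
def log_likelihood(params, data):
    return -0.5 * jnp.sum((data - params) ** 2)

key = jax.random.PRNGKey(0)
data = jax.random.normal(key, (15,))
params = jnp.zeros(15)

value = log_likelihood(params, data)
grads = jax.grad(log_likelihood)(params, data)

print("NaN in log-likelihood:", bool(jnp.isnan(value)))
print("NaN in gradient:", bool(jnp.any(jnp.isnan(grads))))
```

The acceptance rates in 3.2 are tracked by the sampler itself; the exact way to retrieve them depends on the flowMC version, so check the documentation for your release.
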
Qazalbash commented 5 days ago

@kazewong, by unlabeled data I meant the actual GWTC data. We are running our pipelines on synthetic data, and they are working well. We wanted a systematic way, or a threshold on some quantity, with which we can gain confidence in the analysis that has been done.

Also, I wanted to raise one issue: on a single GPU we cannot perform the analysis with large values. On a GPU with 16 GB of VRAM we can set at most 60 global steps, otherwise the program crashes. You mentioned in this comment that flowMC is not utilizing multiple GPUs. We bump into this problem most of the time and there is nothing we can do about it. I wanted to know if scaling flowMC to utilize multiple GPUs is a priority for you.

kazewong commented 5 days ago

Effective sample size could be a useful measure: https://python.arviz.org/en/stable/examples/plot_ess_evolution.html. Another one could be the global acceptance rate, as mentioned above.
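
A minimal sketch of the ESS check with ArviZ, assuming `chains` is an array of production samples with shape (n_chains, n_steps, n_dim); how you extract that array from the sampler depends on the flowMC version:

```python
import arviz as az
import numpy as np

# Placeholder chains of shape (n_chains, n_steps, n_dim); substitute the
# production-stage chains from your flowMC run.
chains = np.random.default_rng(0).normal(size=(20, 1000, 15))

idata = az.from_dict(posterior={"theta": chains})
ess = az.ess(idata)            # effective sample size per parameter
print(ess["theta"].values)
```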

Note that all the diagnostics I know of can only diagnose when there is an issue, but none of them will give you a sense of how "correct" you are, i.e. how biased the result is.

On the note of multiple GPUs: if you are reaching your memory limit with the parameters mentioned above, it may be worth looking into how to compress the memory footprint before going to multiple GPUs, for example by downsampling the number of samples for each event. Jax does have distributed computing capability, but it is not the highest priority for flowMC right now. I also doubt scaling to multiple GPUs is going to be an efficient solution to your problem, especially since you may have to deal with inter-GPU communication optimization.
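
A minimal sketch of the downsampling idea, assuming each event's posterior samples sit in an array of shape (n_samples, n_dim); the function name here is illustrative, not flowMC API:

```python
import jax
import jax.numpy as jnp

def downsample(key, samples, n_keep):
    # Keep a random subset of rows (without replacement) to shrink the
    # per-event memory footprint before the population analysis.
    return jax.random.choice(key, samples, shape=(n_keep,), replace=False, axis=0)

key = jax.random.PRNGKey(0)
data_key, choice_key = jax.random.split(key)
event_samples = jax.random.normal(data_key, (8192, 15))  # placeholder event posterior
reduced = downsample(choice_key, event_samples, n_keep=2048)
print(reduced.shape)  # (2048, 15)
```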

Qazalbash commented 5 days ago

> it may be worth looking into how to compress the memory footprint before going to multiple GPUs

I have thought of the same thing! Do update us if you get to it.

> Jax does have distributed computing capability

I was talking about the multi-device parallelism that is explained in a tutorial on SPMD multi-device parallelism with shard_map.
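
For reference, a minimal sketch of the shard_map pattern being discussed (not something flowMC does today), assuming a hypothetical per-sample `log_likelihood`; the exact import path depends on your JAX version:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

def log_likelihood(x):
    # Hypothetical per-sample log-density standing in for the real model.
    return -0.5 * jnp.sum(x**2, axis=-1)

# 1D mesh over all local devices; the batch axis is split across them.
mesh = Mesh(np.array(jax.devices()), axis_names=("batch",))
sharded_loglik = shard_map(
    log_likelihood, mesh=mesh, in_specs=P("batch"), out_specs=P("batch")
)

x = jnp.ones((8 * jax.device_count(), 15))  # batch divisible by device count
print(sharded_loglik(x).shape)
```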

> I also doubt scaling to multiple GPUs is going to be an efficient solution to your problem, especially since you may have to deal with inter-GPU communication optimization

I tried to write custom kernels in CUDA and Pallas, and it was overcomplicating a simple problem.

kazewong commented 3 days ago

Custom kernels in CUDA and Pallas probably won't solve the problem. The goal of a custom kernel is usually to reduce round trips to/from the GPU memory, but ultimately if your data is larger than the memory the GPU has, it won't solve your issue.

I am aware of Jax's capability in distributed computing, both within a node and across nodes. The main reason I suggest you compress the memory footprint is that using multiple GPUs is not without cost. Communication between GPUs takes time, and it is usually the biggest bottleneck in almost every application. If your GPU has 16 GB of RAM and you are bottlenecked by memory, having, say, 4 GPUs only gets you 4 times as far. Unless you have a large GPU cluster, investing time in distributed computation here may not solve your problem.

That said, it might be good for flowMC to be able to handle distributed workflows at some point. This should probably come only after the API has really stabilized.

Qazalbash commented 1 day ago

An observation I wanted to share, though I don't know the reason behind this behaviour: the memory footprint increases drastically (by GBs) when we slightly increase the number of global steps. This doesn't happen with the number of local steps.

zeeshan5885 commented 18 hours ago

> In general, the local sampler should have an acceptance rate around 0.2-0.8, and the global acceptance * number of global steps should be at least 1 per loop.

I understood the local acceptance rate should be around 0.2-0.8, but could you please explain the second check?