iancovert / Neural-GC

Granger causality discovery for neural networks.
MIT License

AUROC Calculation #13

Open · weijiacheng00 opened 2 months ago

weijiacheng00 commented 2 months ago

Dear iancovert,

I hope this message finds you well.

I have been working with the code you provided on GitHub and have a few questions regarding the AUROC calculation. Specifically, I would like to understand how AUROC is computed by sweeping through a series of different λ regularization parameters.

Additionally, I am curious to know whether you use cross-validation to select the optimal λ and, if so, how this process is implemented. Lastly, when calculating AUROC, is the Granger Causality (GC) matrix continuous, or is it binarized for this computation?

I would greatly appreciate any clarification you can provide regarding these points.

Thank you very much for your time and assistance!

Best regards,

weijiacheng00 commented 2 months ago

```python
from sklearn.metrics import roc_curve, auc

# Flatten the true and estimated GC matrices into label/score vectors.
GC_flat = GC.ravel()
GC_est_flat = GC_est.ravel()

# ROC curve using the continuous (non-thresholded) estimates as scores.
fpr, tpr, thresholds = roc_curve(GC_flat, GC_est_flat)
print(thresholds)

auroc_value = auc(fpr, tpr)
print(f'AUROC: {auroc_value:.4f}')
```

Here is my method for calculating AUROC, where I set `threshold=False` when computing `GC_est`. Is this the correct approach?

weijiacheng00 commented 2 months ago

This is the TPR and FPR calculated by sweeping a series of λ values on the Lorenz-96 simulation data, with T=250:

[plot: TPR/FPR operating points traced out by the λ sweep]

Is this method correct?

iancovert commented 2 months ago

Hi, your understanding of our AUROC calculation sounds mostly correct. We trained models with a series of regularization strengths, extracted the GC matrix for each model (each one is a binary matrix), calculated the sensitivity/specificity by comparing the estimated and true GC matrices, plotted these points on a ROC curve, and finally calculated the area under the curve.
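A minimal sketch of that procedure, assuming the binarized GC estimates have already been collected (one per $\lambda$); the function name and the endpoint handling here are illustrative, not from this repo:

```python
import numpy as np

def auroc_from_binary_estimates(GC, GC_est_list):
    """AUROC from binarized GC matrices, one per regularization strength.

    GC: (p, p) binary ground-truth Granger causality matrix.
    GC_est_list: list of (p, p) binary estimates, one per lambda value.
    """
    GC = np.asarray(GC).ravel().astype(bool)
    points = [(0.0, 0.0), (1.0, 1.0)]  # Endpoints: full and no sparsity.
    for GC_est in GC_est_list:
        est = np.asarray(GC_est).ravel().astype(bool)
        tpr = (est & GC).sum() / GC.sum()      # Sensitivity.
        fpr = (est & ~GC).sum() / (~GC).sum()  # 1 - specificity.
        points.append((fpr, tpr))
    # Sort the operating points by FPR and integrate (trapezoidal rule).
    points.sort()
    fprs, tprs = zip(*points)
    return np.trapz(tprs, fprs)
```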

If I understand your code properly, it seems like you're probably not using the binarized matrix, and instead calculating the AUROC separately for each estimated non-thresholded GC matrix? I believe some papers do this, so it's not without precedent, and it's certainly faster to evaluate this way. However, the results are sensitive to the choice of regularization strength, which isn't ideal. Also, we didn't trust that we could reliably rank the non-sparsified features by their first-layer norms, so we preferred to simply binarize these values and re-train the model at different regularization strengths.

As for the question about selecting $\lambda$ via cross-validation: if you wanted to pick a single best regularization strength, I believe this is what you would have to do. Unfortunately, this model selection criterion tends to favor too little regularization (underestimating the optimal $\lambda$), because retaining extra inputs will likely not hurt predictive performance. We avoided this issue by not selecting a single $\lambda$ value, and instead tracing out the sensitivity/specificity for a range of values.

weijiacheng00 commented 2 months ago

Thank you, this is very helpful to me.

weijiacheng00 commented 2 months ago

I would like to know all the specific parameters used in each experiment. Could you share them?

weijiacheng00 commented 2 months ago

Do we need to change the context length for each cLSTM experiment? And do we need to change the burn-in parameter for Lorenz data with different sequence lengths T?

weijiacheng00 commented 2 months ago

For example, if `delta_t=0.1` the context is 10, and if `delta_t=0.05` the context is 20?

iancovert commented 2 months ago

Unfortunately I don't have the exact $\lambda$ values used for each experiment; I've changed institutions and no longer have access to the original source code.

If you have another method you'd like to compare with cMLP/cLSTM, I would recommend running both on the exact same dataset with the same approach to AUROC calculation. My method to find a good $\lambda$ range was to manually search for $\lambda$ values that result in full sparsity (0% variable usage) and no sparsity (100% variable usage), and then use a range of values in between. I did this separately for each dataset, which unfortunately takes some time.
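A minimal sketch of that search, assuming a hypothetical `train_and_extract_gc(lam)` helper that trains a model at regularization strength `lam` and returns its binarized GC matrix (the helper and the geometric-step heuristic are illustrative, not from this repo):

```python
import numpy as np

def find_lambda_range(train_and_extract_gc, lam_init=0.1, factor=2.0,
                      num_values=10):
    """Bracket lambda between full and no sparsity, then build a grid.

    train_and_extract_gc: callable mapping lambda -> binary (p, p) GC matrix.
    """
    # Increase lambda geometrically until no variables are used (0% usage).
    lam_hi = lam_init
    while train_and_extract_gc(lam_hi).mean() > 0.0:
        lam_hi *= factor

    # Decrease lambda geometrically until all variables are used (100% usage).
    lam_lo = lam_init
    while train_and_extract_gc(lam_lo).mean() < 1.0:
        lam_lo /= factor

    # Log-spaced grid of lambda values in between, for the ROC sweep.
    return np.geomspace(lam_lo, lam_hi, num_values)
```

Each call retrains the model, which is why this takes some time per dataset; the resulting grid then supplies the $\lambda$ values for the ROC sweep described above.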

For data generation under the Lorenz model, I believe we used the default `delta_t`, `sd` and `burn_in` values in our code. The current burn-in value should be large enough even if you make the sequence longer: once you've burned in, you can generate arbitrarily more data points and they'll be from the correct generating process.

For `delta_t`, this shouldn't really affect the context, assuming you mean how many past data points affect the next one: my understanding is that because this is an ODE, the next point depends only on the previous point, although earlier ones may be useful to infer the actual previous point and its partial derivatives due to noise in the observations.
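For reference, a minimal sketch of Lorenz-96 generation with burn-in, using a simple Euler discretization; the repo's own generator may differ in details (integration scheme, where the noise enters), so everything below is illustrative:

```python
import numpy as np

def simulate_lorenz96(p, T, F=10.0, delta_t=0.1, sd=0.1, burn_in=1000,
                      seed=0):
    """Illustrative Lorenz-96 simulation via Euler integration.

    Generates burn_in + T steps and discards the first burn_in, so the
    retained points come from the stationary generating process.
    """
    rng = np.random.default_rng(seed)

    def grad(x):
        # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F
        return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

    X = np.zeros((burn_in + T, p))
    X[0] = rng.normal(scale=0.01, size=p)
    for t in range(1, burn_in + T):
        # Next point depends only on the previous one (first-order ODE).
        X[t] = X[t - 1] + delta_t * grad(X[t - 1])

    # Observation noise added on top of the latent trajectory.
    return X[burn_in:] + sd * rng.normal(size=(T, p))
```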

weijiacheng00 commented 2 months ago

Thanks, I'll try again.