Open ajesujoba opened 11 months ago
Hi, sorry for the delay. This happens when Gurobi is unable to find a feasible solution before the timeout. In general, you probably don't want to use OSP for more than k=10 coefficients.
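As a rough illustration (not from the thread) of why exact OSP gets hard as k grows, the number of candidate k-neuron supports is C(n, k). Gurobi does not enumerate these subsets one by one, since it uses branch-and-bound, but the size of the space gives a feel for why the solver can run out of time at large k. N_NEURONS = 8192 is a hypothetical layer width; substitute your model's real value.

```python
import math

# Hypothetical neuron count for one layer; substitute your model's real width.
N_NEURONS = 8192


def num_supports(n_neurons: int, k: int) -> int:
    """Number of distinct k-neuron subsets an exact search implicitly covers."""
    return math.comb(n_neurons, k)


for k in (1, 2, 5, 10, 16, 200):
    # Print only the order of magnitude; the full integers get very long.
    magnitude = math.log10(num_supports(N_NEURONS, k))
    print(f"k={k:>3}: ~10^{magnitude:.0f} candidate supports")
```

The jump between k=10 and k=200 spans hundreds of orders of magnitude, which is consistent with the advice to keep OSP to small k.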
See A.12 in the sparse probing paper for guidance on feature selection methods, but if you want to sweep larger values of k, max mean difference (MMD) or iterative thresholding are likely better choices.
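To make the two alternatives concrete, here is a minimal NumPy sketch of each, under stated assumptions. MMD is taken to mean ranking neurons by the absolute difference between the class-conditional mean activations. Iterative thresholding is shown in one common form: fit a dense logistic-regression probe, keep the largest-magnitude weights, refit, and repeat until k neurons remain. The shrink factor, L2 strength, learning rate, and step count are hypothetical choices, and the repo's telescopic_sparsity_sweep may use a different schedule or regularizer.

```python
import numpy as np


def mmd_select(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Max mean difference: rank neurons by |mean(positives) - mean(negatives)|."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(-np.abs(diff))[:k]


def fit_logreg(X, y, l2=1e-2, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def iterative_threshold_select(X, y, k, shrink=0.5):
    """Fit a dense probe, keep the largest-|w| neurons, refit, until k remain."""
    active = np.arange(X.shape[1])
    while len(active) > k:
        w, _ = fit_logreg(X[:, active], y)
        keep = max(k, int(len(active) * shrink))  # always strictly shrinks
        active = active[np.argsort(-np.abs(w))[:keep]]
    return active


# Toy data (hypothetical): 50 "neurons", signal planted in neurons 3 and 17.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50))
y = (rng.random(400) < 0.5).astype(float)
X[y == 1, 3] += 2.0
X[y == 1, 17] += 2.0

print(sorted(int(i) for i in mmd_select(X, y, 2)))
print(sorted(int(i) for i in iterative_threshold_select(X, y, 2)))
```

MMD needs one pass over the data regardless of k, while iterative thresholding refits a probe at every pruning step, which is why it tends to cost more but can account for correlated neurons.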
@wesg52, thanks for your response. I would like to try out MMD and iterative thresholding. Could you kindly confirm whether the training configurations below are right for both?
MMD
python probing_experiment.py \
--experiment_name c1 \
--experiment_type heuristic_sparsity_sweep \
--model pythia-800m \
--feature_dataset programming_lang_id.pyth.512.-1 \
--activation_aggregation max \
--osp_upto_k 200
Iterative thresholding
python probing_experiment.py \
--experiment_name c2 \
--experiment_type heuristic_sparsity_sweep \
--model pythia-800m \
--feature_dataset programming_lang_id.pyth.512.-1 \
--probe_location hook_resid_post \
--osp_upto_k 200
For these methods, --osp_upto_k doesn't do anything, so it is unnecessary.
To change the method, you need to change the --experiment_type flag.
This flag essentially keys into this dictionary. For MMD you probably want fast_heuristic_sparsity_sweep (unless you want to sweep values of k up to all neurons, in which case heuristic_sparsity_sweep is correct), and for iterative thresholding you want telescopic_sparsity_sweep.
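Putting the flag advice together, this sketch builds the two adjusted commands with Python's shlex.join so the quoting is correct. It only uses flags and values that appear in this thread: --osp_upto_k is dropped because it has no effect for these experiment types, and --probe_location is left out so that whatever default the script uses applies (the thread does not name a specific location to use instead). The helper name probing_command is hypothetical.

```python
import shlex

# Flags shared by both runs, copied from the configurations in this thread.
COMMON = ["--model", "pythia-800m",
          "--feature_dataset", "programming_lang_id.pyth.512.-1"]


def probing_command(name: str, experiment_type: str, *extra: str) -> str:
    """Hypothetical helper: assemble a probing_experiment.py command line."""
    argv = ["python", "probing_experiment.py",
            "--experiment_name", name,
            "--experiment_type", experiment_type,
            *COMMON, *extra]
    return shlex.join(argv)


# MMD over a moderate k range; use heuristic_sparsity_sweep to go up to all neurons.
mmd_cmd = probing_command("c1", "fast_heuristic_sparsity_sweep",
                          "--activation_aggregation", "max")
# Iterative thresholding.
it_cmd = probing_command("c2", "telescopic_sparsity_sweep")

print(mmd_cmd)
print(it_cmd)
```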
Additionally, you probably don't want to be sparse probing on hook_resid_post, since the residual stream has no privileged basis, so there is no reason to expect it to be sparse.
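A toy illustration of the privileged-basis point: if nothing in the model makes the coordinate axes special, a feature is as likely to be stored along an arbitrary direction as along a single neuron. Below, a representation that is 1-sparse in one basis becomes dense after a random orthogonal change of basis, even though its norm is unchanged. The width d = 64 and the random rotation are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical width, kept small for illustration

# A perfectly sparse code: the feature lives on a single basis direction.
sparse = np.zeros(d)
sparse[5] = 1.0

# A random orthogonal change of basis (QR decomposition of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
rotated = Q @ sparse


def n_active(v: np.ndarray, tol: float = 1e-6) -> int:
    """Count coordinates whose magnitude exceeds tol."""
    return int(np.sum(np.abs(v) > tol))


print(n_active(sparse), n_active(rotated))
print(np.linalg.norm(sparse), np.linalg.norm(rotated))  # rotation preserves the norm
```

A sparse probe on such coordinates would need many neurons to recover the same feature, which is why a basis with an elementwise nonlinearity is the more natural target.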
Why can't I run OSP with --osp_upto_k values greater than 16? Whenever I set it to a higher value such as 200, I get the following error:
@wesg52, do you by any chance know what the issue is?