wesg52 / sparse-probing-paper

Sparse probing paper full code.
https://arxiv.org/abs/2305.01610
MIT License
46 stars, 10 forks

Why aren't all OSP results available? #2

Open ajesujoba opened 11 months ago

ajesujoba commented 11 months ago

Why can't I run OSP with --osp_upto_k values greater than 16? Whenever I set it to a higher value, such as 200, I get the following error:

Traceback (most recent call last):
  File "probe_adapters_exp.py", line 260, in <module>
    run_probe_on_layer(
  File "probe_adapters_exp.py", line 106, in run_probe_on_layer
    result = inner_loop_fn(
  File "/data/users/jalabi/adaptation/sparse-probing-paper/experiments/inner_loops.py", line 178, in optimal_sparse_probing
    model_stats, filtered_support, beta, bias = sparse_classification_oa(
  File "/data/users/jalabi/adaptation/sparse-probing-paper/experiments/probes.py", line 133, in sparse_classification_oa
    support_indices = sorted([i for i in range(len(s)) if s[i].X > 0.5])
  File "/data/users/jalabi/adaptation/sparse-probing-paper/experiments/probes.py", line 133, in <listcomp>
    support_indices = sorted([i for i in range(len(s)) if s[i].X > 0.5])
  File "src/gurobipy/var.pxi", line 125, in gurobipy.Var.__getattr__
  File "src/gurobipy/var.pxi", line 153, in gurobipy.Var.getAttr
  File "src/gurobipy/attrutil.pxi", line 100, in gurobipy.__getattr
AttributeError: Unable to retrieve attribute 'X'

@wesg52, do you happen to know what the issue is?

wesg52 commented 11 months ago

Hi, sorry for the delay. This happens when Gurobi is unable to find a feasible solution before the timeout. In general, you probably don't want to use OSP for more than k=10 coefficients.
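The failing line in probes.py reads `s[i].X` unconditionally, and gurobipy raises exactly this AttributeError when no solution is stored. Below is a minimal sketch of a guard, assuming you would rather skip a value of k that timed out than crash: check the model's `SolCount` attribute (Gurobi's count of stored solutions) before touching `X`. The helper name `support_or_none` is hypothetical, and the 0.5 threshold simply mirrors the traceback.

```python
from typing import Callable, List, Optional, Sequence


def support_or_none(
    sol_count: int,
    read_values: Callable[[], Sequence[float]],
    threshold: float = 0.5,
) -> Optional[List[int]]:
    """Return indices of selected indicator variables, or None if no solution.

    sol_count:   number of solutions the solver stored (model.SolCount in
                 gurobipy); 0 means Var.X cannot be read.
    read_values: deferred accessor such as lambda: [v.X for v in s]; it is
                 only called when a solution exists.
    """
    if sol_count == 0:
        # Timed out (or infeasible) before finding any incumbent.
        return None
    values = read_values()
    # Binary indicators come back as floats, so compare against a threshold.
    return sorted(i for i, v in enumerate(values) if v > threshold)
```

In probes.py this would be called as `support_or_none(model.SolCount, lambda: [v.X for v in s])`, where `model` is assumed to be the gurobipy model that owns the variables `s`.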

See A.12 in the sparse probing paper for guidance on feature selection methods. If you want to sweep larger values of k, max mean difference (MMD) or iterative thresholding are likely better choices.
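As a rough illustration of the max mean difference heuristic, assuming activations come as one row per example with binary labels: score each neuron by the absolute difference between its mean activation on positive and negative examples, then keep the k highest-scoring neurons. This is a sketch of the general technique, not the repository's implementation, and the function name `mmd_top_k` is hypothetical.

```python
from typing import List, Sequence


def mmd_top_k(acts: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Indices of the k neurons with the largest |mean(pos) - mean(neg)|."""
    pos = [row for row, y in zip(acts, labels) if y == 1]
    neg = [row for row, y in zip(acts, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    n_neurons = len(acts[0])
    scores = []
    for j in range(n_neurons):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    # Rank neurons once by score (highest first); every k is then a prefix.
    ranked = sorted(range(n_neurons), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


acts = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 0.2, 0.6], [0.1, 0.0, 0.5]]
labels = [1, 1, 0, 0]
print(mmd_top_k(acts, labels, 2))  # [0, 2]
```

Because the ranking is computed once and each larger k just extends the prefix, sweeping k up to all neurons is cheap, whereas OSP solves a separate integer program for every k.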

ajesujoba commented 11 months ago

@wesg52, thanks for your response. I would like to try out MMD and iterative thresholding. Could you kindly confirm whether the training configurations for both are right?

MMD

python probing_experiment.py \
    --experiment_name c1 \
    --experiment_type heuristic_sparsity_sweep \
    --model pythia-800m \
    --feature_dataset programming_lang_id.pyth.512.-1 \
    --activation_aggregation max \
    --osp_upto_k 200

Iterative thresholding

python probing_experiment.py \
    --experiment_name c2 \
    --experiment_type heuristic_sparsity_sweep \
    --model pythia-800m \
    --feature_dataset programming_lang_id.pyth.512.-1 \
    --probe_location hook_resid_post \
    --osp_upto_k 200

wesg52 commented 11 months ago

For these experiment types, --osp_upto_k has no effect, so it is unnecessary.

To change the method, you need to change the --experiment_type flag.

The flag essentially keys into this dictionary. For MMD you probably want fast_heuristic_sparsity_sweep (unless you want to sweep values of k up to all neurons, in which case heuristic_sparsity_sweep is correct), and for iterative thresholding you want telescopic_sparsity_sweep.
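For intuition about what an iterative thresholding sweep does, here is a minimal sketch, assuming the general recipe of refitting a probe on the current support and keeping the k largest-magnitude weights, for decreasing k. `fit_weights` is a hypothetical stand-in for whatever trains the probe; this is not the code behind telescopic_sparsity_sweep.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def iterative_thresholding(
    fit_weights: Callable[[List[int]], Sequence[float]],
    n_features: int,
    ks: Iterable[int],
) -> Dict[int, List[int]]:
    """Map each k to the support kept after thresholding down to k features.

    fit_weights(support) must return one weight per feature in `support`,
    in the same order (hypothetical probe-training callback).
    """
    support = list(range(n_features))
    supports: Dict[int, List[int]] = {}
    for k in sorted(set(ks), reverse=True):
        weights = fit_weights(support)
        # Keep the k positions with the largest |weight| on the current support.
        ranked = sorted(range(len(support)), key=lambda j: abs(weights[j]), reverse=True)
        support = sorted(support[j] for j in ranked[:k])
        supports[k] = support
    return supports


# Toy "probe" whose weights never change: useful only to show the pruning order.
fixed = [0.1, -2.0, 0.5, 3.0, -0.05]
result = iterative_thresholding(lambda sup: [fixed[i] for i in sup], 5, [4, 2, 1])
print(result)  # {4: [0, 1, 2, 3], 2: [1, 3], 1: [3]}
```

Each smaller k starts from the previous support, so the sweep reuses work as it shrinks; with a real probe the weights would be refit at every step.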

Additionally, you probably don't want to be sparse probing on hook_resid_post, since the residual stream has no privileged basis, so there is no reason to expect features in it to be sparse.
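A small numerical illustration of the privileged-basis point, using nothing beyond basic linear algebra: a direction that sits on a single coordinate becomes dense after an orthonormal change of basis. In a space with no preferred basis, nothing pushes features to line up with individual dimensions, so a sparse probe over those dimensions has no particular reason to work. The helper names are hypothetical.

```python
import math
from typing import List


def rotate_2d(vec: List[float], theta: float) -> List[float]:
    """Rotate a 2-D vector by theta radians (an orthonormal change of basis)."""
    x, y = vec
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y, s * x + c * y]


def count_nonzero(vec: List[float], tol: float = 1e-9) -> int:
    """Number of coordinates with magnitude above tol."""
    return sum(1 for v in vec if abs(v) > tol)


feature = [1.0, 0.0]                       # lives on one coordinate
rotated = rotate_2d(feature, math.pi / 4)  # same direction, rotated basis
print(count_nonzero(feature), count_nonzero(rotated))  # 1 2
```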