jhchang / DFDC

GNU General Public License v3.0

Parameters printed out for summary of pruned model are the same as base model #23

Open jhchang opened 3 years ago

jhchang commented 3 years ago

image

jhchang commented 3 years ago

GitHub won't let me attach the CSV file.

jhchang commented 3 years ago

Output when saving the whole model with:

fname = str(num_params) + '_' + str(prune_p) + '.pt'
torch.save(net, fname)
Running base model
file size: 71018195 bytes
==================================================
# of Params to be pruned: 420
% param pruned: 0.8
file size: 281952740 bytes
==================================================
# of Params to be pruned: 420
% param pruned: 0.4
file size: 281952740 bytes
==================================================
# of Params to be pruned: 420
% param pruned: 0.2
file size: 281952740 bytes
==================================================
# of Params to be pruned: 220
% param pruned: 0.8
file size: 211151252 bytes
==================================================
# of Params to be pruned: 220
% param pruned: 0.4
file size: 183999316 bytes
==================================================
# of Params to be pruned: 220
% param pruned: 0.2
file size: 179738452 bytes
==================================================
# of Params to be pruned: 10
% param pruned: 0.8
file size: 77163152 bytes
==================================================
# of Params to be pruned: 10
% param pruned: 0.4
file size: 72878224 bytes
==================================================
# of Params to be pruned: 10
% param pruned: 0.2
file size: 77621072 bytes
==================================================
jhchang commented 3 years ago

When I pickle the model with:

import pickle

fname = str(num_params) + '_' + str(prune_p) + '.pkl'
with open(fname, 'wb') as file:
    pickle.dump(net, file)

I get this message from Colab: "Your session crashed after using all available RAM."

jhchang commented 3 years ago

Output when saving only the state dict with:

torch.save(net.state_dict(), fname)
Running base model
file size: 70978643 bytes
==================================================
# of Params to be pruned: 420
% param pruned: 0.8
file size: 141331543 bytes
==================================================
# of Params to be pruned: 420
% param pruned: 0.4
file size: 141331543 bytes
==================================================
# of Params to be pruned: 420
% param pruned: 0.2
file size: 141331543 bytes
==================================================
# of Params to be pruned: 220
% param pruned: 0.8
file size: 117800015 bytes
==================================================
# of Params to be pruned: 220
% param pruned: 0.4
file size: 101114703 bytes
==================================================
# of Params to be pruned: 220
% param pruned: 0.2
file size: 101535695 bytes
==================================================
# of Params to be pruned: 10
% param pruned: 0.8
file size: 73730157 bytes
==================================================
# of Params to be pruned: 10
% param pruned: 0.4
file size: 71708269 bytes
==================================================
# of Params to be pruned: 10
% param pruned: 0.2
file size: 76365741 bytes
==================================================
jhchang commented 3 years ago

I found this post, which claims that the pruning methods I am using don't change the model size:

https://stackoverflow.com/questions/65827031/pytorch-global-pruning-is-not-reducing-the-size-of-the-model
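
A minimal sketch of what may be going on, assuming the pruning is done with `torch.nn.utils.prune` (as the `prune.global_unstructured` call in the notebook suggests). The small `nn.Sequential` model here is hypothetical, not the DFDC network. While a module is pruned, its `state_dict` holds both `weight_orig` and a same-sized `weight_mask`, which would roughly double the stored tensors and is consistent with the file sizes above. Calling `prune.remove` folds the mask back in, but the zeros are still stored densely, so the size only returns to the base size.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Hypothetical toy model standing in for the real network.
net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
params_to_prune = [(net[0], "weight"), (net[2], "weight")]


def count_entries(state_dict):
    """Total number of tensor elements that torch.save would write."""
    return sum(t.numel() for t in state_dict.values())


base_entries = count_entries(net.state_dict())

# Global magnitude pruning: zero out the 40% smallest weights overall.
prune.global_unstructured(
    params_to_prune, pruning_method=prune.L1Unstructured, amount=0.4
)
pruned_keys = sorted(net.state_dict().keys())
pruned_entries = count_entries(net.state_dict())

# Make the pruning permanent: drops weight_orig / weight_mask again.
for module, name in params_to_prune:
    prune.remove(module, name)
final_entries = count_entries(net.state_dict())

# Fraction of weights that are now exactly zero (still stored densely).
total = sum(m.weight.numel() for m, _ in params_to_prune)
zeros = sum(int((m.weight == 0).sum()) for m, _ in params_to_prune)
zero_fraction = zeros / total

print(pruned_keys)
print(base_entries, pruned_entries, final_entries, zero_fraction)
```

If this matches the notebook, a parameter summary would report the same count before and after pruning, because unstructured pruning zeroes entries without changing any tensor shape.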

nickvazz commented 3 years ago
from pytorch_modelsize import SizeEstimator
se = SizeEstimator(model, input_size=(1,1,32,32))
estimate = se.estimate_size()
# Returns
# (Size in Megabytes, Total Bits)
print(estimate) # (0.5694580078125, 4776960)
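
As a rough cross-check that needs no extra package, the stored size can also be estimated straight from the tensors. This is only a sketch and not what `pytorch_modelsize` does: it ignores activations and counts just the entries of `state_dict()`. The helper names `param_bytes` and `serialized_bytes` are made up for illustration, and the `nn.Linear(1792, 1)` layer is a stand-in for the real model.

```python
import io

import torch
import torch.nn as nn


def param_bytes(model):
    """Bytes used by every tensor in the state dict (parameters and buffers)."""
    return sum(t.numel() * t.element_size() for t in model.state_dict().values())


def serialized_bytes(model):
    """Bytes torch.save writes for the state dict, measured in memory."""
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes


model = nn.Linear(1792, 1)  # stand-in layer: 1792 weights + 1 bias, float32
print(param_bytes(model))
print(serialized_bytes(model))
```

Running both helpers before and after pruning should show whether the extra bytes come from additional tensors or from serialization overhead.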
nickvazz commented 3 years ago

https://github.com/VainF/Torch-Pruning

jhchang commented 3 years ago

(Linear(in_features=1792, out_features=1, bias=True), 'weight')

jhchang commented 3 years ago

array([[-0.01620483, 0.00227165, -0.00603867, ..., 0.01143646, -0.02052307, 0.00884247]], dtype=float32)

nickvazz commented 3 years ago
np.isclose(pre_prune, post_prune).sum()
nickvazz commented 3 years ago

https://numpy.org/doc/stable/reference/generated/numpy.isclose.html

nickvazz commented 3 years ago

before

preprune = np.ravel(prun_list[0][0].weight.cpu().detach().numpy())

after

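# The tuples use the name 'weight' (not 'weights'); flatten each layer before joining, since shapes differ.
preprune = np.concatenate([p[0].weight.cpu().detach().numpy().ravel() for p in prun_list if p[1] == 'weight'])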
nickvazz commented 3 years ago
postprune = np.concatenate([(p[0].weight if p[1] == 'weight' else p[0].bias).cpu().detach().numpy().ravel() for p in prun_list])
nickvazz commented 3 years ago

image

nickvazz commented 3 years ago

absolute(a - b) <= (atol + rtol * absolute(b))
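
A small sketch of how that tolerance formula plays out when comparing weights before and after pruning. The arrays are made-up values, not weights from the model. Note that `np.isclose(...).sum()` counts the entries that did not change, and that with the default `atol=1e-8` a very small weight that was pruned to zero still counts as "close".

```python
import numpy as np

# Made-up weights: the second and third entries get pruned to zero.
pre_prune = np.array([0.5, -0.25, 1e-9, 2.0], dtype=np.float32)
post_prune = np.array([0.5, 0.0, 0.0, 2.0], dtype=np.float32)

# Default tolerances: rtol=1e-05, atol=1e-08.
unchanged = np.isclose(pre_prune, post_prune)
n_unchanged = int(unchanged.sum())
n_changed = int((~unchanged).sum())

# Exact comparison: no tolerance at all.
n_changed_exact = int((~np.isclose(pre_prune, post_prune, rtol=0, atol=0)).sum())

print(n_unchanged, n_changed, n_changed_exact)  # 3 1 2
```

To count pruned entries directly, `(post_prune == 0).sum()` avoids the tolerance question altogether.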

nickvazz commented 3 years ago
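try:
    stuff  # placeholder for the code that might fail
except Exception as e:  # Exception rather than BaseException, so Ctrl-C still interrupts
    print(e)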
jhchang commented 3 years ago
TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-135-3e55b56a5813> in <module>()
     79                 pass
     80                 # print(e)
---> 81         preprune = np.ravel(np.array(preprune)).astype(float)
     82 
     83         prune.global_unstructured(

ValueError: setting an array element with a sequence.
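
This error usually means the list holds arrays of different shapes, so NumPy cannot build one rectangular array from it. A minimal sketch of one way around it, assuming `preprune` is a list of per-layer weight arrays (the two arrays below are made up): flatten each layer first, then concatenate.

```python
import numpy as np

# Made-up per-layer weights with different shapes.
layer_weights = [
    np.ones((1, 1792), dtype=np.float32),
    np.zeros((4, 8), dtype=np.float32),
]

# Flatten every layer to 1-D, then join them into one long vector.
flat = np.concatenate([w.ravel() for w in layer_weights]).astype(float)

print(flat.shape)  # (1824,)
```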
jhchang commented 3 years ago

Screenshot (365)

nickvazz commented 3 years ago

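import mlflow

# One value per column: the four summary fields, then prune_p for each pruned parameter.
current_results = [num_params, prune_p, accuracy, total_model_run_time] + [prune_p] * len(params_pruned)
columns_for_run = columns + params_pruned

with mlflow.start_run():
    # Keys are the column names, values are the results.
    mlflow.log_params(dict(zip(columns_for_run, current_results)))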
nickvazz commented 3 years ago
    result = pd.Series({k:v for k,v in zip(column_names + current_run_module_names_list, current_results)})

    results.append(result)

df = pd.DataFrame(results)
df.to_csv(f'/content/drive/MyDrive/cs274/results/{timestamp}.csv')
nickvazz commented 3 years ago
df = pd.read_csv("file.csv")
nickvazz commented 3 years ago
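from glob import glob

import pandas as pd

csvs = glob("/path/*.csv")

# Stack all result files into one frame with a fresh 0..n-1 index.
df = pd.concat([pd.read_csv(csv) for csv in csvs], ignore_index=True)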
nickvazz commented 3 years ago
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

df.plot(x='col_A', y='col_B', kind='scatter')
plt.show()

df['col_A'].plot(kind='hist', bins=20)
plt.show()

fig, ax = plt.subplots(1,2, figsize=(10,10), sharey=True)

df['col_B'].plot(kind='hist', bins=20, ax=ax[0])
df.plot(x='col_A', y='col_B', kind='scatter', ax=ax[1])
plt.show()