If the code must be changed according to the number of GPUs, can you clarify what kind of changes need to be made to the code below? What is this function intended to do? And what happens if this code is run as-is on a single GPU?
# PLEASE BE VERY VERY CAREFUL HERE
# Although this function takes num_processes as an argument, it in fact
# only supports num_processes=2.
# A future improvement should support interleaving for more than 2 processes.
# Also, small_bsz = large_bsz//4 is hardcoded, which is only true for our
# experiments, because when we construct the perturb and paraphrase
# data_loaders we set batch_size=large_bsz//4 specifically.
def interleave_eval_result_dict(eval_result_dict, forget_rate, large_bsz, num_processes=2):
    small_bsz = large_bsz // 4
    for k, v in eval_result_dict.items():
        # each v corresponds to one ckpt
        for metric, value in v.items():
            bsz = small_bsz if 'perturb' in metric or 'paraphrase' in metric else large_bsz
            total_len = get_total_len(k, forget_rate)
            # split the gathered values into the two processes' halves
            a = value[0:len(value) // 2]
            b = value[len(value) // 2:2 * (len(value) // 2)]
            eval_result_dict[k][metric] = interleave(a, b, bsz)[:total_len]
    return eval_result_dict
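For context, the helpers interleave and get_total_len are not shown above. The following is a minimal sketch of what interleave presumably does, assuming the usual 2-process layout where process 0 holds batches 0, 2, 4, ... and process 1 holds batches 1, 3, 5, ..., so merging alternating bsz-sized chunks restores the original order (this implementation is my assumption, not the original code):

```python
def interleave(a, b, bsz):
    # Hypothetical implementation: merge alternating bsz-sized chunks
    # from the two processes' halves back into the original order.
    out = []
    for i in range(0, max(len(a), len(b)), bsz):
        out.extend(a[i:i + bsz])  # chunk from process 0
        out.extend(b[i:i + bsz])  # chunk from process 1
    return out

# Example: 8 items split across 2 processes with bsz=2
a = [0, 1, 4, 5]   # batches 0 and 2 (held by process 0)
b = [2, 3, 6, 7]   # batches 1 and 3 (held by process 1)
print(interleave(a, b, 2))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```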
It would be useful to support num_processes values from 1 to 4 in a future update.