I'm not sure what the bug is: by requiring the complete predictions for your compute_metrics
function, you are asking for an array of 4,057 by 200 by vocab_size (which for the base CamemBERT model is 30,522 I believe). This does not fit easily in RAM.
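For scale, taking those numbers at face value, a quick back-of-the-envelope check:

```python
# Approximate size of the accumulated float32 logits for the eval set
n_examples, seq_len, vocab_size = 4057, 200, 30522
print(n_examples * seq_len * vocab_size * 4 / 1e9)  # ~99 GB
# ...and nested_concat allocates a fresh array on each concatenation,
# so the peak memory usage is higher still.
```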
Is there another way to compute the metrics (or an estimate) without having to build such a huge array?
You haven't shared which metric you are using, so I have no idea.
This is the function I'm using:
```python
from typing import Dict

import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from transformers import EvalPrediction

def compute_metrics(p: EvalPrediction) -> Dict:
    # p.predictions has shape (n_examples, seq_len, vocab_size);
    # reduce it to predicted token ids before scoring.
    preds = np.argmax(p.predictions, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        p.label_ids.flatten(), preds.flatten(),
        average='weighted', zero_division=0)
    return {
        'accuracy': (preds == p.label_ids).mean(),
        'f1': f1,
        'precision': precision,
        'recall': recall,
    }
```
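For completeness, a function like this is wired in through the Trainer's compute_metrics argument; a sketch, with model, args, and the tokenized datasets assumed to be defined elsewhere:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,             # assumed: the model being fine-tuned
    args=args,               # assumed: TrainingArguments defined elsewhere
    train_dataset=train_ds,  # assumed: tokenized datasets
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
metrics = trainer.evaluate()
```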
I guess you could write your own custom loop that stores the predictions after the argmax; that won't blow up memory the same way.
Great, thanks a lot for the tip!
I'll mark the issue as closed.
@soufianeelalami Did you come up with a solution for this issue? Our team has run into the same issue with nested_concat while evaluating on a fairly large dataset.
@gphillips-ema Hello, basically what you need to do is create your own trainer class (which inherits from Trainer), then override the prediction_loop method to change one particular behavior:
```python
if logits is not None:
    # Before: accumulating the raw (batch, seq_len, vocab_size) logits
    # preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
    # After: reduce to (batch, seq_len) predicted ids first
    logits_reduced = np.argmax(logits, axis=-1)
    preds_host = logits_reduced if preds_host is None else nested_concat(preds_host, logits_reduced, padding_index=-100)
```
The np.argmax(logits, axis=-1) is what reduces the dimensionality of the output logits. If you are using accumulation, you need to do the same thing in that part of the code (still inside the prediction_loop method).
Please let me know if this solves your problem or if you need any help.
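For reference, here is a minimal sketch of such a subclass. Rather than copying the whole prediction_loop body as described above, this variant overrides prediction_step, which achieves the same reduction before nested_concat ever sees the logits; the class name is hypothetical, the exact prediction_step signature varies across transformers versions, and this assumes the logits come back as a single tensor:

```python
from transformers import Trainer

class ReducedLogitsTrainer(Trainer):  # hypothetical name
    """Accumulate (batch, seq_len) predicted ids instead of
    (batch, seq_len, vocab_size) logits during evaluation."""

    def prediction_step(self, model, inputs, prediction_loss_only, **kwargs):
        loss, logits, labels = super().prediction_step(
            model, inputs, prediction_loss_only, **kwargs)
        if logits is not None and not prediction_loss_only:
            logits = logits.argmax(dim=-1)  # reduce away the vocab axis
        return loss, logits, labels
```

With either variant, compute_metrics then receives the already-argmaxed ids in p.predictions, so its own np.argmax call should be dropped. Recent versions of transformers also accept a preprocess_logits_for_metrics callable on the Trainer that serves exactly this purpose.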
I was facing a related issue with nested_concat that caused GPU memory errors. Using the Seq2SeqTrainer instead of the default Trainer solved the issue for me, since it does not rely on concatenating the logits over the vocabulary.
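A sketch of that setup, assuming a seq2seq model and tokenized datasets defined elsewhere; predict_with_generate=True makes evaluation run on generated token ids rather than raw logits:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,  # score generated ids, not vocab-sized logits
    per_device_eval_batch_size=8,
)

trainer = Seq2SeqTrainer(
    model=model,             # assumed: a seq2seq model
    args=args,
    train_dataset=train_ds,  # assumed: tokenized datasets
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
```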
Same issue here: I have an A5000 GPU for training, but I can't even run evaluation with batch_size=8.
Just reduce the per_device_eval_batch_size arg and set it to a lower value, for example per_device_eval_batch_size=2; that should prevent your issue.
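In TrainingArguments, that is:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_eval_batch_size=2,  # smaller eval batches, smaller per-step logit tensors
)
```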
Environment info

transformers version: 3.5.0

Who can help

Trainer: @sgugger
Information
Model I am using (Bert, XLNet ...): CamemBERT
The problem arises when using:
The task I am working on is:
To reproduce
I am trying to fine-tune a CamemBERT model for an MLM task. This is the configuration I am using:
Steps to reproduce the behavior:
I get the following error:

```
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 57680691200 bytes. Error code 12 (Cannot allocate memory)
```

when trying to run the nested_concat function inside the prediction_loop method. The machine I am using has 120 GB of RAM.
The data contains 20,355 sentences, with the longest sentence under 200 words; the dataset fits easily in RAM. The subset used for evaluation contains 4,057 examples with the same structure as the training dataset.
Expected behavior
It seems that setting prediction_loss_only=True avoids the problem, as it computes only the loss and no evaluation metrics, which costs far less RAM. The downside, obviously, is that you don't get any evaluation metrics.

The Trainer should be able to handle this workload as evaluation proceeds. Maybe clearing heavy variables during the evaluation loop would help avoid blowing up RAM with stored values that are too large.
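For anyone hitting this, the workaround mentioned above looks like the following; only the eval loss is reported and no predictions are accumulated:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    prediction_loss_only=True,  # skip accumulating predictions and metrics
)
```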