microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Inference gives different results when using multiple gpus (distributed mode) vs just one gpu (not distributed mode) #147

Open ThuongTNguyen opened 6 months ago

ThuongTNguyen commented 6 months ago

Hi, I want to report an issue I observed when running inference for a classification task.

Description

When running inference (with either do_eval=True or do_predict=True), the results differ depending on whether it runs in distributed mode (multiple GPUs) or on a single GPU (non-distributed mode).

During evaluation, data is prepared sequentially in batches using SequentialSampler, BatchSampler, and DistributedBatchSampler (https://github.com/microsoft/DeBERTa/blob/4d7fe0bd4fb3c7d4f4005a7cafabde9800372098/DeBERTa/apps/run.py#L172) and then sent to the GPUs. Once the logits are computed, there is a step that gathers results across devices, merge_distributed (https://github.com/microsoft/DeBERTa/blob/4d7fe0bd4fb3c7d4f4005a7cafabde9800372098/DeBERTa/apps/run.py#L228). After this step, in distributed mode with multiple GPUs, the order of the data instances no longer matches the order in the original input file (dev.tsv or test.tsv), resulting in a different accuracy.
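To make the reordering concrete, here is a minimal sketch (not the actual DeBERTa code) of the effect described above, assuming batches are assigned to ranks round-robin and that the gather step simply concatenates each rank's outputs in rank order:

```python
# Hypothetical illustration: why gathering per-rank outputs in rank order
# permutes example order relative to the input file.

def shard_batches(indices, batch_size, world_size):
    """Split sequential batches round-robin across ranks, loosely mimicking
    SequentialSampler + BatchSampler + DistributedBatchSampler."""
    batches = [indices[i:i + batch_size]
               for i in range(0, len(indices), batch_size)]
    return [batches[r::world_size] for r in range(world_size)]

def merge_by_rank(per_rank_batches):
    """Mimic a naive cross-device merge: concatenate rank 0's outputs,
    then rank 1's, and so on (the order an all_gather produces)."""
    return [i for rank in per_rank_batches
              for batch in rank
              for i in batch]

indices = list(range(8))  # 8 examples in dev.tsv order: 0..7
shards = shard_batches(indices, batch_size=2, world_size=2)
merged = merge_by_rank(shards)
print(merged)  # [0, 1, 4, 5, 2, 3, 6, 7] -- not the input-file order
```

With world_size=1 the merged order equals the input order, so any position-based comparison against labels agrees; with world_size>1 it does not, which would explain the accuracy discrepancy.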

Steps to reproduce

Additional information

My system setup is: