microsoft / CodeXGLUE

MIT License

Fix distributed sampler error in [Code-Code] #144

Closed: lixinye-nju closed this pull request 1 year ago.

lixinye-nju commented 1 year ago

The previous version failed to set `train_sampler` to a distributed data sampler. This PR fixes that.
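For context, here is what a distributed sampler does. In PyTorch this is `torch.utils.data.DistributedSampler`, which assumes every rank sees the same full dataset and gives each rank a disjoint, strided subset of indices.

The sketch below is pure Python. The helper name `distributed_indices` is hypothetical. It mimics that sampler's index assignment with `shuffle=False` and `drop_last=False`:

- Pad the index list to a multiple of the world size by repeating indices from the start.
- Then take every `world_size`-th index, starting at `rank`.

```python
import math
from typing import List


def distributed_indices(n: int, world_size: int, rank: int) -> List[int]:
    """Hypothetical helper: indices assigned to `rank`, mimicking
    DistributedSampler(shuffle=False, drop_last=False) index assignment."""
    indices = list(range(n))
    num_samples = math.ceil(n / world_size)  # samples per rank
    total_size = num_samples * world_size
    padding = total_size - n
    # Pad by repeating indices from the start so every rank gets the same count.
    if padding <= n:
        indices += indices[:padding]
    else:
        indices += (indices * math.ceil(padding / n))[:padding]
    # Strided assignment: rank r takes r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total_size:world_size]


if __name__ == "__main__":
    for r in range(2):
        print(r, distributed_indices(5, world_size=2, rank=r))
```

With 5 samples on 2 ranks, rank 0 gets `[0, 2, 4]` and rank 1 gets `[1, 3, 0]`. Index 0 is repeated as padding.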

lixinye-nju commented 1 year ago

@microsoft-github-policy-service agree

celbree commented 1 year ago

Actually, we don't need a DistributedSampler, because each local_rank already loads a different slice of the data. See https://github.com/microsoft/CodeXGLUE/blob/main/Code-Code/CodeCompletion-token/code/dataset.py#L145: `input_ids = input_ids[local_rank*length: (local_rank+1)*length]`.
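The slicing line quoted above shards the data by rank at load time. Here is a minimal sketch of that pattern under two assumptions:

- `length` is the per-rank share computed as `len(data) // world_size`. This is not confirmed by the thread.
- The helper name `shard_for_rank` is hypothetical.

```python
from typing import List, Sequence


def shard_for_rank(data: Sequence[int], world_size: int, local_rank: int) -> List[int]:
    # Assumption: `length` is the floor-divided per-rank share of the data.
    length = len(data) // world_size
    # Contiguous, disjoint slice per rank, same pattern as the quoted dataset.py line.
    return list(data[local_rank * length:(local_rank + 1) * length])


if __name__ == "__main__":
    tokens = list(range(10))
    for r in range(3):
        print(r, shard_for_rank(tokens, world_size=3, local_rank=r))
```

With 10 items on 3 ranks, the shards are `[0, 1, 2]`, `[3, 4, 5]` and `[6, 7, 8]`. Under this assumption the trailing remainder (item 9) is dropped.

Because each rank already holds a disjoint shard, a plain local sampler over that shard is enough. Wrapping the shard in a DistributedSampler, which expects identical data on every rank, would subdivide it again. Each rank would then train on only about 1/world_size of its own shard per epoch.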

lixinye-nju commented 1 year ago

Thanks for your reply, @celbree. Sorry for the misunderstanding: I ran training on my own dataset and overlooked that detail.