khanrc / pt.darts

PyTorch Implementation of DARTS: Differentiable Architecture Search
MIT License

Why use broadcast for edge weights? #12

Closed: zzzxxxttt closed this issue 5 years ago

zzzxxxttt commented 5 years ago

Hi~ Thanks for providing this great implementation! I'm quite interested in the multi-GPU part, which uses replicate for the network weights and broadcast for the edge weights. I'm not familiar with parallel programming in PyTorch, and I'm curious about the difference between broadcast and replicate. Could you explain why we should use broadcast for the edge weights?

khanrc commented 5 years ago

replicate is implemented using broadcast. You can see that here: https://github.com/pytorch/pytorch/blob/v1.1.0/torch/nn/parallel/replicate.py#L97
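As a rough illustration (a sketch, not the repo's exact code), a helper for broadcasting a list of tensors can be built directly on the autograd-aware Broadcast function that the linked replicate.py also uses internally; the helper name and shapes below are just for illustration:

```python
import torch
from torch.nn.parallel._functions import Broadcast  # internal PyTorch API

def broadcast_list(tensors, device_ids):
    """Copy every tensor in `tensors` to every device in `device_ids`.

    Broadcast.apply returns a flat tuple of len(tensors) * len(device_ids)
    tensors grouped by device, so we re-chunk it into one list per device.
    """
    flat = Broadcast.apply(device_ids, *tensors)
    return [list(flat[i:i + len(tensors)])
            for i in range(0, len(flat), len(tensors))]
```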

replicate is designed for replicating a whole network (an nn.Module). For the edge weights we just want to broadcast a few plain tensors, so broadcast is used directly.
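To make the difference concrete, here is a minimal sketch (assuming two visible GPUs; the module, shapes, and variable names are illustrative, not the repo's actual code): replicate copies an entire nn.Module per device, while Broadcast copies bare tensors and keeps them on the autograd graph, so gradients from every replica accumulate back into the single original alpha tensor.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import replicate, parallel_apply
from torch.nn.parallel._functions import Broadcast  # internal PyTorch API

device_ids = [0, 1]  # assumes two visible GPUs

# replicate: copies a whole module (structure, parameters, buffers) to each GPU.
net = nn.Linear(8, 4).cuda(device_ids[0])
replicas = replicate(net, device_ids)               # one module copy per device

# Broadcast: copies plain tensors to each GPU while staying differentiable,
# so gradients from all replicas flow back to the single original alpha.
alpha = torch.randn(14, 8, requires_grad=True, device="cuda:0")
edge_weights = torch.softmax(alpha, dim=-1)         # DARTS-style edge weights
weight_copies = Broadcast.apply(device_ids, edge_weights)  # one copy per device

# Each replica runs on its own device; in the DARTS search network the matching
# weight copy would also be passed to that replica's forward.
inputs = [torch.randn(2, 8, device=f"cuda:{d}") for d in device_ids]
outputs = parallel_apply(replicas, inputs, devices=device_ids)
```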

zzzxxxttt commented 5 years ago

Got it, thank you very much!