yikangshen / Ordered-Neurons

Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"
https://arxiv.org/pdf/1810.09536.pdf
BSD 3-Clause "New" or "Revised" License

How to train with main.py on multiple GPUs? #10

Open alvations opened 5 years ago

alvations commented 5 years ago

@yikangshen @shawntan Is there an easy way to train the model with main.py on multiple GPUs to replicate the experiments?

When we wrap the model with model = nn.DataParallel(model) before calling train(), initialization descends into the LSTM stack and then into the ON-LSTM cell to fetch the weights, and it throws an error.

We also tried applying model = nn.DataParallel(model) after hidden = model.init_hidden(args.batch_size), but then the LinearDropConnect layer can't access its .weight tensors.
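
For reference, a minimal sketch of the two attempts (names like args.batch_size and init_hidden follow main.py; nothing else here is repo-specific):

```python
import torch.nn as nn

# Attempt 1: wrap before train() -- initialization descends into the
# ON-LSTM stack through the wrapper and raises an error.
model = nn.DataParallel(model.cuda())

# Attempt 2: initialize the hidden state first, then wrap. Note that
# after wrapping, custom methods and attributes live on model.module,
# not on the DataParallel wrapper itself:
hidden = model.init_hidden(args.batch_size)
model = nn.DataParallel(model)
hidden = model.module.init_hidden(args.batch_size)  # wrapper-safe access
```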

yikangshen commented 5 years ago

We didn't try training the model with multiple GPUs. You may need to rewrite the LinearDropConnect function.
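
One way such a rewrite could look: sample the DropConnect mask inside forward(), so each DataParallel replica is self-contained. This is a hedged sketch, not the repo's implementation; the repo samples one mask per batch via sample_mask(), while this version samples per call and uses inverted-dropout scaling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearDropConnect(nn.Linear):
    """Sketch of a DropConnect linear layer that samples its weight mask
    inside forward(), so nn.DataParallel replicas need no shared state."""

    def __init__(self, in_features, out_features, bias=True, dropout=0.5):
        super().__init__(in_features, out_features, bias=bias)
        self.dropout = dropout

    def forward(self, input):
        if self.training and self.dropout > 0.:
            keep_prob = 1. - self.dropout
            # Draw a fresh mask on this replica's own device each call,
            # instead of caching a masked copy of self.weight on the
            # module (cached attributes are not synced across replicas).
            mask = torch.bernoulli(torch.full_like(self.weight, keep_prob))
            weight = self.weight * mask / keep_prob  # inverted scaling
        else:
            weight = self.weight
        return F.linear(input, weight, self.bias)
```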

BuaaAlban commented 5 years ago

Another question: there seems to be no speedup using a GPU compared with the CPU; have you run into the same problem? Both take 280-290 s per epoch.
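
If GPU and CPU epoch times match, it may be worth first confirming that the model and batches are actually on the GPU; a generic PyTorch check (assuming main.py gates GPU use on a --cuda flag, as in the awd-lstm-lm codebase it builds on):

```python
import torch

print(torch.cuda.is_available())        # should print True
print(next(model.parameters()).device)  # should print cuda:0, not cpu
# Both the model and every batch need an explicit .cuda() / .to('cuda');
# if either stays on the CPU, the GPU sits idle and timings match CPU runs.
```

Also, the custom ON-LSTM cell cannot use cuDNN's fused LSTM kernels, so a smaller GPU speedup than with a standard nn.LSTM is expected even when device placement is correct.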

Shiweiliuiiiiiii commented 5 years ago

> @yikangshen @shawntan Is there an easy way to train the model with main.py on multiple GPUs to replicate the experiments?
>
> When we wrap the model with model = nn.DataParallel(model) before calling train(), initialization descends into the LSTM stack and then into the ON-LSTM cell to fetch the weights, and it throws an error.
>
> We also tried applying model = nn.DataParallel(model) after hidden = model.init_hidden(args.batch_size), but then the LinearDropConnect layer can't access its .weight tensors.

Hi, just want to know: have you figured this out? Best