Questions about code（tensorflow2）operational efficiency

liyuanfeng747 commented 1 year ago

Dear Authors, I have recently been studying your papers and code, and I would like to extend your work to the multi-agent setting. After several attempts I have made some progress, but I have run into a problem: the gradient updates are very slow. In my current scenario there are 8 independent graph neural networks that need to be trained and updated. Even though I train on a GPU, each round of updates still takes about 30 seconds. Since I don't know much about how TensorFlow works internally, I would like to ask what is causing this and whether there is any way to improve it.

paulalmasan commented 1 year ago

Hi @liyuanfeng747, having 8 GNNs means many parameters to train, which would explain why the gradient computation takes so long. You could try reducing the number of GNNs or making them smaller. As a first approach, I also recommend running the experiments on very small graphs.
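To get a rough feel for how much "making them smaller" could save, the parameter count can be estimated by hand. The sketch below is only an illustration: the layer layout (a dense message function, a GRU-style update, and a three-layer readout) and the sizes 20, 10 and 35 are assumptions chosen for the example, not values confirmed in this thread.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in: int, units: int) -> int:
    """Classic GRU: three gates, each with input weights, recurrent weights, one bias.

    Note: Keras's GRU with reset_after=True keeps a second bias vector,
    so its reported count is slightly higher than this estimate.
    """
    return 3 * (n_in * units + units * units + units)


def gnn_params(state_dim: int, readout_units: int) -> int:
    """Parameter count of a hypothetical message-passing GNN (assumed layout)."""
    message = dense_params(2 * state_dim, state_dim)  # message from a pair of link states
    update = gru_params(state_dim, state_dim)         # GRU-style state update
    readout = (
        dense_params(state_dim, readout_units)
        + dense_params(readout_units, readout_units)
        + dense_params(readout_units, 1)
    )
    return message + update + readout


if __name__ == "__main__":
    num_gnns = 8  # the number of independent GNNs mentioned in the question
    for dim in (20, 10):  # illustrative state sizes
        per_gnn = gnn_params(state_dim=dim, readout_units=35)
        print(f"state_dim={dim}: {per_gnn} per GNN, {num_gnns * per_gnn} in total")
```

Under these assumptions the message and update parts grow roughly with the square of the state size, while the total grows linearly with the number of GNNs, so shrinking the state dimension is likely to pay off more than removing one network.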

liyuanfeng747 commented 1 year ago

Dear author, I'm so glad to get your reply! Do you mean I should train on small graphs and reduce state_dims to the number of links? Thank you very much!


------------------ Original message ------------------ From: "knowledgedefinednetworking/DRL-GNN" @.>; Sent: Tuesday, November 22, 2022, 9:39 PM @.>; @.**@.>; Subject: Re: [knowledgedefinednetworking/DRL-GNN] Questions about code（tensorflow2）operational efficiency (Issue #14)


paulalmasan commented 1 year ago

Both things you mention should help. If graphs are very small you can also reduce the parameter 'T' (see https://github.com/knowledgedefinednetworking/DRL-GNN/blob/61723d24afe774d9f023dd98fac7166974a13d54/DQN/train_DQN.py#L64)
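If 'T' is the number of message-passing iterations, as is usual in message-passing GNNs (the thread itself does not spell this out), then the work per forward pass grows linearly with it. The toy sketch below uses plain Python lists instead of TensorFlow tensors, and the function name, the update rule and the three-link graph are all hypothetical; it only counts how many messages are exchanged for a given T.

```python
from typing import Dict, List, Tuple

States = Dict[int, List[float]]


def message_passing(states: States, neighbors: Dict[int, List[int]], T: int) -> Tuple[States, int]:
    """Run T rounds of a toy message-passing update.

    Returns the final link states and the number of messages exchanged.
    """
    messages = 0
    for _ in range(T):
        new_states: States = {}
        for link, state in states.items():
            # Sum the states of all neighbouring links (one message per neighbour).
            aggregated = [0.0] * len(state)
            for other in neighbors[link]:
                for i, value in enumerate(states[other]):
                    aggregated[i] += value
                messages += 1
            # Toy update rule: average of the own state and the aggregated messages.
            new_states[link] = [0.5 * (s + a) for s, a in zip(state, aggregated)]
        states = new_states
    return states, messages


if __name__ == "__main__":
    # Three links in a row: 0 - 1 - 2 (illustrative graph).
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    initial = {0: [1.0], 1: [0.0], 2: [0.0]}
    for T in (4, 2):  # illustrative values only
        _, count = message_passing(initial, neighbors, T)
        print(f"T={T}: {count} messages")
```

On a small graph, information reaches every link after only a few rounds, which is why a lower T may be enough there; whether accuracy holds up would still need to be checked experimentally.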

liyuanfeng747 commented 1 year ago

Thank you very much. I will try this valuable advice.

knowledgedefinednetworking / DRL-GNN

Questions about code（tensorflow2）operational efficiency #14