So, I've tried multiple times to get parallel training to work, but to no avail. It requires many modifications to the top_k_graph function and the GNN_Loss class, which makes things trickier, especially since only batch size 1 works!
Just wondering if this project is still in development, or whether I should have a go at solving these issues myself. Thanks.
Things tried:
DataParallel
DistributedDataParallel (rough sketch of both attempts below)
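For reference, this is roughly how I wrapped the model. A toy module stands in for the real patchGCL class here, since I don't want to guess at its constructor arguments:

```python
import torch
import torch.nn as nn
import torch.distributed as dist

# Toy stand-in for the repo's patchGCL model; the real module and its
# constructor arguments will differ.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(16, 16)

    def forward(self, x):
        return self.lin(x)

model = ToyModel().cuda()

# Attempt 1: single-process multi-GPU. DataParallel replicates the module and
# splits the batch along dim 0, which is a problem when only batch size 1 works.
dp_model = nn.DataParallel(model)

# Attempt 2: one process per GPU. Assumes the process group has already been
# set up, e.g. launched with torchrun and dist.init_process_group("nccl").
ddp_model = nn.parallel.DistributedDataParallel(
    model, device_ids=[torch.cuda.current_device()]
)
```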
Reason:
GPU utilization is consistently low; my hunch is that data is being moved on and off the GPU too often in the patchGCL model class. In particular, the nonzero_graph function's reliance on scipy and numpy looks questionable, but I'd like to hear more about it before making any judgements!
Hope I can help somehow; happy to provide more info on request.
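To illustrate what I mean about the scipy/numpy round trip: something along these lines would keep the adjacency indexing on the GPU. The function name and signature here are my guesses, not the repo's actual API:

```python
import torch

def nonzero_graph_torch(adj: torch.Tensor, eps: float = 0.0) -> torch.Tensor:
    """Hypothetical torch-only replacement for a scipy/numpy-based nonzero_graph.

    Returns the (row, col) indices of non-zero entries of a dense adjacency
    matrix without leaving the GPU, i.e. no .cpu()/.numpy() round trip.
    """
    # torch.nonzero stays on adj's device; result has shape [2, num_edges].
    return (adj.abs() > eps).nonzero(as_tuple=False).t()

# Example: a small adjacency matrix on GPU (falls back to CPU if no GPU).
device = "cuda" if torch.cuda.is_available() else "cpu"
adj = torch.tensor([[0.0, 1.0], [1.0, 0.0]], device=device)
edge_index = nonzero_graph_torch(adj)
```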