ChenYuhang243 closed this issue 3 months ago.
Hi, can you tell me how you organize the data for training? I'm a bit confused when it comes to this argument:
--data_path "path_to_graph_data" \
#10
I'm currently running this model on Ubuntu 18.04 on a machine with four V100 GPUs, each with 16 GB of memory. I set the batch size to 8 and then lowered it to 4, but I always hit a CUDA out-of-memory (OOM) error after 30 to 40 epochs. Methods I've tried:
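total_loss += float(loss.detach().cpu().numpy())  # accumulate a plain Python float instead of the loss tensor
torch.cuda.empty_cache()  # return cached, unused GPU memory blocks to the driver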
However, the error persists. I'm desperately searching for a solution; any advice or hints would be appreciated!
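An OOM that shows up only after dozens of epochs, rather than on the first batch, often points to memory that grows slowly because some tensor outlives the iteration that created it. `torch.cuda.empty_cache()` only releases cached blocks that no live tensor is using, so it cannot help when a reference is being retained. One way to check is to log allocated and peak GPU memory per epoch. The sketch below is a generic single-device PyTorch loop, not this repository's training code: `train_one_epoch`, `evaluate`, `reset_peak`, `memory_mib`, the tiny `nn.Linear` model and the synthetic data are all hypothetical stand-ins, and the multi-GPU wrapping used on the 4×V100 machine is not shown.

```python
import torch
from torch import nn


def train_one_epoch(model, loader, optimizer, loss_fn, device):
    """Train for one epoch and return the mean loss as a plain Python float."""
    model.train()
    total_loss, n_batches = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)  # drop old gradient tensors instead of zero-filling them
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # .item() copies the scalar into a Python float, so the running total
        # keeps nothing from this iteration's autograd graph alive.
        total_loss += loss.item()
        n_batches += 1
    return total_loss / max(n_batches, 1)


@torch.no_grad()  # no autograd graph is built, so activations are freed right away
def evaluate(model, loader, loss_fn, device):
    """Return the mean validation loss as a plain Python float."""
    model.eval()
    total_loss, n_batches = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        total_loss += loss_fn(model(inputs), targets).item()
        n_batches += 1
    return total_loss / max(n_batches, 1)


def reset_peak(device):
    """Start a fresh peak-memory measurement (no-op on CPU)."""
    if device.type == "cuda":
        torch.cuda.reset_peak_memory_stats(device)


def memory_mib(device):
    """Return (currently allocated, peak allocated) tensor memory in MiB; (0.0, 0.0) on CPU."""
    if device.type != "cuda":
        return 0.0, 0.0
    return (torch.cuda.memory_allocated(device) / 2**20,
            torch.cuda.max_memory_allocated(device) / 2**20)


# Hypothetical demo on synthetic data; swap in the real model and DataLoaders.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)
train_data = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(10)]
val_data = [(torch.randn(4, 16), torch.randn(4, 1)) for _ in range(3)]
model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(3):
    reset_peak(device)
    train_loss = train_one_epoch(model, train_data, optimizer, loss_fn, device)
    val_loss = evaluate(model, val_data, loss_fn, device)
    allocated, peak = memory_mib(device)
    print(f"epoch {epoch}: train={train_loss:.4f} val={val_loss:.4f} "
          f"allocated={allocated:.1f} MiB peak={peak:.1f} MiB")
```

If `allocated` climbs steadily from one epoch to the next, something is holding tensors across iterations; a common culprit is appending loss or output tensors (rather than Python floats) to a list for logging. The `peak` value shows how close each epoch gets to the 16 GB per-GPU limit.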