I want to train CNNs on a big dataset via transfer learning, using torch in R. Since my dataset is too big to be loaded into memory all at once, I have to load each sample from the SSD in the dataloader. But loading one batch from my SSD takes about 5-10x as long as processing it (forward pass, backprop, optimizer step). Asynchronous, parallel data loading would therefore be advisable.
As far as I understand torch, this can be done in the dataloader via the num_workers parameter. But setting it did not decrease the loading time of a batch in the training loop; the only effect was a big overhead before the first batch arrives (probably because that is when the workers are created). Now I need advice on whether this can be done in torch and whether I implemented something wrong.
# I have images of size 299x299 with 13 channels.
# optimizing this loading step yielded no significant improvement.
return(array(readRDS(path), dim=c(13,299,299))*1.0)
target_transform = function(x){a<-c(0.0,1.0)[x];dim(a)<-1;return(a)}
# Here I set num_workers to different values, but that did not change the loading time
dl2 <- torch::dataloader(dl, batch_size = 110L, shuffle = TRUE, num_workers = 15L, pin_memory = TRUE)
#just a random pretrained model for transfer learning
model_torch = torchvision::model_alexnet(pretrained = T)
model_torch$parameters |>
purrr::walk(function(param) param$requires_grad_(FALSE))
# replacing the last layer to my desired classifier
inFeat =model_torch$classifier$'6'$in_features
model_torch$classifier$'6' = nn_linear(inFeat, out_features = 1L)
# I have 13 input channels, therefore I replace the first conv layer with an equivalent one that takes 13 input channels
conv1 <- torch::nn_conv2d(in_channels = 13L, out_channels = model_torch[[1]]$`0`$out_channels,
kernel_size = model_torch[[1]]$`0`$kernel_size,
stride = model_torch[[1]]$`0`$stride,
padding = model_torch[[1]]$`0`$padding,
dilation = model_torch[[1]]$`0`$dilation, groups = model_torch[[1]]$`0`$groups, bias = TRUE)
# put the new layer into the model, otherwise conv1 is never used
model_torch$features$'0' = conv1
model_torch<-model_torch$to(device = "cuda")
opt = optim_adam(params = model_torch$parameters, lr = 0.01)
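# training loop
for(e in 1:1){
  losses = c()
  # storing the time which the loop uses for computing and data loading
  t0 = Sys.time()
  coro::loop(for(batch in dl2){
    # this is the time it takes to load a batch
    t1 = Sys.time()
    cat("load:", as.numeric(difftime(t1, t0, units = "secs")), "s\n")
    opt$zero_grad()
    pred = model_torch(batch[[1]]$to(device = "cuda"))
    res = batch[[2]]$to(device = "cuda")
    loss = nnf_binary_cross_entropy(input = torch_sigmoid(pred), target = res)
    loss$backward()
    opt$step()
    losses = c(losses, loss$item())
    # this is the time it takes to process a batch
    t0 = Sys.time()
    cat("process:", as.numeric(difftime(t0, t1, units = "secs")), "s\n")
  })
}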
As I understand it, the time it takes to load a batch should decrease significantly (after the first few batches) when I use parallel batch loading through num_workers, compared to num_workers = 0.
But the printed times stay the same no matter how many workers I use.
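To make the comparison reproducible without my data, the setup can be boiled down to a minimal sketch. The names `toy_dataset` and `time_epoch` are hypothetical and not part of my actual code; the `Sys.sleep()` call is only an assumed stand-in for the slow `readRDS()` read, and the tensor sizes are made up to keep it small. It runs on the CPU and only measures how long it takes to iterate over all batches for a given num_workers.

```r
library(torch)

# Hypothetical toy dataset: every item "costs" a fixed delay,
# standing in for reading one sample from the SSD.
toy_dataset <- dataset(
  name = "toy_dataset",
  initialize = function(n = 64, delay = 0.01) {
    self$n <- n
    self$delay <- delay
  },
  .getitem = function(i) {
    Sys.sleep(self$delay)  # assumed stand-in for readRDS(path)
    list(
      x = torch_randn(13, 8, 8),                       # 13 channels, tiny image
      y = torch_tensor(i %% 2, dtype = torch_float())  # dummy binary label
    )
  },
  .length = function() {
    self$n
  }
)

# Hypothetical helper: iterate once over the dataloader and
# return the wall-clock time and the number of batches seen.
time_epoch <- function(num_workers) {
  dl <- dataloader(toy_dataset(), batch_size = 8, num_workers = num_workers)
  n_batches <- 0
  start <- Sys.time()
  coro::loop(for (batch in dl) {
    n_batches <- n_batches + 1
  })
  seconds <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  list(seconds = seconds, n_batches = n_batches)
}

# Compare sequential loading with two worker processes.
for (w in c(0, 2)) {
  res <- time_epoch(w)
  cat("num_workers =", w, ":", res$seconds, "s for", res$n_batches, "batches\n")
}
```

The total includes the startup time of the workers, so with such a small dataset the parallel run may well be slower; the question is whether the per-batch time drops once the workers are up.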
I would be glad if anyone could help me!