Efficiency of calculating optimize_loess_span for new_expression_data

There is a chunk of code (below) that appears many times. It appears to be one of the longer steps in reproducing the data. Is there any reason why we could not split the features into subsets (1,000 at a time?) and parallelize? I am considering doing this for a dataset I have with whole-transcriptome data (upwards of 20k features).

new_expression_data <-
  vector(mode = "list", length = nrow(variable_info))

for (i in seq_along(variable_info$variable_id)) {
  temp_variable_id <- variable_info$variable_id[i]
  cat(i, " ")
  temp_data <-
    data.frame(value = as.numeric(expression_data[temp_variable_id,]),
               sample_info)

  optimize_span <-
    optimize_loess_span(
      x = temp_data$adjusted_age,
      y = temp_data$value,
      span_range = c(0.4, 0.5, 0.6)
    )

  span <-
    optimize_span[[1]]$span[which.min(optimize_span[[1]]$rmse)]

  value <- temp_data$value
  adjusted_age <- temp_data$adjusted_age

  ls_reg <-
    loess(value ~ adjusted_age,
          span = span)

  prediction_value <-
    predict(ls_reg,
            newdata = data.frame(adjusted_age = seq(30, 75, by = 0.5)))
  new_expression_data[[i]] <- as.numeric(prediction_value)
}
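
Each iteration of the loop reads only one row of expression_data and writes only one element of new_expression_data, so the features look independent of one another. A minimal sketch of the chunk-and-parallelize idea follows, using the base `parallel` package. It rests on several assumptions: `choose_span()` is a hypothetical stand-in for the repository's `optimize_loess_span()` (a simple k-fold cross-validation, not the repository's implementation), `fit_feature()`, `fit_chunk()` and `fit_all_features()` are illustrative names, and the data at the end is simulated purely to make the example runnable.

```r
library(parallel)

# Hypothetical stand-in for optimize_loess_span(): returns the span with the
# lowest k-fold cross-validated RMSE. Swap in the real function when available.
choose_span <- function(x, y, span_range = c(0.4, 0.5, 0.6), k = 5) {
  fold <- rep_len(seq_len(k), length(x))  # deterministic fold assignment
  rmse <- vapply(span_range, function(s) {
    err <- unlist(lapply(seq_len(k), function(f) {
      train <- data.frame(x = x[fold != f], y = y[fold != f])
      fit <- loess(y ~ x, data = train, span = s)
      # loess does not extrapolate by default, so some predictions may be NA
      y[fold == f] - predict(fit, newdata = data.frame(x = x[fold == f]))
    }))
    sqrt(mean(err^2, na.rm = TRUE))
  }, numeric(1))
  span_range[which.min(rmse)]
}

# Fit one feature and predict on the same age grid as the original loop.
fit_feature <- function(y, age, grid = seq(30, 75, by = 0.5)) {
  span <- choose_span(age, y)
  fit <- loess(y ~ age, span = span)
  as.numeric(predict(fit, newdata = data.frame(age = grid)))
}

# Fit every row of one chunk; this is the unit of work sent to a worker.
fit_chunk <- function(mat, age) {
  lapply(seq_len(nrow(mat)), function(i) fit_feature(as.numeric(mat[i, ]), age))
}

# Split the feature rows into chunks and fit the chunks on a PSOCK cluster.
fit_all_features <- function(expr_mat, age, n_cores = 2, chunk_size = 1000) {
  idx <- seq_len(nrow(expr_mat))
  chunks <- split(idx, ceiling(idx / chunk_size))
  # Each worker only receives the rows of the chunks it has to process
  chunk_mats <- lapply(chunks, function(r) expr_mat[r, , drop = FALSE])
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl))
  clusterExport(cl, c("choose_span", "fit_feature"),
                envir = environment(fit_feature))
  res <- parLapply(cl, chunk_mats, fit_chunk, age = age)
  # Flatten one level: list of chunks -> list of per-feature curves, in order
  unlist(res, recursive = FALSE, use.names = FALSE)
}

# Simulated toy data (6 features, 80 samples), only to exercise the sketch.
set.seed(1)
age <- runif(80, 25, 80)
toy <- t(sapply(1:6, function(j) sin(age / 10 + j) + rnorm(80, sd = 0.2)))

curves <- fit_all_features(toy, age, n_cores = 2, chunk_size = 2)
serial <- lapply(seq_len(nrow(toy)), function(i) fit_feature(toy[i, ], age))
```

With the real data, `expr_mat` would be something like `as.matrix(expression_data[variable_info$variable_id, ])` and `age` would be `sample_info$adjusted_age`. On Linux or macOS, `parallel::mclapply()` over the chunks is a shorter alternative that avoids the explicit cluster setup. Whether the speed-up is worthwhile depends on how expensive each fit is relative to the cost of starting the workers and shipping the data to them.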

jaspershen-lab / ipop_aging
