kevinblighe / RegParallel

Standard regression functions in R enabled for parallel processing over large data-frames.

Error in { : task 1 failed - "NA/NaN/Inf in foreign function call (arg 6)" #4

Closed: hchintalapudi closed this issue 2 years ago

hchintalapudi commented 3 years ago

Hi, I can't work out what this error means; I made sure I removed the NAs from my data.

> head(skcm_cox_input[,c(1:7)])
                 survival.time censor age_at_diagnosis ajcc_pathologic_tumor_stage HML3_1p36.33 MER4B_1p36.33 HARLEQUIN_1p36.33
TCGA-BF-A1PU-01A           387      0               46                    Stage II    0.8408482      6.994536          6.336147
TCGA-BF-A1PV-01A            14      0               74                    Stage II    0.8408482      7.319005          6.061691
TCGA-BF-A1PX-01A           282      1               56                   Stage III    0.8408482      6.935199          8.010380
TCGA-BF-A1PZ-01A            12      0               71                    Stage II    0.8408482      7.094457          7.746144
TCGA-BF-A1Q0-01A            17      0               80                    Stage II    1.8371404      7.429693          7.048911
TCGA-BF-A3DJ-01A           464      0               36                   Stage III    2.6068050      7.447827          9.208338
skcm_cox_input$ajcc_pathologic_tumor_stage<- as.factor(skcm_cox_input$ajcc_pathologic_tumor_stage)
skcm_cox_input$censor<- as.numeric(skcm_cox_input$censor)
skcm_cox_input$survival.time<- as.numeric(skcm_cox_input$survival.time)
skcm_cox_input$age_at_diagnosis<- as.numeric(skcm_cox_input$age_at_diagnosis)
skcm_cox_res<-RegParallel(
    data = skcm_cox_input,
    formula = 'Surv(survival.time, censor) ~ [*] + age_at_diagnosis + ajcc_pathologic_tumor_stage',
    FUN = function(formula, data)
      coxph(formula = formula,
        data = data,
        ties = 'breslow',
        singular.ok = TRUE),
    FUNtype = 'coxph',
    variables = colnames(skcm_cox_input)[5:ncol(skcm_cox_input)],
    blocksize = 4000,
    cores = 2,
    nestedParallel = FALSE,
    conflevel = 95,
    excludeTerms = c("age_at_diagnosis", "ajcc_pathologic_tumor_stage"),
    excludeIntercept = TRUE,
    p.adjust = 'fdr')
##############################
#RegParallel
##############################

System is:
-- Darwin
Blocksize:
-- 4000
Cores / Threads:
-- 2
Terms included in model:
-- survival.time
-- censor
-- age_at_diagnosis
-- ajcc_pathologic_tumor_stage
First 5 formulae:
-- Surv(survival.time, censor) ~ HML3_1p36.33 + age_at_diagnosis + ajcc_pathologic_tumor_stage
-- Surv(survival.time, censor) ~ MER4B_1p36.33 + age_at_diagnosis + ajcc_pathologic_tumor_stage
-- Surv(survival.time, censor) ~ HARLEQUIN_1p36.33 + age_at_diagnosis + ajcc_pathologic_tumor_stage
-- Surv(survival.time, censor) ~ HERVIP10F_1p36.33 + age_at_diagnosis + ajcc_pathologic_tumor_stage
-- Surv(survival.time, censor) ~ HML4_1p36.32 + age_at_diagnosis + ajcc_pathologic_tumor_stage
Error in { : 
  task 1 failed - "NA/NaN/Inf in foreign function call (arg 6)"

Any tips appreciated, thanks!
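One way to narrow down where an error like this comes from is to fit each model directly with `coxph()`, outside RegParallel's block and parallel machinery, and record which variables fail. The sketch below is an illustration under stated assumptions: it uses the `lung` dataset bundled with the survival package as a hypothetical stand-in for `skcm_cox_input`, with `time`/`status` in place of `survival.time`/`censor`, `sex` as the fixed covariate, and a hand-picked list of `candidates` in place of the tested columns.

```r
library(survival)  # coxph(), Surv() and the bundled 'lung' example data

# Hypothetical stand-in for skcm_cox_input: survival's 'lung' data.
# 'time'/'status' play the role of survival.time/censor, 'sex' the fixed covariate.
dat <- lung
candidates <- c("age", "ph.ecog", "ph.karno", "meal.cal", "wt.loss")

# Fit one Cox model per candidate variable, mirroring the
# 'Surv(...) ~ [*] + covariates' template, and keep any error message
fit_status <- vapply(candidates, function(v) {
  f <- as.formula(paste("Surv(time, status) ~", v, "+ sex"))
  tryCatch({
    coxph(f, data = dat, ties = "breslow", singular.ok = TRUE)
    "ok"
  }, error = function(e) conditionMessage(e))
}, character(1))

fit_status  # one entry per variable: "ok" or the error text
```

On the real data, the candidates would be `colnames(skcm_cox_input)[5:ncol(skcm_cox_input)]` and the covariates `age_at_diagnosis + ajcc_pathologic_tumor_stage`. If some single fits fail with the same message, the problem is more likely in those variables than in RegParallel itself.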

kevinblighe commented 3 years ago

Can you check for all types of missing and other invalid values, including -Inf, NA, "n/a", and even empty fields ("")? It looks like you are using TCGA clinical data, which can be very sparse.
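A check along these lines can be written as a per-column count, so that even a single bad value in a large table shows up. This is a minimal sketch: `count_bad()` is a hypothetical helper, the default `placeholders` (`""` and `"n/a"`) are an assumption taken from the values listed above and should be extended with whatever codes the source files actually use, and the toy data are made up.

```r
# Count, per column, values that can break model fitting:
# NA and NaN (both caught by is.na()), +/-Inf in numeric columns, and
# placeholder strings in character or factor columns.
count_bad <- function(df, placeholders = c("", "n/a")) {
  vapply(df, function(x) {
    bad <- is.na(x)
    if (is.numeric(x)) bad <- bad | is.infinite(x)
    if (is.character(x) || is.factor(x)) {
      bad <- bad | trimws(as.character(x)) %in% placeholders
    }
    sum(bad)
  }, integer(1))
}

# Made-up example data
toy <- data.frame(
  time  = c(387, 14, NA, 12),
  score = c(0.84, Inf, 6.9, NaN),
  stage = c("Stage II", "", "Stage III", "n/a")
)
count_bad(toy)  # time = 1, score = 2, stage = 2
```

Any column with a non-zero count is worth inspecting, for example with `which(is.infinite(x))` on a numeric column.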

hchintalapudi commented 3 years ago

Hi, thanks for the suggestion. I checked and did not find any of the values you mentioned.

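# Flag columns in which every value is NA, '', 'n/a' or -Inf
allmisscols <- sapply(skcm_cox_input, function(x) all(is.na(x) | x == '' | x == 'n/a' | x == -Inf))
# Print the names of the flagged columns, one per line
colswithallmiss <- names(allmisscols[allmisscols > 0])
cat(colswithallmiss, sep = "\n")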

This prints nothing, so I'm not sure what went wrong.
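One possible reason the check prints nothing: wrapping the condition in `all()` flags a column only when every one of its values matches, so it finds columns that are entirely missing, not columns that contain a few missing or infinite values. Swapping in `any()`, and `is.infinite(x)` for `x == -Inf` so that `+Inf` is also caught, asks instead whether a column contains at least one bad value. The toy data below are made up. This does not show what triggers the coxph error; it only shows why the original check cannot rule out scattered NA or Inf values.

```r
# Made-up columns: one clean, one with a single NA, one with a single +Inf
toy <- data.frame(
  a = c(1.5, 2.0, 3.2),
  b = c(7.1, NA, 6.3),
  c = c(0.8, 0.8, Inf)
)

# The check from the thread: TRUE only if *every* value is NA, '', 'n/a' or -Inf
all_missing <- sapply(toy, function(x) all(is.na(x) | x == '' | x == 'n/a' | x == -Inf))

# Same idea with any(): TRUE if *at least one* value is NA/NaN, +/-Inf or a placeholder
any_bad <- sapply(toy, function(x) any(is.na(x) | x == '' | x == 'n/a' | is.infinite(x)))

names(all_missing)[all_missing]  # character(0): nothing flagged
names(any_bad)[any_bad]          # "b" "c"
```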

kevinblighe commented 3 years ago

Any success? If you like, please provide a minimal reproducible example so that I can reproduce and debug this error. Otherwise, I will close this issue in a few days. Thanks!
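A common way to build the minimal reproducible example requested here is to share a small slice of the data as R code with `dput()`, which anyone can paste back into R. The sketch below uses made-up values shaped like `skcm_cox_input`; the data frame `mini` and its contents are purely illustrative.

```r
# Made-up data mimicking the layout of skcm_cox_input
mini <- data.frame(
  survival.time = c(387, 14, 282),
  censor = c(0, 0, 1),
  age_at_diagnosis = c(46, 74, 56),
  ajcc_pathologic_tumor_stage = factor(c("Stage II", "Stage II", "Stage III")),
  HML3_1p36.33 = c(0.84, 0.84, 1.84)
)

# dput() writes R code that recreates the object; capture it as text to paste
code <- capture.output(dput(mini))
cat(code, sep = "\n")

# Whoever receives the snippet can rebuild the data frame from it
rebuilt <- eval(parse(text = code))
```

On the real data, something like `dput(head(skcm_cox_input[, 1:8], 20))` gives a snippet small enough to paste into an issue, ideally after checking that the subset still triggers the error.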

hchintalapudi commented 3 years ago

Hi Kevin, I'll see if I can replicate the issue with another dataset. For now, I'm attaching the data here; it would be great if you could run the code and see whether you get the same error, or whether anything looks wrong with my data or formatting. Here is the code I was trying to run:

skcm_cox_res<-RegParallel(
    data = skcm_cox_input,
    formula = 'Surv(survival.time, censor) ~ [*] + age_at_diagnosis + ajcc_pathologic_tumor_stage',
    FUN = function(formula, data)
      coxph(formula = formula,
        data = data,
        ties = 'breslow',
        singular.ok = TRUE),
    FUNtype = 'coxph',
    variables = colnames(skcm_cox_input)[5:ncol(skcm_cox_input)],
    blocksize = 4000,
    cores = 2,
    nestedParallel = FALSE,
    conflevel = 95,
    excludeTerms = c("age_at_diagnosis", "ajcc_pathologic_tumor_stage"),
    excludeIntercept = TRUE,
    p.adjust = 'fdr')

skcm_cox-input_tmp.csv

Thanks for your time.

kevinblighe commented 2 years ago

Hi, can you provide any update here? Thank you.

kevinblighe commented 2 years ago

Please re-open if the issue persists.