[Bug]: DWM crashed while running match_spec on R

zzhui43 commented 7 months ago

Guidelines

[X] I agree to follow this project's Contributing Guidelines.

Project Version

No response

Platform and OS Version

Windows 11 OS 22621.2861

Existing Issues

No response

What happened?

I have been using the package in R for about a month without encountering any issues. Yesterday, when I ran the exact same code without changing anything on the system, R returned an error message saying that it could not handle a vector with a size of more than 10 GB (I can't remember the exact size). I wasn't sure what was wrong, so I updated R to the latest version and made sure I had installed the latest OpenSpecy package. This time, Desktop Window Manager crashed after the code had been running for >15 min.

I'm not sure if it's a problem with my laptop. I ran some Windows integrity checks with these commands, but everything seems fine:

sfc /scannow 
Dism /Online /Cleanup-Image /ScanHealth 
Dism /Online /Cleanup-Image /CheckHealth 
Dism /Online /Cleanup-Image /RestoreHealth

I also tried running the code in a clean boot environment, but the issue was not resolved.

Steps to reproduce

This is the code I used to run the matching:

library(OpenSpecy)
#get_lib(type="derivative")
lib <- load_lib(type="derivative")

spec.dir <- list.files(path="C:\\Users\\User\\spectra\\20231229",pattern = "*.JDX",recursive=T,full.names = T)

spec.list <- lapply(spec.dir,read_any)
names(spec.list) <- sub("\\.JDX$", "", basename(spec.dir))
bg<- spec.list[grep("bkg1$", names(spec.list), value = TRUE)]
spec.list <- spec.list[grep("bkg1$", names(spec.list), value = FALSE,invert=T)]

######### spectra preprocessing ######### 
processed.spec <- list()
for (i in 1:length(spec.list)){
  trans <- spec.list[[i]] |> 
    adj_intens(type = "transmission") 
  processed1 <- process_spec(trans,
                             active = TRUE,
                             conform_spec = TRUE,
                             conform_spec_args = list(range = lib$wavenumber, res = 5,type = "interp"),
                             smooth_intens = T,
                             smooth_intens_args = list(polynomial = 3, window = 11,derivative = 1, abs = TRUE),
                             subtr_baseline = FALSE,
                             subtr_baseline_args = list(type = "polynomial",
                                                        degree = 8, raw = FALSE,
                                                        baseline = NULL),
                             make_rel = TRUE)
  processed.spec[[i]] <- processed1
  names(processed.spec)[i] <- names(spec.list)[i]
}

######### identify polymer #########
match.list <- list()
for (i in 1:length(processed.spec)){
  print(paste(Sys.time(),"|","Matching spec",i,"/",length(processed.spec),"..."))
matches <- match_spec(x = processed.spec[[i]], library = lib,
                      add_library_metadata = "sample_name", top_n = 5)
matches <- matches[order(matches$match_val,decreasing=T),]
match.list[[i]] <- matches
print(paste(Sys.time(),"|","Matching spec",i,"/",length(processed.spec),"completed."))
}

I also tried running match_spec without the loop, but the same issue occurred.

Expected behavior

Previously, a match could be generated in <2 min per spectrum, and I was able to match ~800 spectra in one go without running into this problem, but now it seems to be stuck at the 1st spectrum. I apologise in advance if this is due to some silly mistake on my end. Any assistance is greatly appreciated!

Attachments

No response

Screenshots or Videos

No response

Additional Information

No response

wincowgerDEV commented 7 months ago

Sorry we are just now getting to this; not sure how I missed it earlier. Could you share the dataset you're working with? Nothing stands out as problematic currently, but something may stand out after I see the data.

wincowgerDEV commented 7 months ago

@zzhui43, do your JDX files have multiple spectra in them? I know I deprecated that ability not too long ago because reading the multi jdx files became overly buggy. Might need to revisit it.

zzhui43 commented 7 months ago

Hi Cowger, no worries! Each of my JDX files contains only one spectrum. I did a trial with the attached dataset but still have the same problem. S11A8bkg1.zip

Error message:

[1] "2024-01-24 14:52:19.114426 | Matching spec 1 / 5 ..."
Error: cannot allocate vector of size 14.5 Gb
In addition: There were 50 or more warnings (use warnings() to see the first 50)

However, I managed to do the matching in R Markdown. Here's my R Markdown code:

knitr::opts_chunk$set(warning = FALSE, message = FALSE, results = 'asis')

Loading package and library

library(OpenSpecy)
get_lib("derivative")
lib <- load_lib("derivative")
setwd("C:\\Users\\User\\OneDrive - National University of Singapore\\Grad\\Research\\Heavy metal and microplastics\\Data\\Microplastic\\FTIR\\spectra")

spec.dir <- list.files(path="C:\\Users\\User\\OneDrive - National University of Singapore\\Grad\\Research\\Heavy metal and microplastics\\Data\\Microplastic\\FTIR\\spectra\\20231229\\jdx",pattern = "*.JDX",recursive=T,full.names = T)
spec.list <- lapply(spec.dir,read_any)
names(spec.list) <- sub("\\.JDX$", "", basename(spec.dir))
bg<- spec.list[grep("bkg1$", names(spec.list), value = TRUE)]
spec.list <- spec.list[grep("bkg1$", names(spec.list), value = FALSE,invert=T)]

for (i in 1:length(spec.list)){
spec1 <- spec.list[[i]]
cat(paste("### Spectrum", i,names(spec.list[i]), sep=" "))
cat("\n")

proc <- spec1 |>
  process_spec(conform_spec_args = list(range = lib$wavenumbers), 
               smooth_intens = T, make_rel = T)

compare <- c_spec(list(spec1,proc),range="common")

cat("\n")
cat(paste("#### Raw vs processed spectrum - ", names(spec.list[i]),sep=" "))
cat("\n")
print(plot(compare))
cat("\n")
cat(paste0("*Black = raw; Red = processed*"))
cat("\n\n")
cat(paste0("#### Top Matches:"))
cat("\n")
top_matches <- match_spec(proc, library = lib, na.rm = T, top_n = 5,
                          add_library_metadata = "sample_name",
                          add_object_metadata = "col_id")
top_matches <- top_matches[order(top_matches$match_val,decreasing=T),]

print(knitr::kable(top_matches[, c("SpectrumIdentity","match_val", "SpectrumType","object_id", "library_id")]))
cat("\n")

compare <- c_spec(list(proc,filter_spec(lib, logic = top_matches[[1,"library_id"]])),range="common")

cat(paste("#### Processed spectrum vs top match - ", names(spec.list[i]),sep=" "))
print(plot(compare))
cat("\n\n")
cat(paste("---------------------------------------------------------------------------------------------------------------------------------"))
cat("\n\n")

}

wincowgerDEV commented 7 months ago

@zzhui43

Thanks for sharing this; trying to reproduce now. If I understand you correctly, the function works in R Markdown with the configuration listed but not with vanilla R operations. Is that correct?

wincowgerDEV commented 7 months ago

This call is giving me some warning messages:

spec.list <- lapply(spec.dir,read_any)

JDX file inconsistency: Minimum of spectrum != MINY: difference = -4.4e-07 (-44 YFACTOR)
JDX file inconsistency: Maximum of spectrum != MAXY: difference = 3.2e-07 (32 YFACTOR)
JDX file inconsistency: Minimum of spectrum != MINY: difference = -1.6e-08 (-16 YFACTOR)
JDX file inconsistency: Maximum of spectrum != MAXY: difference = 1.92e-07 (192 YFACTOR)
JDX file inconsistency: Minimum of spectrum != MINY: difference = 1.2e-07 (12 YFACTOR)
JDX file inconsistency: Maximum of spectrum != MAXY: difference = -3.6e-07 (-36 YFACTOR)
JDX file inconsistency: Minimum of spectrum != MINY: difference = -1.6e-07 (-16 YFACTOR)
JDX file inconsistency: Maximum of spectrum != MAXY: difference = 3.6e-07 (36 YFACTOR)
JDX file inconsistency: Minimum of spectrum != MINY: difference = 3.2e-07 (32 YFACTOR)
JDX file inconsistency: Maximum of spectrum != MAXY: difference = 1.2e-07 (12 YFACTOR)
JDX file inconsistency: Minimum of spectrum != MINY: difference = 3.34e-07 (334 YFACTOR)
JDX file inconsistency: Maximum of spectrum != MAXY: difference = -1.68e-07 (-168 YFACTOR)
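
Because the failing run reported 50 or more warnings, it can help to record which file or step emitted each one. The following is a minimal base R sketch, not something taken from OpenSpecy: `run_logged` and `toy_reader` are hypothetical names, and `toy_reader` only stands in for a real reader such as `read_any`.

```r
# Hypothetical helper (not part of OpenSpecy): run a function, collecting
# any warnings and the error message instead of letting them scroll past.
run_logged <- function(f) {
  warns <- character()
  err <- NULL
  value <- withCallingHandlers(
    tryCatch(f(), error = function(e) {
      err <<- conditionMessage(e)  # remember the error, return NULL as value
      NULL
    }),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))  # record the warning text
      invokeRestart("muffleWarning")           # then continue the computation
    }
  )
  list(value = value, error = err, warnings = warns)
}

# Toy stand-in for a file reader; the second file triggers a warning.
toy_reader <- function(path) {
  if (path == "b.JDX") warning("toy inconsistency in ", path)
  nchar(path)
}

files <- c("a.JDX", "b.JDX")
results <- lapply(files, function(p) run_logged(function() toy_reader(p)))
names(results) <- files

# Number of warnings per file
print(vapply(results, function(r) length(r$warnings), integer(1)))
```

With the results keyed by file name, a warning or error can be traced to the input that produced it rather than read back from `warnings()` after the whole loop has finished.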

wincowgerDEV commented 7 months ago

Looks like the list items in processed.spec are not properly formed Open Specy objects.

check_OpenSpecy(processed.spec[[i]])

Error in if (!(cr <- ncol(x$spectra) == nrow(x$metadata))) warning("Number of columns in spectra is not equal to number of rows ", : argument is of length zero
In addition: Warning messages:
1: Names of the object components are incorrect
2: Spectra are not of class 'data.table'

wincowgerDEV commented 7 months ago

Seems like an error is happening to the objects at this point:

trans <- spec.list[[i]] |> 
        adj_intens(type = "transmission") 

check_OpenSpecy(trans)

Error in if (!(cr <- ncol(x$spectra) == nrow(x$metadata))) warning("Number of columns in spectra is not equal to number of rows ", : argument is of length zero
In addition: Warning messages:
1: Names of the object components are incorrect
2: Spectra are not of class 'data.table'
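
The approach used here, checking the object after each step until the first malformed one appears, can be sketched in base R. This is only an illustration under stated assumptions: `looks_like_spec` is a hypothetical helper that roughly mirrors the conditions named in the messages above (component names, and columns in `spectra` versus rows in `metadata`), and the component name `wavenumber` is inferred from `lib$wavenumber` in the original code. In a real session, `check_OpenSpecy()` is the check to run.

```r
# Hypothetical helper, not part of OpenSpecy: a rough structural check that
# mirrors the conditions named in the error and warnings above.
looks_like_spec <- function(x) {
  is.list(x) &&
    all(c("wavenumber", "spectra", "metadata") %in% names(x)) &&
    is.data.frame(x$spectra) &&
    is.data.frame(x$metadata) &&
    ncol(x$spectra) == nrow(x$metadata)
}

# Toy objects standing in for the output of each pipeline step.
good <- list(
  wavenumber = c(1000, 1005, 1010),
  spectra    = data.frame(s1 = c(0.1, 0.5, 0.2)),
  metadata   = data.frame(col_id = "s1")
)
bad <- list(values = c(0.1, 0.5, 0.2))  # wrong component names

# Check the result of every step and report the first one that fails.
steps <- list(read = good, adjusted = bad, processed = bad)
ok <- vapply(steps, looks_like_spec, logical(1))
first_failure <- names(ok)[!ok][1]
print(first_failure)
```

Checking intermediate results in pipeline order points at the step that first breaks the object, which is how the problem was narrowed down to the `adj_intens` call.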

wincowgerDEV commented 7 months ago

I think I found the first issue: "transmission" isn't currently a supported type. "transmittance" is the appropriate option. We can add better error handling for this.

trans <- spec.list[[i]] |> adj_intens(type = "transmission")

Should be: trans <- spec.list[[i]] |> adj_intens(type = "transmittance")
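
As a rough idea of what stricter argument handling could look like, base R's `match.arg()` rejects a value that is not in a fixed set of choices. This is a sketch only and not the change made in the package: `check_intensity_type` is a hypothetical helper, and the set of allowed values is an assumption for illustration (only "transmittance" is confirmed in this thread).

```r
# Hypothetical helper, not OpenSpecy code: reject unsupported intensity types
# early. The `allowed` values are illustrative assumptions.
check_intensity_type <- function(type,
                                 allowed = c("none", "transmittance", "reflectance")) {
  # match.arg() stops with an error listing the valid choices when `type`
  # does not match any of them
  match.arg(type, choices = allowed)
}

# A supported value is returned unchanged
print(check_intensity_type("transmittance"))

# An unsupported value fails immediately instead of yielding a malformed object
msg <- tryCatch(check_intensity_type("transmission"),
                error = function(e) conditionMessage(e))
print(msg)
```

Failing at the argument check gives the user a message that names the valid options, rather than a memory allocation error several steps later.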

wincowgerDEV commented 7 months ago

It looks like with that change, your code runs without warning within a minute on my laptop. I will get the error handling improved.

This is fixed in the next update: https://github.com/wincowgerDEV/OpenSpecy-package/pull/161/commits/0a45ab302cfdffa58e07a970c798511da43841cd

zzhui43 commented 7 months ago

Yep, I'm able to run the code after changing to 'transmittance' too. Thank you so much!
