rformassspectrometry / MetaboCoreUtils

Core utilities for metabolomics.
https://rformassspectrometry.github.io/MetaboCoreUtils/index.html

subtractElements handling of NA values #61

Closed by ThorstenGravert 1 year ago

ThorstenGravert commented 1 year ago

Hi all, I'm currently exploring MetaboCoreUtils and it's super useful, well done. For some lipidomics processing, I tried to use subtractElements to calculate the tail composition of annotated lipids. The idea is simple: take the sum formula and subtract the head group composition (if known).

However, subtractElements seems to have issues with NA values. I use a dplyr pipe to calculate the tail group for all rows (lipid entries) that have a head group composition. My natural approach would be a case_when statement that applies subtractElements only to entries that are not NA (!is.na). But I get an error related to the countElements function, which subtractElements calls. Interestingly, if I first filter my data frame to remove all rows with NA in the head group composition field, the function works fine and I get the desired results.

I made an example below with the code for my pipes, one working and one failing, and pasted the error output.

Maybe one of you knows what causes the issue and whether there is a better solution than filtering and then joining.

library(dplyr)
library(MetaboCoreUtils)

df <- structure(list(Index = c(153L, 158L, 160L, 166L, 167L, 177L, 
180L), Name = c("1beta-(2-fluoroethyl)-25-hydroxy-26,27-dimethyl-24a-homo-22-oxa-3-epivitamin D3 / 1beta-(2-fluoroethyl)-25-hydroxy-26,27-dimethyl-24a-homo-22-oxa-3-epicholecalciferol", 
"1-Hydroxy-3,4,7,8,1',2',11',12'-octahydrospherioidene", "1-O-(2R-methoxy-4Z-heptadecenyl)-sn-glycerol", 
"1-O-behenoyl-Cer(d18:1/16:0)", "1-O-behenoyl-Cer(d18:1/18:0)", 
"1-O-pentacosanoyl-Cer(d18:1/16:0)", "1-O-stearoyl-Cer(d18:1/16:0)"
), Formula.x = c("C31H51FO3", "C41H68O2", "C21H42O4", "C56H109NO4", 
"C58H113NO4", "C59H115NO4", "C52H101NO4"), `Head Group composition` = c(NA, 
NA, "C4H8O4", "C4H9NO3", "C4H9NO3", "C4H9NO3", "C4H9NO3")), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))  

The following works fine, but I will need to join the output back together with the initial dataframe.

# Assign to a new object so the original df (with its NA rows) stays available.
df_filtered <- df %>%
  filter(!is.na(`Head Group composition`)) %>%
  mutate(
    "TailFormula" = case_when(
      is.na(`Head Group composition`) ~ NA,
      !is.na(`Head Group composition`) ~ subtractElements(Formula.x, `Head Group composition`)
    )
  )

The following should work fine (since I'm excluding NA values in the case_when statement).

df<- df %>%  mutate(
    "TailFormula" = case_when(
    is.na(`Head Group composition`) ~ NA, 
    !is.na(`Head Group composition`) ~ subtractElements(Formula.x, `Head Group composition`))
    )

But it returns the following error:

Error in .fun():
ℹ In argument: TailFormula = case_when(...).
Caused by error in case_when():
! Failed to evaluate the right-hand side of formula 2.
Caused by error in substring():
! invalid substring arguments
Backtrace:

  1. ... %>% ...
    1. MetaboCoreUtils::subtractElements(Formula.x, `Head Group composition`)
    2. MetaboCoreUtils::countElements(y)
    3. base::mapply(...)
    4. MetaboCoreUtils (local) <fn>(xx = dots[[1L]][[1L]], rr = dots[[2L]][[1L]])
    5. base::substring(xx, start, end)

sgibb commented 1 year ago

I am not familiar with dplyr, but it seems that case_when (similar to ifelse) first evaluates all expressions on the right-hand side and then subsets the results based on the logical conditions on the left. That means subtractElements is run on the whole data.frame, including the rows with NA. Nevertheless, it would be good to handle NA in subtractElements.
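
To illustrate the point above, here is a minimal base R sketch. It uses a hypothetical stand-in function, strict_suffix, that simply refuses NA input (it is not part of MetaboCoreUtils and does no formula arithmetic), so the snippet runs without the package. It shows that ifelse evaluates the branch on the full vector, and that logical indexing avoids both the error and the filter-then-join step.

```r
# Hypothetical stand-in for a function that fails on NA input,
# as subtractElements() did when this issue was reported.
strict_suffix <- function(x) {
    if (anyNA(x)) stop("NA values are not supported")
    paste0(x, "_tail")
}

head_group <- c(NA, "C4H8O4", "C4H9NO3")

# ifelse() evaluates the `no` branch on the full vector, NA included,
# so the error is raised even though the NA element would be masked afterwards.
eager <- tryCatch(
    ifelse(is.na(head_group), NA_character_, strict_suffix(head_group)),
    error = function(e) conditionMessage(e)
)

# Workaround: call the function only on the non-missing elements
# and write the results back into a pre-filled NA vector.
has_head <- !is.na(head_group)
tail_formula <- rep(NA_character_, length(head_group))
tail_formula[has_head] <- strict_suffix(head_group[has_head])
```

The same indexing pattern should carry over to the data frame in the original post by replacing strict_suffix with subtractElements and passing both formula columns subset by has_head, assuming subtractElements returns one character value per input pair, as the filtered pipeline suggests.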

jorainer commented 1 year ago

Agree with @sgibb that handling NA in subtractElements would be good. Could you maybe add an issue for that?