philips-software / latrend

An R package for clustering longitudinal datasets in a standardized way, providing interfaces to various R packages for longitudinal clustering, and facilitating the rapid implementation and evaluation of new methods
https://philips-software.github.io/latrend/
GNU General Public License v2.0

segfault error with dtw method #159

Closed hichew22 closed 4 months ago

hichew22 commented 5 months ago

Hello,

I am trying to use the dtw method as follows. I have a large dataframe consisting of longitudinal Y values (1 value per day, 31 values per individual, and ~700 individuals).

dtw_method <-
  lcMethodDtwclust(response = "Y",
                   nClusters = 4,
                   nbRedrawing = 1)
dtw_method

dtw_model <- latrend(dtw_method, data = df)

However, when I run the last line, R reports that it has encountered a fatal error and that the session must be terminated. When I try to knit my Rmd, I get a very long error message: [image: screenshot of the error message]

Could you let me know how to fix this?

Thank you!

niekdt commented 4 months ago

This may be an issue with how latrend prepares the data for the dtwclust package, or a bug in dtwclust triggered by your data.

The quickest way to identify where the problem lies is to use the dtwclust package directly; see dtwclust::tsclust. Please let me know if this works.

hichew22 commented 4 months ago

When I use dtwclust::tsclust as follows, I get this error:

df_ts <- tsmatrix(df, response = "Y")
dtwclust::tsclust(
  series = df_ts,
  k = 4L
)
[image: screenshot of the error message]

In my dataframe df, most patients have 1 daily "Y" value between days 0-30, but there are a few with 1 daily "Y" value between days 0-21 or days 0-26, for example. Thus, when df_ts is generated, there are NAs for days 22-30 or 27-30, for example.

When I filter the dataset to the individuals who have daily "Y" values between days 0-30, the error resolves. Both dtwclust::tsclust() and my original code above work within the code chunks:

# Using dtwclust
df_ts <- tsmatrix(df, response = "Y")
dtwclust::tsclust(
  df_ts,
  k = 4L
)

# Using latrend
dtw_method <-
  lcMethodDtwclust(response = "Y",
                   nClusters = 4,
                   nbRedrawing = 1)
dtw_method
dtw_model <- latrend(dtw_method, data = df)

However, when I try to knit the Rmd, I get the same error:

[image: screenshot of the error message]

Could you help me look into this?

hichew22 commented 4 months ago

I also get the same error if I just use the latrend data as follows:

data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")
dtw_method <-
  lcMethodDtwclust(response = "Y",
                   nClusters = 4,
                   nbRedrawing = 1)
dtw_method

dtw_model <- latrend(dtw_method, data = latrendData)
[image: screenshot of the error message]

niekdt commented 4 months ago

tsmatrix is for repeated measures data of equal length, hence the introduction of NAs for some of the series.
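A minimal sketch of how the shorter series end up padded with NAs, and how to find the individuals responsible, using base R only. The toy data frame and its column names (Id, Time, Y) are assumptions for illustration, and the tapply() reshape is only a stand-in for an equal-length matrix, not how tsmatrix is implemented.

```r
# Toy long-format data (hypothetical): ids 1 and 2 are observed on days 0-30,
# id 3 only on days 0-21.
set.seed(1)
df <- rbind(
  data.frame(Id = 1, Time = 0:30, Y = rnorm(31)),
  data.frame(Id = 2, Time = 0:30, Y = rnorm(31)),
  data.frame(Id = 3, Time = 0:21, Y = rnorm(22))
)

# Reshape to one row per id and one column per day; cells without an
# observation are filled with NA
wide <- tapply(df$Y, list(df$Id, df$Time), mean)
anyNA(wide)

# Number of observations per individual
n_obs <- table(df$Id)

# Ids whose series is shorter than the longest one
short_ids <- names(n_obs)[n_obs < max(n_obs)]
short_ids

# Keep only individuals with a complete series
df_complete <- df[!(df$Id %in% short_ids), ]
```

Dropping the incomplete individuals is the workaround described earlier in this thread; whether discarding them is acceptable depends on the analysis.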

dtwclust can handle series of unequal length if you pass them as a list of time series vectors. In an earlier version of your post (since edited), I noticed a warning from lapply(series, as.numeric) about NAs being introduced by coercion. That would suggest that the way the time series are split into a list of series is creating vectors of mixed types (e.g., numbers and characters).

To test this, you can run: all(sapply(series, is.numeric))
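As a rough sketch of the list-of-series approach, the following splits a long-format data frame into one numeric vector per individual and runs the check above before clustering. The toy data and the column names (Id, Time, Y) are assumptions for illustration; the tsclust call uses its default settings and is only run if dtwclust is installed.

```r
# Toy long-format data (hypothetical column names Id, Time, Y)
set.seed(1)
df <- do.call(rbind, lapply(1:6, function(i) {
  n <- if (i %% 2 == 0) 22 else 31  # unequal lengths on purpose
  data.frame(Id = i, Time = seq_len(n) - 1, Y = cumsum(rnorm(n)))
}))

# Order by id and time, then split the response into one vector per id
df <- df[order(df$Id, df$Time), ]
series <- split(df$Y, df$Id)

# Each element should be a plain numeric vector without NAs
all(sapply(series, is.numeric))
!any(sapply(series, anyNA))

# Cluster the list directly; no equal-length matrix is needed
if (requireNamespace("dtwclust", quietly = TRUE)) {
  fit <- dtwclust::tsclust(series = series, k = 2L)
  print(fit@cluster)
}
```

If the first check returns FALSE, inspecting sapply(series, class) should show which series were read in as characters or factors.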

Since this is an issue related to the dtwclust package, the contributors of that package will be better able to help you.

hichew22 commented 4 months ago

I wanted to clarify that I edited my data frame to remove any NA values, and the code chunks above work within my Rmd file. However, when I knit the file, I get that long segfault error message when it gets to:

dtw_model <- latrend(dtw_method, data = latrendData)

This occurs even with the latrend data: the code chunks work fine when I run them individually, but when I knit the file, the segfault error occurs when the knit gets to that line of code.

Can you confirm that this is not an issue with latrend but with dtwclust?

niekdt commented 4 months ago

If I'm understanding correctly, when knitting, you also get the segfault error when using dtwclust::tsclust directly? This would suggest it's not an issue with latrend.

In general, I've noticed that knitr can sometimes produce behavior that is hard to explain, such as forgetting R options, which could lead to errors in latrend calls. But that does not seem to be the problem here.

Since the error you posted is triggered by the Sys.getenv default-option call, setting the RCPP_PARALLEL_NUM_THREADS environment variable may prevent the problematic function from being called. It is worth a try, although the error is likely a symptom of another underlying issue. You might also want to look in the dtwclust documentation to see if there are ways to disable parallel computation.
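A minimal sketch of setting that environment variable from within R, for example in the first chunk of the Rmd file so that it is in place before dtwclust is loaded. Whether this avoids the segfault is untested here; the value of one thread is an assumption chosen to rule out parallel execution.

```r
# Limit RcppParallel to a single thread; environment variables are
# stored as strings
Sys.setenv(RCPP_PARALLEL_NUM_THREADS = "1")

# Confirm the value the R session sees
Sys.getenv("RCPP_PARALLEL_NUM_THREADS")

# Alternative via the RcppParallel package, if it is installed
if (requireNamespace("RcppParallel", quietly = TRUE)) {
  RcppParallel::setThreadOptions(numThreads = 1)
}
```

The variable can also be set outside R, for instance in a .Renviron file, so that it applies to the separate R session that knitting may start.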

hichew22 commented 4 months ago

Got it, I will try asking the dtwclust developers to see if they've run into this. Thank you!