Closed hichew22 closed 4 months ago
This may be an issue with how latrend prepares the data for the dtwclust package, or a bug in dtwclust triggered by your data.
The quickest way to identify where the problem lies is to use the dtwclust package directly; see dtwclust::tsclust.
Please let me know if this works.
When I use dtwclust::tsclust as so, I get this error:
df_ts <- tsmatrix(df, response = "Y")
dtwclust::tsclust(
  series = df_ts,
  k = 4L
)
In my dataframe df, most patients have 1 daily "Y" value between days 0-30, but there are a few with 1 daily "Y" value between days 0-21 or days 0-26, for example. Thus, when df_ts is generated, there are NAs for days 22-30 or 27-30, for example.
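A quick way to see which patients have shorter series, before calling tsmatrix, is to count observations per patient. This is a minimal sketch on toy data; the column names Id, Time and Y and the toy values are assumptions for illustration, not the actual dataset:

```r
# Toy long-format data (hypothetical): patient 2 has only 2 of the 4 time points.
df <- data.frame(
  Id = rep(1:3, times = c(4, 2, 4)),
  Time = c(0:3, 0:1, 0:3),
  Y = c(5, 6, 7, 8, 1, 2, 3, 4, 5, 6)
)

# Number of observations per patient, and the number expected for a full series.
n_obs <- table(df$Id)
expected <- length(unique(df$Time))

# Patients whose series would be padded with NAs in a wide matrix.
incomplete_ids <- names(n_obs)[n_obs < expected]

# Keep only patients with a complete series.
df_complete <- df[!(as.character(df$Id) %in% incomplete_ids), ]
```

Dropping incomplete patients is only one option; it discards data, so it is worth checking how many patients are affected first.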
When I filter the dataset to the individuals who have daily "Y" values between days 0-30, the error resolves. Both dtwclust::tsclust() and my original code above work within the code chunks:
# Using dtwclust
df_ts <- tsmatrix(df, response = "Y")
dtwclust::tsclust(
  df_ts,
  k = 4L
)
# Using latrend
dtw_method <- lcMethodDtwclust(
  response = "Y",
  nClusters = 4,
  nbRedrawing = 1
)
dtw_method
dtw_model <- latrend(dtw_method, data = df)
However, when I try to knit the Rmd, I get the same error:
Could you help me look into this?
I also get the same error if I just use the latrend data as follows:
data(latrendData)
options(latrend.id = "Id", latrend.time = "Time")
dtw_method <- lcMethodDtwclust(
  response = "Y",
  nClusters = 4,
  nbRedrawing = 1
)
dtw_method
dtw_model <- latrend(dtw_method, data = latrendData)
tsmatrix is for repeated-measures data of equal length, hence the introduction of NAs for some of the series.
dtwclust can handle series of unequal length if you pass them as a list of time series vectors. In a previous, since-edited post I noticed a warning from lapply(series, as.numeric) about coercion introducing NAs, which suggests that the method used to split the time series into a list of series is creating vectors of mixed types (e.g. numbers and characters).
To test this, you can run:
all(sapply(series, is.numeric))
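For reference, one way to build such a list from long-format data and run the check is sketched below. The toy data and the column names Id, Time and Y are assumptions for illustration; the dtwclust call is guarded so the snippet still runs when the package is not installed:

```r
# Toy long-format data (hypothetical) with series of unequal length.
df <- data.frame(
  Id = rep(c("a", "b", "c"), times = c(5, 3, 4)),
  Time = c(0:4, 0:2, 0:3),
  Y = c(1.0, 1.2, 1.1, 1.4, 1.3, 2.0, 2.1, 1.9, 0.5, 0.7, 0.6, 0.8)
)

# Order by time within each id, then split the response into one vector per id.
df <- df[order(df$Id, df$Time), ]
series <- split(df$Y, df$Id)

# Sanity checks: every series is numeric and contains no missing values.
all_numeric <- all(sapply(series, is.numeric))
any_missing <- any(sapply(series, anyNA))

# Cluster the list of series directly; no NA padding is needed.
if (all_numeric && !any_missing && requireNamespace("dtwclust", quietly = TRUE)) {
  fit <- dtwclust::tsclust(series, k = 2L)
}
```

If all_numeric is FALSE, the response column was probably read in as character or factor and should be converted before splitting.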
Since this is an issue related to the dtwclust package, the contributors of that package will be better able to help you.
I wanted to clarify that I edited my data frame to remove any NA values, and the code chunks above work within my Rmd file. However, when I knit the file, I get that long segfault error message when it gets to:
dtw_model <- latrend(dtw_method, data = latrendData)
This occurs even with the latrend data: the code chunks work fine when I run them individually, but when I knit the file, the segfault error occurs when the knit gets to that line of code.
Can you confirm that this is not an issue with latrend but with dtwclust?
If I'm understanding correctly, when knitting, you also get the segfault error when directly using dtwclust::tsclust? This would suggest it's not an issue with latrend.
In general, I've noticed that knitr can sometimes produce some really unexplainable behavior, things like forgetting R options, which could lead to errors in latrend calls. But that does not seem to be the problem here.
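To rule out forgotten options while knitting, the options can be set in the same chunk as the model call and checked before fitting. A minimal sketch, using the option names from the latrendData example earlier in this thread:

```r
# Set the latrend options inside the chunk that fits the model,
# rather than relying on an earlier chunk or the interactive session.
options(latrend.id = "Id", latrend.time = "Time")

# Fail early, with a clear message, if the options are not what is expected.
id_opt <- getOption("latrend.id")
time_opt <- getOption("latrend.time")
stopifnot(identical(id_opt, "Id"), identical(time_opt, "Time"))
```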
Since the error you posted is triggered by the Sys.getenv default-option call, setting the RCPP_PARALLEL_NUM_THREADS environment variable may prevent the problematic function from being called. Worth a try, although it's likely a symptom of another underlying issue.
Might also want to look in the dtwclust documentation to see if there are ways to disable parallel computation.
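As a sketch of the environment-variable suggestion: the variable can be set from R at the top of the Rmd (for example in the setup chunk), before latrend or dtwclust are loaded. Whether this avoids the segfault is untested here; the value "1" is an assumption chosen to force single-threaded execution:

```r
# Set the thread count for RcppParallel-based code before loading the packages.
Sys.setenv(RCPP_PARALLEL_NUM_THREADS = "1")

# Confirm the variable is visible to the current R session.
Sys.getenv("RCPP_PARALLEL_NUM_THREADS")

# If RcppParallel is installed, the same limit can be requested through its API.
if (requireNamespace("RcppParallel", quietly = TRUE)) {
  RcppParallel::setThreadOptions(numThreads = 1L)
}
```

When knitting, the variable has to be set in the R process that knitr runs in, so placing this in the first chunk of the document is the safest choice.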
Got it, I will try asking the dtwclust developers to see if they've run into this. Thank you!
Hello,
I am trying to use the dtw method as follows. I have a large dataframe consisting of longitudinal Y values (1 value per day, 31 values per individual, and ~700 individuals).
However, when I try to call the last line, I get an error saying that R has encountered a fatal error and the session must be terminated. When I try to knit my Rmd, I encounter a very long error message:
Could you let me know how to fix this?
Thank you!