Hi SuStaIn team!

I am trying to use SuStaIn with a train/test-like approach, in which I have two datasets:
- the training one, which I want to use to infer the summary subtypes (that is to say: subtype 1 is this sequence of abnormalities, subtype 2 is that sequence, etc.); I do not really care about the actual subtyping of the individual subjects in this dataset. This is where I would run the `run_sustain_algorithm` method, if I'm correct (see the sketch after this list).
- the test one, which I would like to use as follows: given the subtypes discovered on the training set, I want to subtype (and maybe stage, even though I'm less interested in this) these new subjects with respect to the subtypes discovered on the training set. Intuitively, going back to the notation of the Young et al. (2018) paper, if (f_c, S_c)_c are the subtypes (and their prevalences) discovered at the training step, then given some new X (the test data) I would want to evaluate P(X | S_c) for each c and assign each new subject to its best subtype under this mixture, i.e. argmax_c f_c · P(X | S_c).
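For concreteness, here is a minimal sketch of how I set up the training step, following the workshop notebook as best I remember it (the z-score settings, file name, and biomarker labels are just placeholders on my side, and apologies if I misremember any argument names):

```python
import os
import numpy as np
from pySuStaIn.ZscoreSustain import ZscoreSustain

# hypothetical training matrix: rows = subjects, columns = z-scored biomarkers
data_train = np.load("train_zscores.npy")
N_biomarkers = data_train.shape[1]

Z_vals = np.tile(np.array([1, 2, 3]), (N_biomarkers, 1))  # z-score events per biomarker
Z_max = np.full(N_biomarkers, 5)                          # maximum z-score per biomarker

output_folder = "sustain_train_output"
os.makedirs(output_folder, exist_ok=True)

sustain = ZscoreSustain(
    data_train,
    Z_vals,
    Z_max,
    ["biomarker_{}".format(i) for i in range(N_biomarkers)],
    N_startpoints=25,
    N_S_max=3,                   # fit models with 1..3 subtypes
    N_iterations_MCMC=int(1e5),
    output_folder=output_folder,
    dataset_name="train",
    use_parallel_startpoints=True,
)

# return tuple as in the workshop notebook, if I remember correctly;
# samples_sequence / samples_f are the MCMC samples of the S_c and f_c above
(samples_sequence, samples_f,
 ml_subtype, prob_ml_subtype,
 ml_stage, prob_ml_stage,
 prob_subtype_stage) = sustain.run_sustain_algorithm()
```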
So it seems to me that this makes sense from a methodological point of view (but I could be mistaken 😅).
Now I can't seem to find exactly how to perform this last step given the output from the first one. I went back to the workshop notebook (which I had followed some time ago), and it looks to me like the presented `cross_validate_sustain_model` mainly focuses on cross-validation metrics rather than outputting the subtype assignments for the test subjects.
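To make the question concrete, and continuing from the training sketch above, this is roughly the call I was hoping to find. The method name and outputs below are pure guesses on my part, I have not found anything like this in the code or docs:

```python
# purely hypothetical interface -- NOT an actual documented pySuStaIn call;
# the method name and outputs are my guess at what I'm looking for
data_test = np.load("test_zscores.npy")  # same biomarker columns as training
N_samples = 1000                         # MCMC samples to average over

(ml_subtype_test, prob_ml_subtype_test,
 ml_stage_test, prob_ml_stage_test) = sustain.subtype_and_stage_individuals_newData(
    data_test,
    samples_sequence,  # S_c samples from the training run
    samples_f,         # f_c samples from the training run
    N_samples,
)
```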
I am sorry if this is covered somewhere I have missed, and don't hesitate to say if the question is unclear, I'm happy to rephrase or go into more detail 🙂
Cheers, Nemo