Hi SuStaIn team!

I am trying to use SuStaIn with a train/test-like approach, in which I have two datasets:
- the training one, which I want to use to infer the summary subtypes (that is to say: subtype 1 is this sequence of abnormalities, subtype 2 is that sequence, etc.); I do not really care about the actual subtyping of the individual subjects in this dataset. This is where I would run the `run_sustain_algorithm` method, if I'm correct (see the sketch after this list).
- the test one, which I would like to use as follows: given the subtypes discovered on the training set, I want to subtype (and maybe stage, even though I'm less interested in this) these new subjects with respect to the subtypes discovered on the training set. Intuitively, going back to the notation of the Young et al. (2018) paper, if (f_c, S_c)_c are the subtypes (and their prevalences) discovered at the training step, then given some new X (the test data) I would want to evaluate P(X | S_c) for each c and assign each new subject to its best subtype under this mixture, i.e. argmax_c f_c · P(X | S_c).
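For concreteness, here is a minimal sketch of how I set up the training step, following the workshop notebook as best I remember it (the z-score settings, file name, and biomarker labels are just placeholders on my side, and apologies if I misremember any argument names):

```python
import os
import numpy as np
from pySuStaIn.ZscoreSustain import ZscoreSustain

# hypothetical training matrix: rows = subjects, columns = z-scored biomarkers
data_train = np.load("train_zscores.npy")
N_biomarkers = data_train.shape[1]

Z_vals = np.tile(np.array([1, 2, 3]), (N_biomarkers, 1))  # z-score events per biomarker
Z_max = np.full(N_biomarkers, 5)                          # maximum z-score per biomarker

output_folder = "sustain_train_output"
os.makedirs(output_folder, exist_ok=True)

sustain = ZscoreSustain(
    data_train,
    Z_vals,
    Z_max,
    ["biomarker_{}".format(i) for i in range(N_biomarkers)],
    N_startpoints=25,
    N_S_max=3,                   # fit models with 1..3 subtypes
    N_iterations_MCMC=int(1e5),
    output_folder=output_folder,
    dataset_name="train",
    use_parallel_startpoints=True,
)

# return tuple as in the workshop notebook, if I remember correctly;
# samples_sequence / samples_f are the MCMC samples of the S_c and f_c above
(samples_sequence, samples_f,
 ml_subtype, prob_ml_subtype,
 ml_stage, prob_ml_stage,
 prob_subtype_stage) = sustain.run_sustain_algorithm()
```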
So it seems to me that this makes sense from a methodological point of view (but I could be mistaken 😅).
Now I can't seem to find exactly how to perform this last step given the output from the first one. I went back to the workshop notebook (which I had followed some time ago), and it looks to me like the presented `cross_validate_sustain_model` mainly focuses on cross-validation metrics rather than outputting the subtype assignments for the test subjects.
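To make the question concrete, and continuing from the training sketch above, this is roughly the call I was hoping to find. The method name and outputs below are pure guesses on my part, I have not found anything like this in the code or docs:

```python
# purely hypothetical interface -- NOT an actual documented pySuStaIn call;
# the method name and outputs are my guess at what I'm looking for
data_test = np.load("test_zscores.npy")  # same biomarker columns as training
N_samples = 1000                         # MCMC samples to average over

(ml_subtype_test, prob_ml_subtype_test,
 ml_stage_test, prob_ml_stage_test) = sustain.subtype_and_stage_individuals_newData(
    data_test,
    samples_sequence,  # S_c samples from the training run
    samples_f,         # f_c samples from the training run
    N_samples,
)
```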
I am sorry if this is covered somewhere I have missed, and don't hesitate to say if the question is unclear, I'm happy to rephrase or go into more detail 🙂
Cheers, Nemo