ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License

subtype_and_stage_individuals_newData performance issue #52

Open 88vikram opened 1 month ago

88vikram commented 1 month ago

I was using a pre-trained SuStaIn model to predict subtypes in a new dataset with the function `subtype_and_stage_individuals_newData`, and I noticed a couple of issues:

  1. The function does not work for a single patient, i.e. an input of size 1 x M, where M is the number of input features. As a temporary workaround, I replicated the patient's data to create an input of size 2 x M for estimating the subtypes. It would be useful to have it work for single-patient data directly.
  2. The function's computation time increases nonlinearly with the number of patients. My test set consisted of N ~ 47,000 patients. Predicting on a single input of size N x M would have taken almost 29 hours (I had to abort after a few hours). Instead, calling the function N times with an input of size 2 x M took roughly 7 minutes. I measured the prediction time for N = 2 to 3000, and it grows nonlinearly with N.
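
The workaround described in both points can be sketched as a small batching helper. This is only an illustration, not part of pySuStaIn: `predict_in_batches` and `predict_fn` are hypothetical names, where `predict_fn` stands for any wrapper that takes an array of shape (n, M) and returns a tuple of arrays with one row per patient (for example, a call to `subtype_and_stage_individuals_newData` with the trained model's other arguments already fixed). It assumes that each patient's prediction does not depend on the other rows in the same call, which the 2 x M workaround implicitly relies on.

```python
import numpy as np


def predict_in_batches(predict_fn, data, batch_size=2):
    """Apply predict_fn to small row-wise chunks of data and stitch the results.

    predict_fn is a hypothetical callable: (n, M) array -> tuple of arrays,
    each with n entries along axis 0. A chunk with a single row is duplicated
    to 2 x M before the call, and the duplicate's output is discarded.
    """
    data = np.atleast_2d(np.asarray(data))
    outputs = []
    for start in range(0, data.shape[0], batch_size):
        chunk = data[start:start + batch_size]
        padded = chunk.shape[0] == 1
        if padded:
            # Work around the 1 x M failure by replicating the single row.
            chunk = np.vstack([chunk, chunk])
        result = tuple(np.asarray(r) for r in predict_fn(chunk))
        if padded:
            # Keep only the output that belongs to the real patient.
            result = tuple(r[:1] for r in result)
        outputs.append(result)
    # Concatenate each output array across chunks, preserving patient order.
    return tuple(np.concatenate(parts, axis=0) for parts in zip(*outputs))


if __name__ == "__main__":
    # Stand-in for the real model call (assumption, for demonstration only):
    # it rejects single-row input, mimicking the behaviour reported in point 1.
    def fake_predict(x):
        if x.shape[0] < 2:
            raise ValueError("needs at least two rows")
        return x.sum(axis=1), x * 2.0

    demo = np.arange(15, dtype=float).reshape(5, 3)
    sums, doubled = predict_in_batches(fake_predict, demo, batch_size=2)
    print(sums.shape, doubled.shape)
```

The batch size is a tuning knob: larger chunks mean fewer calls, but if the per-call cost grows faster than linearly, as measured above, small chunks may be cheaper overall. Whether chunked results match a single N x M call exactly would need to be checked against the library.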