xullllllll closed this issue 4 months ago.
Yes, a balance needs to be struck between a data-driven model and the data on which it is trained. I have a poster on this at AAIC next week.
The pySuStaIn tutorial notebook discusses this, so I recommend working through it yourself.
Second, the model you train will reflect the data you put in. If you want a disease progression model, I personally recommend omitting the controls' z-score data.
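One common way to act on this is to z-score each biomarker against the control group and then pass only the patients' rows to ZscoreSustain. The sketch below is a minimal illustration, not pySuStaIn code: `patient_zscores`, the boolean `is_control` label and the `decreasing` mask are hypothetical names, and covariate adjustment (which the tutorial notebook also covers) is left out. Biomarkers that fall with disease are sign-flipped so that larger z-scores always mean "more abnormal".

```python
import numpy as np

def patient_zscores(X, is_control, decreasing=None):
    """Hypothetical helper: z-score biomarkers against controls, then drop the controls.

    X          : (n_subjects, n_biomarkers) raw biomarker values
    is_control : (n_subjects,) boolean array, True for healthy controls (assumed label)
    decreasing : optional (n_biomarkers,) boolean mask of biomarkers that fall with disease
    """
    X = np.asarray(X, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)

    # The reference distribution comes from the controls only.
    mu = X[is_control].mean(axis=0)
    sigma = X[is_control].std(axis=0, ddof=1)
    Z = (X - mu) / sigma

    # Flip biomarkers that decrease with disease so abnormality is always positive.
    if decreasing is not None:
        Z[:, np.asarray(decreasing, dtype=bool)] *= -1

    # Controls served only as the reference; return the patients' rows.
    return Z[~is_control]

# Toy example: three controls and one patient, second biomarker decreases with disease.
X = np.array([[1.0, 10.0], [3.0, 12.0], [2.0, 11.0], [6.0, 5.0]])
Z = patient_zscores(X, [True, True, True, False], decreasing=[False, True])  # → [[4., 6.]]
```

Whether to keep the controls in the SuStaIn input is the modelling choice discussed above; this sketch simply implements the "omit them" option.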
Thank you very much for your answer. However, the pySuStaIn tutorial notebook you mentioned doesn't explain in detail how to set Z_vals according to the distribution of the biomarkers. Is Z_vals = 1, 2, 3 suitable for all types of data? Also, can you explain the difference between input data that includes the control group and input data that does not?
Z_vals: there is no prescribed way, but I would recommend checking the coverage of your data (e.g., if a biomarker never reaches a certain z-score, don't include that z-score in Z_vals).

Oh I see. I still have a few questions about how to choose the value of Z_max. Should I follow the pySuStaIn tutorial notebook, which suggests 'choosing a value around the 95th percentile of your data', or ZscoreSustain.py, which says 'when using z-score thresholds of 1, 2, 3, Z_max would typically be 5'? If these two conflict, which should I choose? And must Z_max be greater than the values in Z_vals?
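The coverage check suggested above, together with the tutorial's 95th-percentile rule for Z_max, can be sketched in a few lines of NumPy. This is a hypothetical helper, not part of pySuStaIn: it assumes `Z` is the patients-only z-score matrix (subjects × biomarkers, abnormal = positive), and the `min_count` cutoff for "enough subjects reach this threshold" is an arbitrary assumption. Z_vals is built as a 2-D array with one row per biomarker and Z_max as a 1-D array with one entry per biomarker, matching the shapes used in the tutorial's example as far as I know. Shorter rows are padded with 0, which ZscoreSustain.py appears to treat as an unused threshold; please verify this against your installed version. Because the z-score model's trajectory rises from the last threshold up to Z_max, the helper also flags any biomarker whose suggested Z_max does not exceed its highest kept threshold.

```python
import numpy as np

def suggest_z_settings(Z, candidates=(1, 2, 3), min_count=5, pct=95):
    """Hypothetical helper: per-biomarker Z_vals from data coverage, plus a Z_max guess.

    Z          : (n_subjects, n_biomarkers) patient z-scores
    candidates : z-score thresholds to consider for every biomarker
    min_count  : assumed minimum number of subjects that must reach a threshold
    pct        : percentile for the Z_max suggestion (tutorial suggests ~95th)
    """
    Z = np.asarray(Z, dtype=float)
    n_biomarkers = Z.shape[1]

    # Keep a threshold only if enough subjects actually reach it.
    kept = [[z for z in candidates if np.sum(Z[:, b] >= z) >= min_count]
            for b in range(n_biomarkers)]

    # Pad to a rectangular array; 0 marks an unused threshold (check ZscoreSustain.py).
    width = max(1, max(len(k) for k in kept))
    Z_vals = np.zeros((n_biomarkers, width))
    for b, k in enumerate(kept):
        if k:
            Z_vals[b, :len(k)] = k

    # Z_max suggestion from the upper tail of each biomarker's distribution.
    Z_max = np.percentile(Z, pct, axis=0)

    # Flag biomarkers with no usable threshold, or whose Z_max is not above the last one.
    needs_review = [b for b, k in enumerate(kept) if not k or Z_max[b] <= max(k)]
    return Z_vals, Z_max, needs_review
```

Biomarkers listed in `needs_review` are worth inspecting by hand, for example by dropping a threshold or raising Z_max (a default such as 5 for thresholds 1, 2, 3, as ZscoreSustain.py mentions).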
Dear SuStaIn friends, I have a few questions I'd like to ask. First, in ZscoreSustain, do the Z_vals and Z_max values depend on the input data? If so, how should Z_vals and Z_max be set according to the input data? Second, should the input data contain only the patients' z-scores, or must it contain both the patients' z-scores and the healthy control group's z-scores? Is there any difference between these two ways of providing the input, and which works better? I look forward to your answer, thank you.