ravel-lab / VALENCIA

VAginaL community state typE Nearest CentroId clAssifier
MIT License

Interpretation of the VALENCIA output #4

Open ilkaybuysal opened 2 years ago

ilkaybuysal commented 2 years ago

Dear developers,

VALENCIA outputs a plot with "similarity to assigned sub-CST" on one axis and "subCST" on the other. My understanding is that the x-axis shows the assigned subCSTs, whereas the y-axis shows the similarity scores for the CSTs in the training dataset.

In my output plot, there is a red dot and there are also yellow lines with a point in the middle, which I assume is the median similarity score for that subCST. What’s the difference between the red dot and the yellow lines?

Moreover, I expected one black line for each sample (row) in my input, but I have 5 samples and only 3 black lines across all the assigned subCSTs. Could you please elaborate on how to interpret the plot output of VALENCIA?

Thanks in advance and thanks for developing fascinating tools for researchers in the vaginal microbiome field.

image

michaelfrance commented 2 years ago

Hi,

The points and lines represent the mean and standard deviation of the similarity scores for the assignments in the 13k-sample reference dataset. The difference between the red and the yellow comes from the different mean and standard deviation of similarity scores for samples assigned to CST I-A versus III-A and III-B. The black lines are the result of the program trying to draw a boxplot from a single point. It looks like 3 of your 5 samples were assigned to I-A, so that boxplot is a little more complete. Here is what the plot looks like if you have more samples:

image
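To make the single-point boxplot behaviour concrete, here is a minimal sketch in plain Python. It is not VALENCIA's plotting code: the function names `box_summary` and `reference_bar` and all of the similarity scores are made up for illustration. It only shows why a subCST with one assigned sample collapses to a flat line, while one with three samples starts to look like a box, and how a mean and standard deviation bar could be derived from reference scores.

```python
from statistics import mean, median, stdev


def box_summary(scores):
    """Return (min, q1, median, q3, max) for a list of similarity scores.

    Quartiles are taken as medians of the lower and upper halves,
    which is one common convention among several.
    """
    s = sorted(scores)
    n = len(s)
    if n == 1:
        # A single sample: every statistic is the same value,
        # so a boxplot degenerates to one horizontal line.
        return (s[0],) * 5
    half = n // 2
    lower = s[:half]
    upper = s[half + (n % 2):]
    return (s[0], median(lower), median(s), median(upper), s[-1])


def reference_bar(scores):
    """Return (mean - sd, mean, mean + sd) for reference similarity scores."""
    m = mean(scores)
    sd = stdev(scores)
    return (m - sd, m, m + sd)


# Hypothetical similarity scores for 5 input samples, grouped by assigned subCST.
assigned = {
    "I-A": [0.91, 0.95, 0.98],
    "III-A": [0.72],
    "III-B": [0.64],
}

# Hypothetical reference scores standing in for the 13k-sample dataset.
reference = {
    "I-A": [0.90, 0.94, 0.97, 0.99],
    "III-A": [0.60, 0.75, 0.85, 0.92],
    "III-B": [0.55, 0.65, 0.70, 0.80],
}

for subcst, scores in assigned.items():
    print(subcst, "box:", box_summary(scores), "reference bar:", reference_bar(reference[subcst]))
```

With these made-up numbers, the III-A and III-B summaries have five identical values (a flat line), whereas the I-A summary spans a range, which matches the "a little more complete" boxplot described above.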

Hope this helps,

Michael