Input for Valencia and Diagnostic plots

Hello, I have a couple of questions. I ran Valencia on a non-transformed table and again on a CLR-transformed table (using the GreenGenes and GreenGenes 2 taxonomy databases). Here are the output diagnostic plots for all runs: [figures]

It seems to me that CST assignment was somewhat better with the untransformed data. However, in the output .csv files, the samples with a similarity score of 0 were all assigned a subCST of 1A. The CLR-transformed output had no scores of 0, but the CST assignment does not seem as good as with the untransformed data, since the similarity scores are very low (please correct me if I am interpreting these plots wrong).
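To make the difference between the two inputs concrete, here is a minimal sketch of relative abundance versus a CLR transform for a single sample. The counts are made up, and the pseudocount of 1 is an assumption for illustration, not something taken from the runs above.

```python
import math

def relative_abundance(counts):
    """Scale raw counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts, pseudocount=1.0):
    """Centred log-ratio: log of each (count + pseudocount) minus the mean log.

    The pseudocount (an assumption here) avoids log(0) for absent taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical read counts for four taxa in one sample
sample = [900, 80, 20, 0]

rel = relative_abundance(sample)
clr_values = clr(sample)

print("relative abundance:", [round(v, 3) for v in rel])
print("CLR:", [round(v, 3) for v in clr_values])
```

Relative abundances are non-negative and sum to 1, while CLR values are centred on 0 and include negative numbers, so the two tables are on very different scales.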

My questions are:

1. Would you suggest that Valencia be used on transformed or on untransformed/normalised ASV tables?
2. I matched the taxa names as closely as I could to the Valencia format, but some taxa are not present. For example, with the GG2 taxonomy there is no Gardnerella_vaginalis, and many names in my table are simply not represented in the Valencia CST centroid file. Do you have a suggestion for a naming scheme that would bring the GreenGenes and GreenGenes 2 taxonomy closer to the Valencia taxonomy? I have also included examples of the way I renamed taxa for Valencia, and I am wondering whether that is good enough.
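A minimal sketch of how the naming mismatch could be quantified: compare the taxa columns of the input table with those of the centroid file and list the names with no match. The headers below are hypothetical, and the non-taxa column names (`sub_CST`, `sampleID`, `read_count`) are assumptions about the file layouts, so adjust them to the real files.

```python
import csv
import io

def taxa_columns(csv_text, id_columns):
    """Return header names from CSV text, skipping non-taxa columns."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in id_columns]

def unmatched_taxa(table_taxa, centroid_taxa):
    """Taxa in the input table that have no exact name match in the centroids."""
    known = set(centroid_taxa)
    return [name for name in table_taxa if name not in known]

# Hypothetical headers for illustration only; real files would be read from disk.
centroid_csv = "sub_CST,Lactobacillus_crispatus,Lactobacillus_iners,Gardnerella_vaginalis\n"
table_csv = "sampleID,read_count,Lactobacillus_crispatus,Lactobacillus_iners,Gardnerella_swidsinskii\n"

centroid_taxa = taxa_columns(centroid_csv, {"sub_CST"})
table_taxa = taxa_columns(table_csv, {"sampleID", "read_count"})

missing = unmatched_taxa(table_taxa, centroid_taxa)
print("taxa without a centroid match:", missing)
```

Running something like this before and after renaming would show how many taxa (and, with a small extension, what share of reads) still fall outside the centroid names.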

Thank you!
