Closed: cwarden45 closed this issue 2 years ago
Hi Charles, Thanks for your questions! First, I would like to emphasize that we generated the synthetic data from real data, and the only modification we made to the real data was to permute the non-DEGs. Each synthetic dataset's selected true DEGs (true positives) are randomly sampled from the true DEG set, which consists of the genes identified as DEGs by all six methods at a very small FDR threshold (0.0001%). For the results in Fig. S19, we varied the percentage of selected true DEGs: 10%, 30%, and 90% of all true DEGs (1%, 3%, and 9% of all genes).
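To make the scheme concrete, here is a rough Python sketch of the idea (an illustration only, not our actual code; `counts` is assumed to be a genes-by-samples matrix and `true_deg_idx` the indices of the consensus true DEGs):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_semi_synthetic(counts, true_deg_idx, frac_true=0.1):
    # Rough sketch only: keep a random subset of the consensus true DEGs
    # untouched (their real expression values and fold changes) and permute
    # every other gene's values across samples so it carries no group signal.
    n_genes, _ = counts.shape
    n_pick = int(round(frac_true * len(true_deg_idx)))
    kept = rng.choice(true_deg_idx, size=n_pick, replace=False)
    synthetic = counts.copy()
    for g in np.setdiff1d(np.arange(n_genes), kept):
        synthetic[g] = rng.permutation(counts[g])
    return synthetic, kept
```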
For your first question, we did not modify the real data for the selected true DEGs, so the expression values and fold changes for those genes are the same as in the real data. This means that, within each synthetic dataset, the fold changes of different genes vary considerably, and the amount of gene expression change and the fraction of affected samples are the same as in the real data.
For your second question, we did not examine the heterogeneity of each gene because we generated the synthetic datasets from real data and therefore did not set parameters such as fold change or fraction of affected samples. It would be interesting to compare the methods across different fold changes and fractions of true positives, but we care more about the overall performance on real datasets under our current data generation setting.
Best, Xinzhou
Hi Xinzhou,
Thank you very much for your prompt response!
I would be interested to see feedback from others, but I also want to acknowledge this response. So, I will close the "issue" with this comment, though I hope there will be other comments.
I think it is now clearer to me why it may not be as easy to extend the idea of Supplemental Figure S19 to the other conditions (fold-change, heterogeneity, etc.). Thank you again.
I think that I will eventually have additional input, but I wasn't planning to run analyses with the semi-synthetic datasets. So, that may or may not be something that I post as a separate topic. For example, if the only additional results that I have relate to the topic of "fold-change" for cell line versus patient data, then I think that is fair as a separate topic.
Strictly speaking, I think you could do something like only consider DEGs with a fold-change below 1.5-fold or 1.3-fold, and maybe that could partially help with seeing what happens with more subtle differences. I apologize that I don't think I noticed it before, but it looks like the fold-change distribution in Supplemental Figure S2 already includes some relatively large fold-change values for a relatively small number of genes. However, perhaps the number of questions or comments from others can help you decide whether this would benefit anybody besides me.
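Purely to illustrate what I mean, a filter along these lines (my own sketch; `log2fc` would be the per-gene log2 fold-change between the two conditions and `deg_idx` the consensus true DEGs) could be:

```python
import numpy as np

def subtle_degs(log2fc, deg_idx, max_fold=1.5):
    # Purely illustrative filter: keep only the consensus DEGs whose absolute
    # fold-change is below max_fold (e.g. 1.5-fold or 1.3-fold), to focus the
    # power calculation on more subtle differences.
    cutoff = np.log2(max_fold)
    return [g for g in deg_idx if abs(log2fc[g]) < cutoff]
```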
As I understand it, the strategy of defining increased heterogeneity and/or differential expression within only a subset of samples in one of the groups might be more difficult. In your paper, you define many more genes with differences between tissues (or normal-versus-cancer) and relatively few differences in the immunotherapy dataset. I believe you can see that from either i) the differences in gene counts in Supplemental Figure S1 or ii) a comparison of the red dots versus false positives in Figure 1 against Supplemental Figures S6-S17 (especially with what I believe is a change from a linear to a logarithmic scale for the y-axis of part A of the Supplemental Figures?). So, if there is extra heterogeneity among biologically relevant genes in GSE91061 (in that the differences between the groups may be less clear and/or less consistent), then potentially relevant genes might be missed when true positives are defined as "genes identified as DEGs by all six methods at a very small FDR threshold (0.0001%)". If important genes are not identified as true positives, that could in turn reduce the measured power of a conservative method (and perhaps change the relative ranking of methods as well). Nevertheless, if I understand everything correctly, the way that true positives are defined might have to be changed at an earlier step, and that might take relatively more effort.
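Just so it is clear how I am reading the quoted definition, it is essentially an intersection of the six methods' calls at FDR ≤ 1e-6 (0.0001%); a toy version of that rule (my own sketch, with `fdr_by_method` as a hypothetical mapping from method name to per-gene adjusted p-values) would be:

```python
def consensus_degs(fdr_by_method, threshold=1e-6):
    # Toy version of the quoted rule (0.0001% = 1e-6): a gene counts as a
    # "true" DEG only if every one of the six methods calls it at the FDR
    # threshold. fdr_by_method: {method name: {gene: adjusted p-value}}.
    calls = [
        {gene for gene, fdr in fdrs.items() if fdr <= threshold}
        for fdrs in fdr_by_method.values()
    ]
    return set.intersection(*calls)
```

Genes with heterogeneous or subtle differences are exactly the ones most likely to drop out of such an intersection.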
Also, if I am understanding everything correctly, the power for the pre- versus post-treatment comparison might increase if you include a second variable to capture the patient differences (when the method allows it). To be fair, I think that is separate from the question of how to quantify the heterogeneity of relevant genes (even with the method and all other settings kept constant). On one hand, if you could use a similar strategy to define the true positives with two variables but still calculate power with one variable (without capturing the pairing variable), I am not sure how that might affect the current one-variable results. On the other hand, I am also not sure how rigorously that addresses the question of what could happen if the set of biological true positives were in fact larger than the true-positive set described with the methodology in the paper.
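To illustrate what I mean by the second variable, here is a generic paired-versus-unpaired comparison for a single gene (my own sketch using scipy, not your pipeline; the paired Wilcoxon signed-rank test is just one simple way to capture the patient pairing):

```python
from scipy import stats

def one_vs_two_variable(pre, post):
    # Illustration only: pre/post are one gene's values in the same patients
    # before and after treatment. The unpaired Wilcoxon rank-sum
    # (Mann-Whitney) test ignores the pairing; the paired Wilcoxon
    # signed-rank test uses it, which typically gains power when
    # between-patient variation is large.
    unpaired = stats.mannwhitneyu(pre, post, alternative="two-sided")
    paired = stats.wilcoxon(pre, post, alternative="two-sided")
    return unpaired.pvalue, paired.pvalue
```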
I hope this helps explain my point a little better. I think there should be some situations where the non-parametric Wilcoxon rank-sum test performs relatively better or worse (among comparisons with relatively large sample sizes). If possible, it might be nice to see some sort of guide or guidelines; however, for me, that is not extremely urgent.
Thank you again!
Sincerely, Charles
Hi,
I hope that everything is going well with all of the authors for this publication.
I think this is more of a "discussion" than an "issue," but I don't think there is a separate discussion forum and I think public discussion can be important.
This paper was selected by a lab member for a journal club presentation a couple months ago. I thought it was an interesting paper, and I have noticed some additional discussions related to this paper more recently.
However, I did have some questions/comments. In terms of what I believe is best to raise at this time, I think there are two points:
1) Parameter Settings to Capture Ranges in Strength of Differentially Expressed Genes
I think the paper partially addresses a question that I had, in that Supplemental Figure S19 shows differences in performance with different percentages of differentially expressed genes.
I think 10% is a bit on the high side, but I think more than just the number of genes is important. For example, can you please give me a sense of the settings and/or variation in terms of the following?
1a) Amount of Gene Expression Change: for example, when a gene is differentially expressed, does the average expression vary by 1.2-fold, 2-fold, 10-fold, etc.?
1b) Fraction of Affected Samples: In cell lines, the gene expression change for genes of interest may be relatively homogeneous. However, with larger sample sizes, I might expect more heterogeneity. So, in terms of the fraction of samples with changes in gene expression, are changes occurring in something closer to 20% of treatment/case samples or closer to 100% of treatment/case samples?
2) Definition of True Positives
In the paper, it says: "We first used all six DEG identification methods to identify DEGs from each original dataset containing two conditions. We then defined true DEGs as the genes identified as DEGs by all six methods at a very small FDR threshold (0.0001%)."
Am I correctly understanding that true positives show relatively conservative expression changes?
If so, I think the paper matches my expectation that a non-parametric test like the Wilcoxon may do a good job of identifying more robust findings in larger sample sizes.
However, the part that I might not expect is for the power to be as high for heterogeneous data with subtle gene expression changes (such as 20% of samples within one group varying by 1.2-fold). Do you think that is true, or might I be misunderstanding something?
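Purely to illustrate the kind of scenario I have in mind, a toy simulation (entirely my own, with an arbitrary log-normal expression model rather than anything from the paper) might look like:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rough_power(n_per_group=100, affected_frac=0.2, fold=1.2,
                n_sims=2000, alpha=0.05):
    # Back-of-the-envelope check: only a fraction of case samples shift by a
    # small fold-change, and we count how often a Wilcoxon rank-sum
    # (Mann-Whitney) test rejects at level alpha. The log-normal expression
    # model and its parameters are arbitrary choices for illustration.
    hits = 0
    for _ in range(n_sims):
        control = rng.lognormal(mean=2.0, sigma=0.5, size=n_per_group)
        case = rng.lognormal(mean=2.0, sigma=0.5, size=n_per_group)
        case[: int(affected_frac * n_per_group)] *= fold
        p = stats.mannwhitneyu(case, control, alternative="two-sided").pvalue
        hits += p < alpha
    return hits / n_sims
```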
Thank you very much!
Sincerely, Charles