Interpretation of the results

yeroslaviz commented 1 year ago

First, let me say that I really like the tool, and the plots it creates are very useful for the analysis.

A colleague of mine and I have looked at the results and tried to understand what each of them tells us.

For now we have three questions that we hope you can explain, as we couldn't find the answers in either the scripts or the results.

The plots with the x-axis labeled "sgRNA passing filter per gene": which guides are those, and what filter is being used? I couldn't find any filtering step in the process.
What value is used in the volcano plots? In the gene table (*_genetable.txt) from the workflow, I see for each of the three replicates the Mann-Whitney p-value and the average of the three strongest guides, followed by some average over all three replicates. I don't understand this last step, as it doesn't fit the values. Below is one of the lines as an example. The three per-replicate averages match the strongest absolute values for each replicate (0.020.., 0.0671.., -0.25...), but the average of averages (-0.065...) does not. Can you explain how this value is calculated? We assume these are also the values plotted in the volcano plot. Is that correct?
In your tutorial you show a volcano plot with gene names, but I can't figure out how to create one with my data. Can you please explain what we need to do to get the gene names into the plot? Is it only possible in interactive mode?

I appreciate your help and time

Thanks in advance

Assa

The phenotype results output for guides of one gene

        gamma           gamma           gamma
        Rep1            Rep2            Rep3
sgID
KO-1    -0.121996393     0.249498771    -0.295025115
KO-2     0.308514859     0.043902946    -0.216134713
KO-3    -0.124483175     0.243596097    -0.084025057
KO-4    -0.02571983     -0.291532662    -0.243740376

The gene table results for this gene:

gene: KO

column                       Mann-Whitney p-value    average phenotype of strongest 3
gamma, Rep1                  0.490386221              0.02067843
gamma, Rep2                  0.263660533              0.067187402
gamma, Rep3                  0.034289636             -0.251633401
gamma, ave_Rep1_Rep2_Rep3    0.460964364             -0.065803612

mhorlbeck commented 1 year ago

Hi Assa,

I'm glad the tool has been helpful!

The filter comes in when converting from counts to phenotypes. The default is set in the experiment config file:

    #############################################################
    # Filter Settings
    #############################################################
    [filter_settings]

    # Do you require greater than or equal to the minimum reads
    # for both experiments in a comparison or either experiment?
    # Default is either, other option is both
    filter_type = either
    minimum_reads = 50

You can adjust this however you want, or even set minimum_reads to 0 to not filter. Others have worked more robust shot-noise algorithms into this pipeline (such as the MAGeCK mean-variance modeling), but this simple filter has worked well enough for many cases.
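To make the filter_type and minimum_reads settings concrete, here is a minimal sketch of a read-count filter of this kind. It only follows the wording of the config comments; the function names and the example counts are hypothetical and are not taken from the ScreenProcessing code.

```python
def passes_read_filter(count_a, count_b, minimum_reads=50, filter_type="either"):
    """Return True if an sgRNA's read counts satisfy the filter.

    count_a and count_b are the read counts of one sgRNA in the two
    experiments being compared (hypothetical argument names).
    """
    if filter_type == "either":
        return count_a >= minimum_reads or count_b >= minimum_reads
    if filter_type == "both":
        return count_a >= minimum_reads and count_b >= minimum_reads
    raise ValueError("filter_type must be 'either' or 'both'")


def count_passing_per_gene(counts, minimum_reads=50, filter_type="either"):
    """Count, for each gene, how many of its sgRNAs pass the filter."""
    passing = {}
    for (gene, _sgrna), (count_a, count_b) in counts.items():
        passing.setdefault(gene, 0)
        if passes_read_filter(count_a, count_b, minimum_reads, filter_type):
            passing[gene] += 1
    return passing


# Made-up read counts, for illustration only: (gene, sgRNA) -> (counts A, counts B)
example_counts = {
    ("geneA", "g1"): (120, 80),
    ("geneA", "g2"): (30, 75),
    ("geneA", "g3"): (10, 20),
    ("geneB", "g1"): (55, 40),
    ("geneB", "g2"): (5, 49),
}

passing_either = count_passing_per_gene(example_counts, filter_type="either")
passing_both = count_passing_per_gene(example_counts, filter_type="both")
passing_unfiltered = count_passing_per_gene(example_counts, minimum_reads=0)
print(passing_either, passing_both, passing_unfiltered)
```

Under this reading, the "sgRNA passing filter per gene" axis would be the per-gene count of guides that survive this step, and setting minimum_reads to 0 lets every guide through.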
Averaging happens row-wise at the sgRNA level first (i.e. all three replicates of sgRNA KO-1 are averaged together), and then the average of the top 3 is taken column-wise (so it is based on the phenotype-table column gamma, ave_Rep1_Rep2_Rep3), not on the three per-replicate gene values. Does that make sense?
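The order of operations can be checked against the numbers posted in this thread. The sketch below is not the pipeline's own code. It assumes that "strongest" means largest absolute phenotype, which is consistent with the per-replicate values in the gene table, and the helper names are made up for illustration.

```python
# Per-sgRNA gamma phenotypes (Rep1, Rep2, Rep3) copied from the table in this thread.
guides = {
    "KO-1": [-0.121996393, 0.249498771, -0.295025115],
    "KO-2": [0.308514859, 0.043902946, -0.216134713],
    "KO-3": [-0.124483175, 0.243596097, -0.084025057],
    "KO-4": [-0.02571983, -0.291532662, -0.243740376],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_of_strongest(values, n=3):
    """Average the n values with the largest absolute phenotype (assumed meaning)."""
    strongest = sorted(values, key=abs, reverse=True)[:n]
    return mean(strongest)


# Step 1: row-wise, average the replicates of each sgRNA.
replicate_average = {sgrna: mean(reps) for sgrna, reps in guides.items()}

# Step 2: column-wise, average the 3 strongest sgRNAs of that averaged column.
gene_score = average_of_strongest(replicate_average.values())

# For comparison: the same top-3 average applied to the Rep1 column alone.
rep1_score = average_of_strongest(reps[0] for reps in guides.values())

print(gene_score, rep1_score)
```

With these inputs, gene_score comes out at about -0.0658 and rep1_score at about 0.0207, matching the ave_Rep1_Rep2_Rep3 and Rep1 entries in the gene table. Averaging the three per-replicate values (0.0207, 0.0672, -0.2516) instead gives a different number, which is why the "average of averages" did not seem to fit.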
Based on some feedback/PRs, I recently changed the default to not show gene hits on the plot. You can get them in interactive mode, or change the function signature in screen_analysis.py to:

    def volcanoPlot(data, phenotype=None, replicate=None, transcripts=False, showPseudo=True, effectSizeLabel=None, pvalueLabel=None, hitThreshold=7, labelHits=True, ...

Hope this helps, Max

yeroslaviz commented 2 months ago

Thank you for the reply.

mhorlbeck / ScreenProcessing

Interpretation of the results #25
