kundajelab / DMSO


TODO for paper #2: DPGP clustering analysis #11

Open annashcherbina opened 6 years ago

annashcherbina commented 6 years ago

Same as we did for hets. Need to run the clustering analysis for DMSO as well.

annashcherbina commented 6 years ago

DPGP clustering analysis for peaks:

Order is: Control earlyG1, Control lateG1, Control SG2M, DMSO earlyG1, DMSO lateG1, DMSO SG2M

dpgp_diff_peaks_posterior_similarity_matrix_heatmap

dpgp_diff_peaks_cluster_sizes

dpgp_diff_peaks_gene_expression_fig_1

dpgp_diff_peaks_gene_expression_fig_2

DPGP clustering analysis for genes:

dpgp_diff_genes_posterior_similarity_matrix_heatmap

dpgp_diff_genes_cluster_sizes

Order is: Control earlyG1, Control lateG1, Control SG2M, DMSO earlyG1, DMSO lateG1, DMSO SG2M

dpgp_diff_genes_gene_expression_fig_1

dpgp_diff_genes_gene_expression_fig_2

dpgp_diff_genes_gene_expression_fig_3

dpgp_diff_genes_gene_expression_fig_4

dpgp_diff_genes_gene_expression_fig_5

annashcherbina commented 6 years ago

More organized version of clustering figures: https://docs.google.com/presentation/d/11tsBKPbJ3LMEk1kYkqe2A_xl6FJFoUmooEcWJDWRf5s/edit#slide=id.g2a8f18cee3_0_560

akundaje commented 6 years ago

Anna,

It doesn't make sense to use DPGP clustering across different disconnected time series. You can use it separately for control and separately for treatment, but not as a concatenation. DPGP has an explicit notion of time. The control and treatment are not a continuous time series.

-Anshul.
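To make the point above concrete, here is a minimal sketch of splitting the concatenated six-column profile into one three-point time series per condition, so that each could be clustered on its own. The column order is the one posted earlier in this thread; the function name, the dictionary layout, and the example values are hypothetical and only for illustration.

```python
# Column order as posted in this thread (control first, then DMSO).
COLUMNS = [
    ("Control", "earlyG1"), ("Control", "lateG1"), ("Control", "SG2M"),
    ("DMSO", "earlyG1"), ("DMSO", "lateG1"), ("DMSO", "SG2M"),
]


def split_by_condition(rows, columns=COLUMNS):
    """Split each concatenated profile into one time series per condition.

    rows: dict mapping a feature ID to a list of values in `columns` order.
    Returns {condition: {feature_id: [values in time order]}}.
    """
    series = {}
    for feature_id, values in rows.items():
        if len(values) != len(columns):
            raise ValueError(f"{feature_id}: expected {len(columns)} values")
        for (condition, _stage), value in zip(columns, values):
            series.setdefault(condition, {}).setdefault(feature_id, []).append(value)
    return series


# Hypothetical signal values, for illustration only.
example = {
    "peak_1": [5.0, 6.0, 7.0, 2.0, 2.5, 3.0],
    "peak_2": [1.0, 1.0, 4.0, 1.5, 3.0, 6.0],
}
per_condition = split_by_condition(example)
print(per_condition["Control"]["peak_1"])
print(per_condition["DMSO"]["peak_1"])
```

Each of the two resulting tables is a single contiguous time series (earlyG1, lateG1, SG2M), which is the layout a time-aware method expects.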


annashcherbina commented 6 years ago

Should we just stick with k-means clustering for DMSO analysis then? Or should DPGP be done on the fold change of DMSO vs Control over time? Really we mostly care about the DMSO vs Control effect, not the time effect within the DMSO samples or controls separately.

akundaje commented 6 years ago

Kmeans is fine


annashcherbina commented 6 years ago

I think Sundari wants the DPGP -- I already sent her those figures, so is it valid if I re-run it on the fold change? I already have kmeans results as backup if needed.

akundaje commented 6 years ago

DPGP cannot be used for disparate time series because it assumes contiguous temporal order across the samples as you provide them. So you should switch to k-means.
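A small sketch of why k-means does not have the same problem: it treats each column as an unordered feature, so reordering the columns of the concatenated matrix leaves the cluster assignments unchanged. The code below is a plain Lloyd's-algorithm loop written with the standard library only, with fixed initial centroids for reproducibility; the profiles are hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` would be the more likely choice.

```python
def kmeans(points, centroids, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((x - m) ** 2 for x, m in zip(p, centroids[k])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical profiles: three Control columns followed by three DMSO columns.
profiles = [
    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],  # higher in DMSO
    [1.2, 0.9, 1.1, 4.8, 5.1, 5.2],
    [5.0, 5.0, 5.0, 1.0, 1.0, 1.0],  # lower in DMSO
    [4.9, 5.2, 5.1, 0.8, 1.1, 1.0],
]
labels, _ = kmeans(profiles, [profiles[0], profiles[2]])

# Interleave the Control and DMSO columns and cluster again.
order = [3, 0, 4, 1, 5, 2]
permuted = [[p[i] for i in order] for p in profiles]
labels_permuted, _ = kmeans(permuted, [permuted[0], permuted[2]])

print(labels)
print(labels_permuted)
```

Both runs group the first two and the last two profiles together, because Euclidean distance does not depend on column order.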


annashcherbina commented 6 years ago

Yes I know, that's why I suggested the fold change of DMSO vs control. So the three timepoints would be:

DMSO / Control in earlyG1
DMSO / Control in lateG1
DMSO / Control in SG2M

That is a single time series, so why would that be wrong?

We basically get the same results with the DPGP & the kmeans though: https://github.com/kundajelab/DMSO/issues/7 so I can go with those, but I think the DPGP visualizations are better.
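For reference, a minimal sketch of the fold-change series described in this comment: one DMSO / Control value per cell-cycle stage, giving a single three-point time series per peak or gene. Taking log2 and adding a pseudocount are assumptions made here to keep the ratio symmetric and defined at zero; the thread only specifies DMSO / Control. The function name and the example values are hypothetical.

```python
import math

STAGES = ["earlyG1", "lateG1", "SG2M"]


def log2_fold_change_series(control, dmso, pseudocount=1.0):
    """Return log2((DMSO + pc) / (Control + pc)) at each cell-cycle stage.

    control, dmso: values in STAGES order for one peak or gene.
    pseudocount: assumed offset that avoids division by zero.
    """
    if len(control) != len(STAGES) or len(dmso) != len(STAGES):
        raise ValueError(f"expected {len(STAGES)} values per condition")
    return [
        math.log2((d + pseudocount) / (c + pseudocount))
        for c, d in zip(control, dmso)
    ]


# Hypothetical values for one feature, for illustration only.
series = log2_fold_change_series(control=[3.0, 7.0, 15.0], dmso=[7.0, 7.0, 3.0])
for stage, value in zip(STAGES, series):
    print(stage, value)
```

A table with one such row per feature and one column per stage is a single contiguous time series, unlike the six-column Control + DMSO concatenation.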

akundaje commented 6 years ago

Ah I see. I didn't realize that's what you meant. I thought you meant fold change relative to expected background for ATAC. Sure. That would be fine.
