saeyslab / CytoNorm

R library to normalize cytometry data

Could CytoNorm impede discovery of novel clusters? #10

Open vivek-verma202 opened 4 years ago

vivek-verma202 commented 4 years ago

Hi! If the training set is from healthy controls and the hypothesis is to discover novel clusters (using Diffcyt) that occur only in cases but not in controls, could CytoNorm pre-processing wash out the signal?

Thanks, Vivek

tomashhurst commented 4 years ago

@vivek-verma202 it doesn't have to, but it's a bit complicated. In our implementation in Spectre we generate clusters using only stable markers to get the major population groups (e.g. Ly6G, CD19, CD4, CD3 etc. in mouse). Essentially, when thinking about cluster-specific batch effects, these probably occur at the level of fundamental biological groups. E.g. most T cells would likely have similar batch effects, whereas the batch effects on T cells might be very different from those on eosinophils. So we only try to cluster eosinophils, neutrophils, monocytes, T cells, NK cells, and B cells for alignment. If the alignment is done using only these large population groups, then when the actual analysis is done afterwards, you can cluster on all markers and still find novel clusters.

Now, the specific issue you raised is a good point: if the healthy controls are used as the reference samples, then the alignment of markers might get messed up, as some disease samples might have high levels of some markers that aren't present in the healthy controls. One of the requirements of CytoNorm specified in the document is that the reference control needs to span the full range of the data. The implication is that, in an ideal world, you could use one of the 'disease' samples as the reference so that all the activation markers etc. will be present. However, in practice this is difficult to do. The way we have been getting around this is, as above, to use just stable markers to create clusters for alignment, and to keep both the raw and aligned data in our dataset. Then in our analysis proper, we cluster on all the markers where we know the distribution is fairly similar between healthy and diseased (CD11b etc.), and then look within each cluster for novel patterns/bifurcations in the raw data for activation/novel markers (CD80/CD86 etc.). There are essentially two ways of using clustering: one is to cluster on everything and find new clusters 'appearing' in experimental groups; the other is to cluster on stable markers and then ask how each of those stable clusters has changed between experimental groups. The approach described above is the latter.
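To make the "reference must span the full range" requirement concrete, here is a minimal sketch in plain Python. It is not CytoNorm code: `quantiles`, `fit_mapping` and `apply_mapping` are hypothetical helpers, the piecewise-linear quantile mapping is a simplified stand-in for the package's normalisation, and returning `None` outside the learned range is an assumption made purely to flag the problem. All values are made up and assumed to be on a transformed scale.

```python
def quantiles(values, n_q):
    """Return n_q evenly spaced quantiles, interpolating between sorted values."""
    s = sorted(values)
    out = []
    for i in range(n_q):
        pos = i * (len(s) - 1) / (n_q - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return out


def fit_mapping(batch_control, target, n_q):
    """Pair the quantiles of one batch's control sample with the target quantiles."""
    return quantiles(batch_control, n_q), quantiles(target, n_q)


def apply_mapping(x, mapping):
    """Map x through the piecewise-linear quantile mapping.

    Returns None when x lies outside the range seen in the control sample
    (an assumption of this sketch, used to signal "no data-driven correction").
    """
    src, dst = mapping
    if x < src[0] or x > src[-1]:
        return None
    for i in range(len(src) - 1):
        if src[i] <= x <= src[i + 1]:
            span = src[i + 1] - src[i]
            t = 0.0 if span == 0 else (x - src[i]) / span
            return dst[i] + t * (dst[i + 1] - dst[i])


# Control from healthy donors only: the marker never exceeds 2.0.
healthy_only = fit_mapping([0.0, 0.5, 1.0, 1.5, 2.0],
                           [0.0, 0.4, 0.8, 1.2, 1.6], n_q=5)
# Control that also contains activated cells: the range extends to 5.0.
wide_range = fit_mapping([0.0, 0.5, 1.0, 1.5, 2.0, 5.0],
                         [0.0, 0.4, 0.8, 1.2, 1.6, 4.0], n_q=6)

inside = apply_mapping(1.0, healthy_only)             # covered by the control
activated_narrow = apply_mapping(4.5, healthy_only)   # not covered
activated_wide = apply_mapping(4.5, wide_range)       # covered
```

In this toy setting, a disease-sample event at 4.5 gets a data-driven correction only when the control itself reaches that intensity. With the healthy-only control, the result depends entirely on whatever extrapolation rule the method applies.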

vivek-verma202 commented 4 years ago

@tomashhurst, thank you for your response! I have 2 follow-up questions:

  1. If I use only stable markers to cluster for alignment, would the intensities of the unused channels also be corrected for batch effects? For instance, if I use CD56 and CD16 as stable markers to cluster for alignment, would a subsequent result (say, that expression of NKp46 on NK cells differs between patients and controls) still be free of batch effects?

  2. Would you recommend univariate signal alignment (like fdaNorm) as a preprocessing step to de-noise the data prior to using CytoNorm?

emmanuelaaaaa commented 4 years ago

Hello,

I have been having similar questions, so thanks @tomashhurst for the input. However, as far as I understand, if only some markers (the "stable" ones) are used for the training, then only those markers will be normalised and the rest will not even appear in the normalised fcs files. So in your analyses, do you just append the rest of the markers (the non-normalised ones) to the normalised fcs files? Also, if you are suggesting using only some markers for the normalisation, does that mean that you tried using all of them but the normalisation didn't work as well?

Thanks again for your time. Best, Emma

SofieVG commented 4 years ago

Hi Emma and Vivek,

It is possible to define a different set of markers for the clustering than for the normalization, by adding colsToUse to the FlowSOM.params. As such, you can use only the "stable" ones for clustering, but still normalise all markers.

I hope this helps, Kind regards, Sofie
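As a rough illustration of the idea in this reply (cluster on the stable markers only, then correct every channel within each cluster), here is a small sketch in plain Python. It does not use CytoNorm or FlowSOM: the centroids, the marker values and the helpers `assign_cluster`, `cluster_means` and `align` are all hypothetical, and a simple per-cluster mean shift stands in for the package's quantile-based normalisation.

```python
from statistics import mean

STABLE = ["CD3", "CD19"]                  # used only to assign clusters
ALL_CHANNELS = ["CD3", "CD19", "CD86"]    # every channel gets corrected

# Hypothetical centroids of two major populations, defined on stable markers only.
CENTROIDS = {"T": {"CD3": 4.0, "CD19": 0.5},
             "B": {"CD3": 0.5, "CD19": 4.0}}


def assign_cluster(event):
    """Nearest centroid, measured on the stable markers only."""
    def dist(name):
        return sum((event[m] - CENTROIDS[name][m]) ** 2 for m in STABLE)
    return min(CENTROIDS, key=dist)


def cluster_means(events):
    """Per-cluster mean of every channel, including the non-stable ones."""
    groups = {}
    for e in events:
        groups.setdefault(assign_cluster(e), []).append(e)
    return {k: {ch: mean([e[ch] for e in es]) for ch in ALL_CHANNELS}
            for k, es in groups.items()}


def align(batch, reference):
    """Shift each cluster of `batch` so its channel means match `reference`."""
    ref, cur = cluster_means(reference), cluster_means(batch)
    out = []
    for e in batch:
        k = assign_cluster(e)
        ref_k = ref.get(k, cur[k])  # no shift if the cluster is absent from the reference
        out.append({ch: e[ch] + ref_k[ch] - cur[k][ch] for ch in ALL_CHANNELS})
    return out


reference = [{"CD3": 4.0, "CD19": 0.5, "CD86": 1.0},
             {"CD3": 4.2, "CD19": 0.4, "CD86": 1.2},
             {"CD3": 0.5, "CD19": 4.0, "CD86": 2.0},
             {"CD3": 0.6, "CD19": 4.2, "CD86": 2.2}]
# Same cells measured in another batch, with a cluster-specific shift on CD86.
batch = [{"CD3": 4.3, "CD19": 0.5, "CD86": 1.5},
         {"CD3": 4.5, "CD19": 0.4, "CD86": 1.7},
         {"CD3": 0.5, "CD19": 4.0, "CD86": 3.0},
         {"CD3": 0.6, "CD19": 4.2, "CD86": 3.2}]

aligned = align(batch, reference)
```

Here CD86 plays no part in assigning cells to clusters, yet it is still corrected, and by a different amount in the T-cell and B-cell clusters.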

emmanuelaaaaa commented 4 years ago

Hi Sofie,

Thank you so much for the quick reply. That's very helpful! Should we then also use a different transformList for each step (with the "stable" channels for prepareFlowSOM and all the channels for CytoNorm.train)?

Best, Emma

SofieVG commented 4 years ago

You can keep the same transformList. The whole flowFrame will be transformed, but only the channels of interest will actually be used in the FlowSOM computations. It does not matter that some columns get transformed without being used in the computation.

emmanuelaaaaa commented 4 years ago

Thanks! I know I'm going a bit off topic here, but I would be grateful if you could also elaborate on what the transformList is actually doing. It's not completely clear to me from the documentation: is it an arcsinh transformation applied before the normalisation, after which the values are returned to their original (non-transformed) range before export to fcs?

SofieVG commented 4 years ago

Exactly. We work on the arcsinh-transformed data to compute the clusters and interpolate the normalisation values, and afterwards reverse the transformation to have "raw" values in the fcs files again (as fcs files are in general assumed to contain raw data, e.g. by other software). You can of course also specify another transformation list (e.g. if you are working on flow cytometry data instead of mass cytometry data) or work with pre-transformed files and set the transformList to NULL.
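The transform, correct, reverse sequence described here can be sketched in a few lines of plain Python. This is not the package's implementation: the cofactor of 5 is an assumption (a value often used for mass cytometry), and `forward`, `reverse`, `normalise_event` and the constant `shift` are hypothetical stand-ins for the real per-cluster correction.

```python
import math

COFACTOR = 5.0  # assumption: a cofactor often used for mass cytometry data


def forward(x, cofactor=COFACTOR):
    """arcsinh transform applied before clustering and normalisation."""
    return math.asinh(x / cofactor)


def reverse(y, cofactor=COFACTOR):
    """Inverse transform, returning values to the raw scale for export."""
    return cofactor * math.sinh(y)


def normalise_event(raw, shift):
    """Transform, apply a (hypothetical) correction on the transformed scale,
    then transform back so the result is on the raw scale again."""
    return reverse(forward(raw) + shift)


raw_values = [0.0, 3.2, 48.0, 1500.0]

# With no correction, the round trip returns the original raw values.
round_trip = [normalise_event(v, 0.0) for v in raw_values]

# A small positive shift on the transformed scale raises the raw-scale value.
corrected = normalise_event(48.0, 0.1)
```

Because the reverse step is the exact inverse of the forward step, any difference between input and output files comes from the correction itself, not from the transformation.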
