saeyslab / CytoNorm

R library to normalize cytometry data

some markers show artefacts after normalisation #5

Open JPGranizo opened 4 years ago

JPGranizo commented 4 years ago

Hello, I've run CytoNorm on 4 files that were generated in 4 different batches. Many markers look amazing afterwards and the batch effect between samples is gone after running CytoNorm. But some markers show some unexpected artefacts (see screenshots below). Is there anything I can do to prevent this?

I noticed the following issues:

  • batch correction caused one positive population to be split into two
  • new random unexpected populations show up
  • lines show up in the negative population

I would appreciate any advice on how I could fix this.

Thank you very much for your help and efforts in advance.

Kind regards,
Joachim

Left plots: uncorrected. Right plots: batch-corrected

image: https://user-images.githubusercontent.com/57099674/74618137-757eb000-5128-11ea-9a6f-0d7540f0e2dc.png
image: https://user-images.githubusercontent.com/57099674/74618153-84fdf900-5128-11ea-97fe-c894888bdbdd.png
image: https://user-images.githubusercontent.com/57099674/74618165-8d563400-5128-11ea-886e-f68618e3b59f.png

SofieVG commented 4 years ago

Dear Joachim,

I would recommend double-checking the clustering of the data; the FlowSOM parameters could perhaps be improved. These artefacts are caused by different clusters being aligned separately (so they probably differ in some other markers). Lowering the number of clusters could help. Did you test the CV values?

I also recommend using the limit parameter so that the algorithm keeps the minimum and maximum fairly stable and does not introduce large artefacts through extrapolation.

Kind regards, Sofie

tomashhurst commented 4 years ago

@JPGranizo I have seen this artefact when I had mismatched my samples and batches. Double-check that your samples line up with the correct batches in both the reference and validation files. Additionally, check that you have only ONE file per batch in your reference dataset.
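A minimal sketch of that check, written in Python for illustration only. It is not part of CytoNorm or of our implementation; the function name, the file names and the file-to-batch dictionaries are all hypothetical, and batch labels are assumed to be strings:

```python
from collections import Counter

def check_batches(reference, validation):
    """Return a list of problems found in the file-to-batch assignments.

    Both arguments map a file name to its batch label (hypothetical layout).
    """
    problems = []
    per_batch = Counter(reference.values())
    # Each batch should be represented by exactly one reference file.
    for batch, n in sorted(per_batch.items()):
        if n != 1:
            problems.append(f"batch {batch}: {n} reference files, expected 1")
    # Every validation file should belong to a batch that has a reference.
    for name, batch in sorted(validation.items()):
        if batch not in per_batch:
            problems.append(f"{name}: batch {batch} has no reference file")
    return problems

# Hypothetical example: batch B has two reference files, batch C has none.
reference = {"ref_A.fcs": "A", "ref_B.fcs": "B", "ref_B2.fcs": "B"}
validation = {"s1.fcs": "A", "s2.fcs": "C"}
for line in check_batches(reference, validation):
    print(line)
```

An empty result only means the labels are consistent with each other; it cannot tell you whether a file was assigned to the wrong batch in the first place.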

Just for some background: in our implementation of CytoNorm I had an issue where the order in which R read my samples from disk differed from their order within R (total mess, but I've now got a solution implemented in our dev branch). In most of my testing I had filenames and batches that proceed in order (i.e. batches 1, 2, 3 etc). However, because the files are sorted lexicographically, the order becomes 1, 10, 2, 3, 4 etc for any batch numbers beyond 9 -- which can mess up the matching.
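The ordering problem is easy to reproduce outside R. Here is a small Python sketch (the file names are hypothetical, and this is not the fix from our dev branch) showing how a plain sort misorders batch numbers beyond 9 and how a numeric-aware sort key avoids it:

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

# Hypothetical file names, one per batch, batches 1 to 11.
files = [f"batch_{i}.fcs" for i in range(1, 12)]

plain = sorted(files)                     # lexicographic order
natural = sorted(files, key=natural_key)  # numeric-aware order

print(plain[:3])    # ['batch_1.fcs', 'batch_10.fcs', 'batch_11.fcs']
print(natural[:3])  # ['batch_1.fcs', 'batch_2.fcs', 'batch_3.fcs']
```

Zero-padding the batch numbers in the file names (01, 02, ... 10) or matching files to batches by an explicit label rather than by position would sidestep the issue as well.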

Try testing the same workflow with just two batches, and just one sample per batch in the target data -- see if you get the same effect.

It's not the only explanation (as Sofie said, it might be that the clusters are getting muddled up), but worth checking.