nolanlab / spade

SPADE: Spanning Tree Progression of Density Normalized Events

Analysis of non-cytometry data #122

Open seanam opened 8 years ago

seanam commented 8 years ago

Hi,

I'm interested in using SPADE for non-cytometry data and have followed the recommendations in the FAQ. I have a few questions about the process:

  1. Is it possible to perform SPADE on small datasets (100-1000 samples)?
  2. How do you turn off downsampling (rather than set it to 0.9)?
  3. What does CLUSTERING_SAMPLES represent and how does it relate to TARGET_CLUSTERS?
  4. Is it possible to upload an R dataframe or table rather than .fcs (i.e. bypass fcs.R)?

Thanks, Sean

SamGG commented 8 years ago

Hi, as an occasional user I'll give you quick feedback, and others may answer in more detail. It will be easier for us to help if you tell us what your data are. In flow cytometry (FCM), a sample is usually stored in an FCS file and consists of many events recorded in many dimensions, the fluorescent markers. That is, a sample is a matrix whose columns are markers (dimensions) and whose rows are events. Depending on the population they belong to, events (i.e. cells) can be numerous or rare. That is why down-sampling is important, especially because SPADE's density-dependent down-sampling aims to produce homogeneously sampled populations. Put over-simply, the remaining process is based on clustering.

  1. Do you mean samples as rows or columns of the matrix?
  2. If you disable down-sampling, you will lose most of the benefits of SPADE.
  3. TARGET_CLUSTERS is the number of expected groups of events, usually an over-estimate.
  4. The implementation works on FCS files, not matrices.

HTH.

zbjornson commented 8 years ago

Hi Sean,

  1. As @SamGG asks, by "samples" does that mean rows (observations/cells/events) in your matrix, or number of matrices? 100-1000 matrices is a large dataset but as long as your computer has sufficient resources it should be fine. 100-1000 rows is small, and you would probably do well to disable downsampling as you seem to want to do.
  2. I think you could try downsampling_target_percent=1.0, but I actually haven't tried this. For technical reasons you might need to try 0.99 or something slightly less than 1.0. (Internally some of our lab members have clustered data without downsampling, and the biggest issue is usually availability of system resources/time, but otherwise it can look nice.)
  3. CLUSTERING_SAMPLES is the number of events that will be randomly selected after the density-dependent downsampling and is there to avoid swamping your computer. It can be greater than the number of rows you start with. TARGET_CLUSTERS is the target number of nodes.
  4. You'd have to do some coding for this. SPADE.driver is the main entry point and would be the code to work from if you want to try this. Otherwise I'd recommend exporting your dataframes to FCS format first.
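Point 2 above might look something like the following; this is a rough, untested sketch — the file path and marker names are placeholders, and the parameter names are those used in this thread (check `?SPADE.driver` for the exact signature):

```r
library(spade)  # nolanlab/spade

# Sketch only: "my_data.fcs" and the marker names are hypothetical.
SPADE.driver(
  files = "my_data.fcs",
  out_dir = "spade_output",
  cluster_cols = c("marker1", "marker2"),
  downsampling_target_percent = 0.99  # near-disabled downsampling; 1.0 may error
)
```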

dm319 commented 8 years ago

You might want to try csvtofcs, which should be able to convert a dataframe into FCS.
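As an alternative sketch of the same conversion, flowCore's `flowFrame` and `write.FCS` can write a numeric dataframe out as an FCS file (toy data below; column names become the channel names):

```r
library(flowCore)

# Toy dataframe standing in for your microscopy measurements.
df <- data.frame(marker1 = rnorm(100), marker2 = rnorm(100))

# Wrap the matrix in a flowFrame and write it to disk as FCS.
ff <- flowFrame(exprs = as.matrix(df))
write.FCS(ff, "converted.fcs")
```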

seanam commented 8 years ago

Thanks for your replies! Your input has been very helpful. @SamGG: my dataset consists of fluorescent measurements acquired via microscopy. There are approximately 10 markers (dimensions) and 100 events (rows), so it's a pretty small dataset for now.

  1. I should have said rows instead of samples (still learning FCS notation)
  2. Thanks for your comments on the importance of down-sampling. It seems like for now I'll just be able to do clustering due to my dataset size.

@zbjornson:

  1. (see above)
  2. I tried to do downsampling_target_percent=1.0 but got the following error:

Error in if (nrow(tbl) > 60000) { : argument is of length zero

Not that important since I can use 0.99, but it seems setting it to 1.0 doesn't work.

  3. Thanks for the explanation, this helped solve the errors I was getting.
  4. I'll look into this more. I'm building a Shiny app and would like to have SPADE integrated without having to export the dataframe and then import it as a .fcs file.

I am getting this error now:

Producing tables... Error in rownames(pivot) : object 'pivot' not found

Any idea what is happening?

@dm319: Thanks for the tip! This works better than the other conversion method I was using.

SamGG commented 8 years ago

IMHO, you would be better off doing simple hierarchical clustering, multi-dimensional scaling, or t-SNE rather than trying to fit your data into SPADE, especially if you don't apply down-sampling. Best.
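For a dataset of ~100 events, that suggestion might be sketched like this (simulated data; the t-SNE line assumes the Rtsne package and is purely illustrative):

```r
# 100 events x 10 markers of simulated data standing in for the real matrix.
mat <- matrix(rnorm(100 * 10), nrow = 100)

# Plain hierarchical clustering on Euclidean distances, cut into 5 groups.
hc <- hclust(dist(mat), method = "average")
clusters <- cutree(hc, k = 5)

# Optional 2-D embedding for visualization (requires the Rtsne package):
# library(Rtsne); emb <- Rtsne(mat, perplexity = 10)
```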

zbjornson commented 8 years ago

Re: object 'pivot' not found, that's from https://github.com/nolanlab/spade/blob/master/R/driver.R#L256 and it looks like that would happen if there were no .anno.Rsave files produced (not sure why that would happen). Try commenting out lines 256 through 265; it will remove just one of the three transpositions of the statistics tables.

Re: @SamGG's comment -- without downsampling, SPADE's clustering is plain hierarchical clustering (followed by the MST and layout calculations). There's no harm in using SPADE for that, but it might be simpler to use the underlying clustering module (Rclusterpp) directly. Rclusterpp, in turn, is a faster replacement for the built-in hclust function.
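Using the clustering module directly might look like the following; a sketch under the assumption that Rclusterpp is installed and that `Rclusterpp.hclust` accepts an `hclust`-style interface (its documented design):

```r
library(Rclusterpp)

# Simulated stand-in for the 100 x 10 measurement matrix.
mat <- matrix(rnorm(100 * 10), nrow = 100)

# Rclusterpp.hclust is a faster replacement for stats::hclust and
# returns a compatible dendrogram object.
hc <- Rclusterpp.hclust(mat, method = "ward")
clusters <- cutree(hc, k = 5)
```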