Save a tree scaffold/template for future use

mleipold commented 13 years ago

Here at the Stanford HIMC, we have a few multi-year customer studies. We would get the 2009 samples, process them, then repeat each year as the 2010, 2011, 2012, etc. samples come in.

Is there currently a way to build a tree with a given panel of markers starting with, say, the 2009 samples, and as each year's samples come in, perform SPADE analysis on them using the same tree scaffold/template from the 2009 samples?

Currently, the only way I know how to make everything appear on identically-framed trees is to include all the FCS files in the same tree-building exercise. This would mean that each year, we would have to rebuild our tree. While the trees built on the same markers should be similar, they would not necessarily be identical each time the analysis is rerun.

Kinda like how in FlowJo/etc, you build a template of your analysis, and can just keep dragging new FCS files in (though in the case of SPADE, I obviously wouldn't be asking to readjust gates).

mlinderm commented 13 years ago

Hi Mike,

Yes. The short answer is that you start the SPADE pipeline post clustering and just run the up-sampling, plot generation, etc. There is no easy way to do so though without writing some R code. I think I might have something close lying around that I can adapt for this purpose. Give me a few days...

esimonds commented 13 years ago

Good question. This is certainly possible, and in fact, this is essentially what SPADE does when it performs the "upsampling" step. There is no handy button in the interface to re-analyze new data against an old tree, but it can be done today using a series of commands in the R console (or in a script). It's been a while since I've done this myself, so when you urgently need to do it, let me know and we'll go through it together. We'll post the resulting script here on GitHub for all to enjoy.

Erin

mlinderm commented 12 years ago

I prepared a script for upsampling additional files in the context of previous SPADE runs. It is available as a gist. It is essentially the upsampling, median computation and other components from SPADE.driver extracted as a stand-alone R script.

It expects a specifically prepared directory. The example I used is:

$ tree .
.
├── 20071001-u937.002.fcs
├── output
│   ├── clusters.fcs
│   ├── clusters.table
│   ├── layout.table
│   └── mst.gml
├── runSPADE.R
└── upsample.R

You will need to modify the upsample.R script with information from your original runSPADE.R, e.g., the clustering markers. You will also need to list the files you want processed in the panels listing.

You will need to prepare the output directory above by copying the files shown from the original SPADE run. These files contain the cluster assignment information needed to upsample (that is, to assign clusters to the cells in the new files).
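The preparation step can be sketched in shell roughly as follows. This is only a sketch: the function name prepare_upsample_dir and both path arguments are hypothetical, and only the four file names are taken from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPADE): copy the four files from an
# earlier SPADE run into <work_dir>/output so upsample.R can find them.
prepare_upsample_dir() {
    orig_out=$1   # output directory of the original SPADE run (assumed path)
    work_dir=$2   # directory holding upsample.R and the new FCS files

    mkdir -p "$work_dir/output" || return 1
    for f in clusters.fcs clusters.table layout.table mst.gml; do
        if [ ! -f "$orig_out/$f" ]; then
            echo "missing from original run: $orig_out/$f" >&2
            return 1
        fi
        cp "$orig_out/$f" "$work_dir/output/" || return 1
    done
}
```

For example, `prepare_upsample_dir /path/to/original/output .` run from the directory that contains upsample.R would produce the layout shown in the first listing, apart from the new FCS files, which you add yourself.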

With that all in place you can then run the upsample.R script and it should upsample the new FCS files, 20071001-u937.002.fcs in this case, compute medians, create PDFs, etc.

The result will look something like:


$ tree -L 2 .
.
├── 20071001-u937.002.fcs
├── output
│   ├── 20071001-u937.002.fcs.density.fcs.cluster.fcs
│   ├── 20071001-u937.002.fcs.density.fcs.cluster.fcs.anno.Rsave
│   ├── 20071001-u937.002.fcs.density.fcs.cluster.fcs.medians.gml
│   ├── clusters.fcs
│   ├── clusters.table
│   ├── global_boundaries.table
│   ├── layout.table
│   ├── mst.gml
│   └── pdf
├── runSPADE.R
└── upsample.R
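
To confirm that a run produced the files in the listing above, a small check along these lines may help. It is a sketch, not part of SPADE: the function name check_upsample_outputs is hypothetical, and the expected file names are inferred only from the example listing, so they may differ for other SPADE versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of SPADE): report which of the outputs
# shown in the example listing are missing after running upsample.R.
# Usage: check_upsample_outputs <work_dir> <file.fcs> [<file.fcs> ...]
check_upsample_outputs() {
    work_dir=$1
    shift
    missing=0
    for fcs in "$@"; do
        base="$work_dir/output/$fcs.density.fcs.cluster.fcs"
        for target in "$base" "$base.anno.Rsave" "$base.medians.gml"; do
            if [ ! -f "$target" ]; then
                echo "missing: $target" >&2
                missing=1
            fi
        done
    done
    if [ ! -f "$work_dir/output/global_boundaries.table" ]; then
        echo "missing: $work_dir/output/global_boundaries.table" >&2
        missing=1
    fi
    if [ ! -d "$work_dir/output/pdf" ]; then
        echo "missing: $work_dir/output/pdf" >&2
        missing=1
    fi
    return "$missing"
}
```

For the example above this would be called as `check_upsample_outputs . 20071001-u937.002.fcs`. It returns a non-zero status and prints the missing paths if anything from the listing is absent.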

mleipold commented 12 years ago

Is this something that is going to be implemented as an "Add new FCS file to existing tree" button or dropdown option in the future?

esimonds commented 12 years ago

That's not planned for the Cytoscape interface, but I think it should be added to the feature request list for the Cytobank implementation.

AlixDahirel commented 12 years ago

Hi,

As I'm facing the same issue (I want to analyze new data against an old tree), I tried to use the upsample script prepared by mlinderm. I set up the directory as indicated and modified upsample.R with the names of the new data files as well as the clustering markers. Then I ran the script in R. I got the following error messages and, being new to R, don't know how to fix them:

Computing medians for file:
Error in apply(mat, 2, tform) : dim(X) must have a positive length
In addition: Warning message:
In SPADE.markerMedians(f, igraph:::vcount(graph), cols = p$median_cols, :
  arcsinh_cofactor is deprecated, use transform=flowCore::arcsinhTransform(...) instead

# Compute the global limits (cleaning up attribute names to match those in GML files)
attr_ranges <- t(sapply(attr_values, function(x) { quantile(x, probs=c(0.00, pctile_color, 1.00), na.rm=TRUE) }))
rownames(attr_ranges) <- sapply(rownames(attr_ranges), function(x) { gsub("[^A-Za-z0-9]","",x) })
write.table(attr_ranges, paste(out_dir,"global_boundaries.table",sep=""), col.names=FALSE)
Error in file(file, ifelse(append, "a", "w")) : cannot open the connection
In addition: Warning message:
In file(file, ifelse(append, "a", "w")) :
  cannot open file 'output/global_boundaries.table': No such file or directory

SPADE.plot.trees(graph,out_dir,file_pattern="_fcs_Rsave",layout=as.matrix(layout_table),out_dir=paste(out_dir,"pdf",sep=.Platform$file),size_scale_factor=NODE_SIZE_SCALE_FACTOR)
Error in SPADE.plot.trees(graph, out_dir, file_pattern = "_fcs_Rsave", : Not a graph object

The only thing I can say is that the 'global_boundaries.table' file is in the output directory.
