nolanlab / spade

SPADE: Spanning Tree Progression of Density Normalized Events
Other
46 stars 23 forks source link

Only 3 clustering markers? #109

Closed skyng22003 closed 8 years ago

skyng22003 commented 9 years ago

Hi I've selected the folder with my FCS files but there are only 3 clustering markers FSC-A, SSC-A and Time. I cannot see any of my fluorescent channels?

Windows 8.1 x64 Java 7 x64 Cytoscape v2.8.3v x64

zbjornson commented 9 years ago

This could be caused by having different channels in each file (in which case you would see only the intersection of channels between the files) or possibly from files with malformed headers (e.g. missing the parameter names).

SamGG commented 9 years ago

Hi, I think you should try to open one of your FCS file with R/BioConductor, if you have a little experience with R. Alternatively, tell us where your file came from and share one of those files with us.

zbjornson commented 9 years ago

To clarify my previous comment, this is most likely the intended behavior; you can only cluster on markers that are shared between all the files. There are, however, possibly other bugs/artifacts that could cause this.

skyng22003 commented 9 years ago

Hi, I have managed to solve the problem by exporting FCS files first opened in FlowJo possibly an issue with the new version of DIVA? However I have ran into another problem, so I generated the R script using Cytoscape and appended the script with the version 2.6 arguments, if I try to run the R script generated using Cytoscape directly I get the driver SPADE.driver issue. Replacing with version 2.6 solved the SPADE.driver issue but I now have an new error with the v2.6 arguments

Downsampling file: E:/DE8 FACS/80x1_4818.fcs Error in exprs(ff) : error in evaluating the argument 'object' in selecting a method for function 'exprs': Error: The following parameters in the spillover matrix are not present in the flowFrame: V450/50-A, R780/60_735LP-A

Any ideas?

This is the R script

#!/usr/bin/env Rscript
# ^^set this to your Rscript path
#
# runSPADE:  R wrapper script for SPADE tree construction
# Erin Simonds - esimonds@stanford.edu
# Version 2.6 - September 24, 2013
#
# Command line instructions:
#   1) Make sure the first line of this file is your Rscript path (found in the same directory as R)
#
#   2) Customize the parameters that are CAPITALIZED below 
#
#   3) In a command shell, navigate to the folder containing this script and the FCS file(s) to be analyzed
#
#   4) Make this script executable.  At the command line, run:
#   $ chmod +x runSPADE.R
#
#   5a) For normal use (not in a load sharing or compute cluster): At the command line, run:
#   $ ./runSPADE.R [-num_threads=X] [-file_to_process=Y]
#   where X is the number of threads you wish to use (default = 1)
#   Y is the name of the file to process (default = use file(s) specified in this script)
#       Note that parameters in brackets may be omitted.
#
#   5b) For Sun Gridengine: At the command line, run:
#   $ qsub -cwd -j y -b y -m e -M username@domain.ext [-pe threaded A] ./runSPADE.R [-num_threads=X] [-file_to_process=Y]
#   where username@domain.ext is your e-mail address to e-mail when the job is done,
#   A is the number of slots to reserve with Gridengine,
#   X is the number of threads to use in SPADE (usually the same as A) (default = 1).
#   Y is the name of the file to process (default = use file(s) specified in this script)
#       Note that parameters in brackets may be omitted.
#
#   5c) For Platform LSF: At the command line, run:
#   $ bsub [-n A -R "span[hosts=1]"] ./runSPADE.R [-num_threads=X] [-file_to_process=Y]
#   A is the number of slots to reserve with LSF,
#   X is the number of threads to use in SPADE (usually the same as A) (default = 1).
#   Y is the name of the file to process (default = use file(s) specified in this script)
#       Note that parameters in brackets may be omitted. (Note that the brackets surrounding "hosts=1" do not indicate an optional parameter.)
#
# Interactive instructions:
#   1) Customize the parameters that are CAPITALIZED below 
#   2) In an interactive R session, change the working directory to a folder containing this script and the FCS file to be analyzed
#   3) At the R command line, run: source("runSPADE.R")
#
#

# Set this to the FCS file or folder to be processed.  If a folder is specified, all FCS files will be processed.
FILE_TO_PROCESS="E:/DE8 FACS/"

# Set this to the reagent names ($P*S keywords from the FCS file) to be used for clustering and tree construction.  Usually these are the surface markers.
CLUSTERING_MARKERS=c("APC-A","PE-A","PE-Cy7-A","PerCP-Cy5-5-A","V450_50-A")

# Set this to panels. For example, a panel may be one patient with their stimulated and unstimulated samples.
PANELS=list(
list(
panel_files=c("80x1_4818.fcs","80x1_4819.fcs","80x1_4839.fcs","80x1_4841.fcs","80x2_4788.fcs","80x2_4789.fcs","80x2_4790.fcs","80x2_4817.fcs","PBS_4Y_4696.fcs","PBS_4Y_4697.fcs","PBS_4Y_4698.fcs","PBS_4Y_4699.fcs","PBS_PBS_4786.fcs","PBS_PBS_4797.fcs","PBS_PBS_4814.fcs","PBS_PBS_4816.fcs"),
median_cols=NULL,
reference_files=NULL,
fold_cols=NULL
)
)

# Set this to the path to your local spade package installation -- if using a global installation, set to NULL.
LIBRARY_PATH="lib/"

# Set this to the transforms for each channel
TRANSFORMS=flowCore::arcsinhTransform(a=0, b=0.2)

# Pick one of these and set the rest to NULL.
DOWNSAMPLING_TARGET_NUMBER=NULL
DOWNSAMPLING_TARGET_PCTILE=NULL
DOWNSAMPLING_TARGET_PERCENT=0.1 # 10% -- recommended.

# Set this to the desired number of events to use for clustering.  Recommended:  50000
CLUSTERING_SAMPLES=50000

# Set this to the desired threshold for outliers (unit is the local density for each cell).  A higher value will discard more events.    Recommended:  0.01
DOWNSAMPLING_EXCLUDE_PCTILE=0.01

# Set this to the target number of clusters.  Algorithm will create clusters within 50% of this value.  Recommended:  200
TARGET_CLUSTERS=200

# This is a linear multiplier for node size.
NODE_SIZE_SCALE_FACTOR=1.2

# Path to output directory.  Recommended:  "output/"
OUTPUT_DIR="E:/DE8 FACS/output/"

# Path to temporary directory.  Recommended:  "/tmp/"
TMPDIR="E:/DE8 FACS/tmp/"

# Set this to the desired layout function.  Recommended:  SPADE.layout.arch
# Option 1:  SPADE.layout.arch  ...this is fast, but allows overlapping nodes and edges
# Option 2:  SPADE.layout.arch_layout  ...this is slow, but prevents overlapping nodes and edges
LAYOUT_FUNCTION=SPADE.layout.arch

###################  No need to modify anything below this point ########################

#Run SPADE analysis workflow

#Default value:
NUM_THREADS <- 1

for (e in commandArgs()) {
    ta <- strsplit(e,"=",fixed=TRUE)
    if( ta[[1]][1] == "-num_threads") {
        NUM_THREADS <- ta[[1]][2]
    }
    if( ta[[1]][1] == "-file_to_process") {
        FILE_TO_PROCESS <- ta[[1]][2]
    }
}

Sys.setenv("OMP_NUM_THREADS"=NUM_THREADS)

library("spade",lib.loc=LIBRARY_PATH)

SPADE.driver(FILE_TO_PROCESS, file_pattern="*.fcs", out_dir=OUTPUT_DIR, cluster_cols=CLUSTERING_MARKERS, panels=PANELS, transforms=TRANSFORMS, layout=LAYOUT_FUNCTION, downsampling_target_percent=DOWNSAMPLING_TARGET_PERCENT, downsampling_target_number=DOWNSAMPLING_TARGET_NUMBER, downsampling_target_pctile=DOWNSAMPLING_TARGET_PCTILE, downsampling_exclude_pctile=DOWNSAMPLING_EXCLUDE_PCTILE, k=TARGET_CLUSTERS, clustering_samples=CLUSTERING_SAMPLES)

LAYOUT_TABLE <- read.table(paste(OUTPUT_DIR,"layout.table",sep=""))
MST_GRAPH <- read.graph(paste(OUTPUT_DIR,"mst.gml",sep=""),format="gml")
SPADE.plot.trees(MST_GRAPH,OUTPUT_DIR,file_pattern="*fcs*Rsave",layout=as.matrix(LAYOUT_TABLE),out_dir=paste(OUTPUT_DIR,"pdf",sep=""))

Sys.unsetenv("OMP_NUM_THREADS")

Many thanks

SamGG commented 9 years ago

Your manual "pipeline" is not very clear to me. Nevertheless, I observed that FlowJo (9.x on Mac) encompass channel name of compensated channels with <>. Such a name might not be found in the compensation matrix (aka SPILLOVER). May be... SPADE is depending on BioConductor. You should definitively try to open at least one FCS using commands within R.

skyng22003 commented 9 years ago

Hi, sorry if my description confused you, I tried using R to open and FCS file, the channel name looks normal ie without the "<>".

print(summary(fcs)) FSC-A SSC-A FITC-A PE-A PerCP-Cy5-5-A PE-Cy7-A V450_50-A APC-A Min. 5056 4480 1.029 1.004 0.9989 0.9952 1.001 0.9993 1st Qu. 22270 14020 1.122 1.034 1.0780 1.0490 1.020 1.0130 Median 27970 17920 1.196 1.051 1.1150 1.0810 1.035 1.0200 Mean 31800 23650 1.246 1.069 1.2780 1.1360 1.050 1.0670 3rd Qu. 35070 25760 1.304 1.086 1.1870 1.1310 1.054 1.0340 Max. 262100 262100 3.097 1.903 34.3800 5.4330 5.773 82.9600 R780_60_735LP-A Time Event # Min. 0.9995 0.56 4 1st Qu. 1.0060 26.96 16860 Median 1.0090 56.16 32770 Mean 1.0120 56.69 24870 3rd Qu. 1.0140 83.76 32770

skyng22003 commented 9 years ago

Can you spot anything wrong with the R script I posted before?

skyng22003 commented 9 years ago

80x1_4818

Here's the FCS file btw, just change the extension.

SamGG commented 9 years ago

A problem concerns the channel names. See below.

theFCS = read.FCS("80x1_4818.fcs", transformation = FALSE)
# I prefer to keep data untransformed as in the original acquisition software
theSpill = spillover(theFCS)$SPILL
round(t(theSpill), 2)  # t() to show channels in lines
# The problem: some channel are unrecognized
compensate(theFCS, theSpill)
colnames(theSpill) %in% featureNames(theFCS)
# The solution
newSpill = spillover(theFCS)$SPILL  # copy
colnames(newSpill) = gsub("/", "_", colnames(newSpill))  # convert names
colnames(newSpill) %in% featureNames(theFCS)
round(t(newSpill), 2)  # t() to show channels in lines
# spillover is not writable directly, as far as I know
theKwds = keyword(theFCS)  # get full keywords
theKwds$SPILL = newSpill  # replace spillover
keyword(theFCS) = theKwds  # replace all keywords
# Verify
spillover(theFCS)  #  correct
proc = compensate(theFCS, spillover(theFCS)$SPILL)  # apply compensation
plot.default(exprs(proc)[,6], exprs(theFCS)[,6])  # most compensated channel
plot.default(asinh(exprs(proc)[,6]/500), asinh(exprs(theFCS)[,6]/500))  # better view in my sense
# do some plots to verify the asinh coefficient of your script
# the value is 0.2, which is too low IMHO
SamGG commented 9 years ago

You can transform all your files: COPY them in a new directory, copy and run the following code. Then run SPADE. Again verify the transformation applied to the data. May be you should ask Zach.

# Here is the transformation your files
myFiles = dir(pattern = "\\.fcs$")
for(theFile in myFiles) {
  # Read
  theFCS = read.FCS(theFile, transformation = FALSE)
  theSpill = spillover(theFCS)$SPILL
  # Check channels vs spillover
  if (all(colnames(theSpill) %in% featureNames(theFCS))) next
  message(sprintf("Converting %s", theFile))
  # Copy, convert, write
  newSpill = spillover(theFCS)$SPILL  # copy
  colnames(newSpill) = gsub("/", "_", colnames(newSpill))  # convert names
  colnames(newSpill) %in% featureNames(theFCS)
  theKwds = keyword(theFCS)  # get full keywords
  theKwds$SPILL = newSpill  # replace spillover
  keyword(theFCS) = theKwds  # replace all keywords
  # Comment following line if you want to overwrite the file
  newFile = gsub("\\.fcs", "-mod.fcs", theFile)  # -mod suffix
  # Store as FCS, stated as experimental but usually OK
  write.FCS(theFCS, newFile)
}
SamGG commented 9 years ago

Could you please modify the code you gave in order to format it as code (not as Markdown). See Markdown https://guides.github.com/features/mastering-markdown/ for javascript or https://help.github.com/articles/markdown-basics/#code-formatting.

zbjornson commented 9 years ago

(I edited the comment to put backticks in.)

Indeed the error you got ("The following parameters in the spillover matrix are not present in the flowFrame...") are because you only exported compensated channels, but the matrix still contains the uncomped channels.

Is the FCS file you posted from DiVa or FlowJo? (It looks like it's the original DiVa file.)

I'm not sure why opening with FlowJo appears to fix the problem; rather, can you please confirm that you re-exported your files from FlowJo?

Can you please also confirm that every file has the same channels in it? Either use R to print the channels lists or open them all in FlowJo and make sure you can see exactly the same channel list for every file.

SamGG commented 9 years ago

Ooops, I didn't make it clear: the goal of my code was to avoid the slash in the name of the channel. But may be there are other problems.

zbjornson commented 9 years ago

The files have a | char as the delimiter, so slashes won't hurt.

SamGG commented 9 years ago

Well, the spillover matrix contains names that don't match the parameters. Then an error is raised. When slashes are replaced, no error is raised during compensation. I have already faced such odd behavior within Cytobank. But may be the truth is elsewhere ;-) I agree that the file handling needs to be clarified.

skyng22003 commented 9 years ago

Hi regarding your question about the FCS files, yes they were exported from flowJo and I am sure all of the files have the same channels in flowjo.

skyng22003 commented 9 years ago

Regarding SamGG comment, is it possible that the channel options makes a difference to the FCS files? ie its possible to get flowjo to show the non-compensated data(as a channel option) or not I wonder if this changes the FCS files when it gets exported.

skyng22003 commented 9 years ago

I'll test out the code above and see how it goes, many thanks again for both of your help!

skyng22003 commented 9 years ago

Hi, I ran the code above worked like a charm, but I get an error at the end? After the upsampling?

Error in .Call("R_igraph_get_shortest_paths", graph, as.igraph.vs(graph, : At iterators.c:759 : Cannot create iterator, invalid vertex id, Invalid vertex id

SamGG commented 9 years ago

Great if the code helped you to input your FCS. But no idea about the last error. Concerning the export, I think that if the export is FCS v3, raw data plus compensation matrix are exported, leading to compensated data once compensation is applied. If the export is FCS 2, the data are compensated before export and the matrix is zero and one on the diagonal. We experienced the latter case in FJ 9.2 or 9.3. What FJ offers at visualization should not impact the export.

skyng22003 commented 9 years ago

I exported everything as FCS v3, so if I understood you correctly exporting as FCS v2 would solve the channel naming problem without the code?

SamGG commented 9 years ago

No channel naming is independent from compensation IMHO. My previous comment concerns the data and the matrix. The problem with channel name is that special characters are not welcome. I think we should stick on alphabetic and numerical characters separated by underscores, dot and may be minus (but R does not welcome minus in name). I will advise you to stay with FCS 3 format.

zbjornson commented 8 years ago

Closing due to age. Please feel free to reopen if you have any more questions.

SamGG commented 8 years ago

OK for me.