paiva-s-lab / FlowCT

FlowCT: A semi-automated workflow for deconvolution of immunophenotypic data and objective reporting on large datasets

Question about FlowCT (generating SCE object) #16

Closed jfoedfjwofa closed 1 week ago

jfoedfjwofa commented 1 month ago

I'm sorry for starting this thread without permission. I'm hoping to analyze a large FCM dataset using FlowCT. Your step-by-step tutorial really helps me, thank you so much.

Now I'm trying to analyze my own data, but I ran into a problem at the very first step. When I tried to generate an "SCE" object, the following error occurred.

error #000: in H5Pset_chunk(): line 2004
        major: Invalid arguments to routine
        minor: Out of range
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'X' in selecting a method for function 'lapply': error in evaluating the argument 'object' in selecting a method for function 'sampleNames': hdf Error

The command I executed was the following;

library(FlowCT)
setwd("xxx")
unify.FCSheaders(directory = "xxx", pattern = "fcs", fix = F)
filenames <- list.files(pattern = "fcs", path = "xxx")
md <- data.frame(filename = filenames,
                 sample_id = 1:length(filenames),
                 patient_id = sapply(filenames,function(x) strsplit(x, split = "_|\\.")[[1]][3]),
                 timepoints = sapply(filenames,function(x) strsplit(x, split = "_|\\.")[[1]][5]),
                 response=sapply(filenames,function(x) strsplit(x, split = "_|\\.")[[1]][7]))

md$patient_id<-as.factor(md$patient_id)
md$timepoints<-as.factor(md$timepoints)
md$response<-as.factor(md$response)

fcs <- fcs.SCE(directory = "xxx", pattern = "fcs", metadata = md, events = 10000, transf.cofactor = 10000, project.name = "Mydata")
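As a quick sanity check on the metadata step, the sketch below shows what the strsplit() call returns for a single file name. The file name is hypothetical; positions 3, 5 and 7 only make sense if the real files follow a matching naming pattern, otherwise the corresponding metadata columns end up as NA.

```r
# Hypothetical file name; replace it with one of your own files
fname <- "exp_01_P001_tp_T1_resp_CR.fcs"

# Split on underscores and dots, as in the data.frame() call above
parts <- strsplit(fname, split = "_|\\.")[[1]]
print(parts)

# The three fields picked up by the metadata code
patient_id <- parts[3]
timepoint  <- parts[5]
response   <- parts[7]
```

Printing `parts` for one or two real file names before building `md` makes it easy to confirm that the indices point at the intended fields.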

If possible, could you please tell me how to solve this matter?

In addition, as I understand it, this step applies an arcsinh transformation to the FCM data. Is it possible to use the scale values exported by FlowJo for the downstream analysis in FlowCT? (I suppose the data from each sample could be merged with other tools such as "Spectre".)
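For reference, the arcsinh transform commonly used in cytometry divides each raw value by a cofactor before applying asinh. The sketch below assumes that transf.cofactor plays the role of that divisor, which is not confirmed in this thread; the intensities are made up.

```r
# Assumption: the transformation has the form asinh(x / cofactor)
arcsinh_transform <- function(x, cofactor) asinh(x / cofactor)

raw <- c(0, 100, 10000, 100000)  # made-up raw intensities
transformed <- arcsinh_transform(raw, cofactor = 10000)
print(round(transformed, 3))

# The transform is invertible: sinh() followed by multiplying by the cofactor
back <- sinh(transformed) * 10000
```

Because asinh(x / c) is close to x / c when |x| is much smaller than c, a larger cofactor widens the near-linear region around zero, which is why the appropriate value tends to depend on the instrument and its signal range.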

I would really appreciate it if you could give me some advice. Best regards,

jgarces02 commented 1 month ago

Hi @jfoedfjwofa. Happy to hear you're using FlowCT and found the tutorial useful! 🙂

What kind of files are you trying to read? Standard FCS files? This error looks to me like it is related to malformed files.

Regarding your second question. Short answer: I'd run FlowCT on the raw files. Long answer: which FlowJo transformation did you use? Sometimes it's the same as FlowCT's (i.e., arcsinh) and it's not a big deal to rerun it within FlowCT. If you still want to work with the FlowJo-normalized data, you'll have to extract the table from the FlowJo file and then manually add it to the FlowCT object.

(Sorry but I didn't get what you mean by merging the data externally, FlowCT does it automatically).

jfoedfjwofa commented 1 month ago

Thank you very much for your prompt reply.

I think I loaded typical FCS files, which came from data acquired on a BD FACSymphony. I performed dead cell removal and gated on my population of interest (CD3+ from a heterogeneous immune cell population) in FlowJo, then exported this population with "All compensated parameters" in FCS3 format.

I think FlowJo data can also be exported as CSV files (channel values (pre-transformed) or scale values). I suppose the CSV files for each sample could be merged into one table with the do.merge.files() function of the "Spectre" pipeline. This is just a guess, but if my FCS files cannot be loaded into FlowCT, importing such a merged CSV matrix would be one option. I apologize for my unclear explanation.
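To illustrate the kind of merge meant here, the following is a minimal base-R sketch. It does not use Spectre's do.merge.files(); the folder, file names and marker columns are made up for the example.

```r
# Create two small example CSV files in a temporary folder (made-up data)
csv_dir <- tempfile("flowjo_csv_")
dir.create(csv_dir)
write.csv(data.frame(CD3 = c(1.2, 3.4), CD45 = c(5.6, 7.8)),
          file.path(csv_dir, "sample_A.csv"), row.names = FALSE)
write.csv(data.frame(CD3 = 0.5, CD45 = 2.5),
          file.path(csv_dir, "sample_B.csv"), row.names = FALSE)

# Read every CSV and tag each row with its file of origin
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(csv_files, function(f) {
  tab <- read.csv(f, check.names = FALSE)  # keep marker names unchanged
  tab$filename <- basename(f)
  tab
})

# Stack into one table; this requires identical column names in all files
merged <- do.call(rbind, tables)
print(merged)
```

Keeping a `filename` column preserves the link between each event and its sample, which is needed later to attach per-sample metadata.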

I would appreciate your advice. Thank you so much.

jgarces02 commented 1 month ago

Could you please send me a couple of these files you're trying to import to FlowCT? Also one of these CSV exported from FlowJo?

Thanks

jfoedfjwofa commented 1 month ago

Thank you very much for your kind reply.

To get to the point, I succeeded in creating SCE objects for my samples! Previously I loaded .FCS files containing the CD3+-gated population and it failed. This time I tried two kinds of .FCS files, containing either 1) all of the cells (non-gated; original .FCS files) or 2) the CD45+-gated population, and SCE objects could be generated from both of them. (I gated the cells in FlowJo: FSC/SSC → dead cell removal → CD45+ → CD3+.)

I still don't know what caused the error last time, but this time I was able to try a basic series of analyses with FlowCT. In the process, I encountered several problems, as follows:

1)  According to the tutorial, when generating SCE objects we should set the value of "transf.cofactor". I suppose it depends on the FCM machine/software, but is there a precise way to determine it?

2)  When I ran the "qc.and.removeDoublets" command, the following message appeared. Is there any way to avoid this?

Error: vector memory exhausted (limit reached?)
In addition: Warning messages:
1: The TIMESTEP keyword was not found and hence it was set to 0.01. Graphs labels indicating time might not be correct

※Sys.setenv('R_MAX_VSIZE'=32000000000) did not fix this error.

3)   I could obtain the UMAP plots color-coded by FlowSOM cluster, but could not generate the plots for each marker (like figure C in sections 3 and 5 of the tutorial).  According to the tutorial, "color.by is not expecified, all markers will be shown (faceted)". However, when I omitted "color.by" and/or "colors" from the "dr.plotting" command, the following error messages appeared.

Error in color.by != "expression" :   comparison (!=) is possible only for atomic and list types

Error in melt.data.table(as.data.table(drmd), measure.vars = no.omit.markers,  :   One or more values in 'measure.vars' is invalid.

Are there any ways to solve this matter?

4)  I could not perform clustering with the "PARC" algorithm. When I specified method = "PARC" in the "clustering.flow" command, the following error message appeared.

Error in py_module_import(module, convert = convert) :   ModuleNotFoundError: No module named 'parc'

I installed "reticulate" and ran the following commands, but this did not solve the problem.

Sys.setenv("RETICULATE_PYTHON" = "path")
library(reticulate)
reticulate::import("parc")

Is this a problem of my PC environment or something..?

5)  I'm a little confused about specifying the type of assay.i, "normalized" or "transformed". I believe it depends on the clustering method, is that correct?

I'm very sorry to bother you with so many questions... I would really appreciate it if you could give me some advice.

Best regards,

mlorenzm commented 1 month ago

Hello @jfoedfjwofa. Thank you for providing such clear and structured issue messages. I think I speak for everyone involved when I say that these kinds of questions benefit everyone and are super useful.

Regarding 2), please try modifying your .Renviron file (it may be located in the installation path of R) to include R_MAX_VSIZE=100Gb as its first line.
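As background, and as far as the R startup documentation describes it, environment files such as .Renviron are read when R starts, which would be consistent with Sys.setenv() inside a running session having no effect on this limit. The sketch below only inspects the current setup; it assumes a user-level file at ~/.Renviron, which may differ on your machine.

```r
# Where R looks for a user-level .Renviron (a .Renviron in the current
# working directory is used instead if one exists there)
user_renviron <- path.expand("~/.Renviron")
cat("User .Renviron:", user_renviron,
    "| exists:", file.exists(user_renviron), "\n")

# Value seen by the current session; an empty string means it is not set
current_limit <- Sys.getenv("R_MAX_VSIZE", unset = "")
cat("R_MAX_VSIZE:",
    if (nzchar(current_limit)) current_limit else "<not set>", "\n")
```

After editing the file, R has to be restarted completely; running the check again should then report the new value.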

Regarding 3), I fixed that bug in the development branch of this same GitHub repository (see issue https://github.com/paiva-s-lab/FlowCT/issues/5), but those fixes haven’t reached bioconductor’s version yet. In summary, that error occurs when your marker names contain hyphens or other special characters besides underscores (_).
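Until the fix is released, one possible workaround is to clean the marker names before plotting. The sketch below only shows the string cleaning on a made-up vector of names; how the cleaned names are written back to the FlowCT object is not covered here and would depend on how the object stores them.

```r
# Made-up marker names, some with characters that may trigger the error
markers <- c("CD3", "HLA-DR", "PD-1", "CD45 RA", "IgG (H+L)")

# Replace every run of characters other than letters, digits and
# underscores with a single underscore, then trim underscores at the ends
clean <- gsub("[^A-Za-z0-9_]+", "_", markers)
clean <- gsub("^_+|_+$", "", clean)

# make.unique() guards against two markers collapsing to the same name
clean <- make.unique(clean, sep = "_")
print(clean)
```

Keeping a lookup table of original and cleaned names (for example `data.frame(markers, clean)`) makes it easy to restore the original labels in figures later.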

For 4), I also had some issues when installing parc; that’s quite normal. @cirobotta has already asked PARC’s developers for an R implementation of PARC, but we aren’t quite there yet. In my case I had to 1) create a conda environment within reticulate (and let it be the only environment, with python3), and 2) manually install parc’s dependencies (in that very same environment), namely igraph, leidenalg, hnswlib, and umap-learn. That’s a reticulate/parc issue more than a FlowCT problem, so see if that works. Also try activating the conda environment you created (reticulate::use_condaenv("your_environment_name")) before running that FlowCT code line.
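The steps described for 4) could look roughly like the sketch below. The helper name, the environment name and the exact PyPI package names are assumptions made for illustration; the reticulate calls used are conda_create(), conda_install(), use_condaenv() (reticulate's call for conda environments) and py_module_available(). Nothing is installed until the helper is called.

```r
# Python packages named in the comment above; the PyPI names are assumptions
parc_deps <- c("igraph", "leidenalg", "hnswlib", "umap-learn", "parc")

# Hypothetical helper: defining it does nothing, calling it needs a working
# conda installation and network access
setup_parc_env <- function(envname = "flowct-parc") {
  # Create the environment (skip this step if it already exists)
  reticulate::conda_create(envname)
  # Install PARC and its dependencies with pip inside that environment
  reticulate::conda_install(envname, packages = parc_deps, pip = TRUE)
  # Bind this R session to the environment; this must happen before
  # Python has been initialized in the session
  reticulate::use_condaenv(envname, required = TRUE)
  # TRUE if `import parc` would succeed in the bound environment
  reticulate::py_module_available("parc")
}
```

Calling `setup_parc_env()` in a fresh R session, before loading FlowCT, and checking that it returns TRUE is a quick way to confirm the environment is usable.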

I hope I have resolved at least one of your questions. Excuse my poor formatting as I'm on mobile.

jfoedfjwofa commented 4 weeks ago

Dear @mlorenzm,

I apologize for the delayed reply, and thank you very much for your thoughtful response. I learned a lot from your very detailed explanation. Based on your advice, I will go through the analysis again. I am sure I will have many more questions, but I really appreciate your continued support.

Best regards,