kharchenkolab / gpsFISH

Optimization of gene panels for targeted spatial transcriptomics

Memory allocation error during simulation_training_ZINB function execution #22

Open newbieMars opened 2 weeks ago

newbieMars commented 2 weeks ago

Hello,

Thank you again for your excellent package! I have encountered an issue when running the simulation_training_ZINB function, and I would like to bring it to your attention.

Context: I am executing the function with the following input dimensions:

The function is called as follows:

simulation.params = simulation_training_ZINB(
    sc_count = sc_count,
    spatial_count = spatial_count,
    overlap_gene = overlap_gene,
    unique_cluster_label = unique_cluster_label,
    sc_cluster = sc_cluster,
    spatial_cluster = spatial_cluster,
    outputpath = outputpath,
    optimizer = "variational_inference",
    mcmc.check = FALSE,
    num.iter = 1000,
    num.chain = 4,
    num.core = 32,
    saveplot = FALSE
)

Output and Error: The function begins running, and the following messages are displayed during execution:

[1] "Prepare data"
[1] "Start model fitting"
Chain 1: EXPERIMENTAL ALGORITHM...
...
Chain 1: Iteration: 1000 -4621088.399 0.002 0.003
Chain 1: Informational Message: The maximum number of iterations is reached! The algorithm may not have converged.
Chain 1: Drawing a sample of size 1000 from the approximate posterior...
Chain 1: COMPLETED.

However, shortly after this, the following error occurs:

Error in scan(csvfile, what = character(), sep = "\n", comment.char = "", :
  could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'

Issue: It seems that the function runs out of memory when trying to allocate 2048 Mb. I suspect this might be related to the large dataset size or the number of iterations specified. I would appreciate any guidance on whether this is an expected limitation, or whether there are steps I could take to resolve it (e.g., adjusting parameters or environment settings). I have already tried launching my script with an unlimited stack size (ulimit -s unlimited) on an HPC, inside a Singularity container, without success:

ulimit -s unlimited && Rscript 03_Script/17_gpsFISH_platform_effect_estimation/launch_reports_compilation.R

Could it be linked to temporary file writing by the rstan package? By the way, I am able to execute the tutorial without any error on my laptop (in a Docker container); the problem only occurs with my own data, with more iterations and cores.
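In case it helps, here is the minimal check I can run from the R session to look for those temporary files. This is only a sketch: I am assuming rstan writes them under tempdir(), and a container may map temporary storage elsewhere.

## Sketch: list and size temporary CSV files from the running R session.
## Assumes rstan writes them under tempdir(); adjust the path if your
## container maps temporary storage elsewhere.
csvs <- list.files(tempdir(), pattern = "\\.csv$", full.names = TRUE)
data.frame(file = basename(csvs), size_gb = round(file.size(csvs) / 1024^3, 2))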

Environment:

Matrix products: default

Attached packages:

Thank you very much for your help, and please let me know if you need any additional information!

newbieMars commented 1 week ago

While the function simulation_training_ZINB is running, different kinds of temporary files are generated:

Among these files, the CSV one has reached a size of 231 GB after 24 hours of computation and continues to grow. Could this be the cause of the error message? Is the size of my spatial count object the problem? Should I down-sample it?
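As a sketch, I can peek at the first bytes of the file to see what it contains, without reading whole lines into memory (the path below is a placeholder for the actual temporary file):

## Sketch: inspect the start of the growing CSV without triggering the
## same string-buffer allocation; "path/to/output.csv" is a placeholder.
bigcsv <- "path/to/output.csv"
cat(readChar(bigcsv, nchars = 2000))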

YidaZhang0628 commented 1 day ago

Hi,

I apologize for the delay in getting back to you; I have been traveling for the past few weeks.

The size of the data you have is not huge. What seems fishy is the large CSV file. I don't remember seeing large CSV files when running simulation_training_ZINB. There are a few things we can check:

  1. When you run the code in the tutorial, do you see similar CSV files being generated? What is the size of that?
  2. What is the content of the CSV file?
  3. If you reduce the size of the data by subsetting the genes in the scRNA-seq data and the cells in the spatial transcriptomic data (see the sketch after this list), do you still have the same problem?
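For point 3, a minimal subsetting sketch could look like the following. It assumes genes are in rows and cells are in columns of both count matrices (as in the tutorial); the subset sizes are arbitrary, and the cluster annotations would need to be subset to the same cells.

## Sketch: shrink the inputs before rerunning simulation_training_ZINB.
## Assumes genes in rows and cells in columns; subset sizes are arbitrary.
set.seed(1)
genes.keep <- sample(intersect(rownames(sc_count), rownames(spatial_count)), 100)
cells.keep <- sample(colnames(spatial_count), 500)

sc_count.sub      <- sc_count[genes.keep, ]
spatial_count.sub <- spatial_count[genes.keep, cells.keep]
overlap_gene.sub  <- intersect(overlap_gene, genes.keep)
## Remember to subset sc_cluster/spatial_cluster to the retained cells as well.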