Closed AClab-sgarcia closed 1 year ago
Hi,
Thank you for using CellphoneDB.
The example dataset in https://github.com/ventolab/CellphoneDB/blob/master/notebooks/data_tutorial.zip contains ~3k cells, hence it is unlikely that the lack of subsampling is the cause of your statistical analysis run getting stuck on 0%. Please could you confirm that you've installed the package in a clean virtual environment? https://stackoverflow.com/questions/67506630/python-tqdm-progress-bar-stuck-at-0 may also help. Please let us know how you got on.
As an aside, https://github.com/ventolab/CellphoneDB/blob/master/notebooks/T01_Method2_with_subsampling.ipynb provides some details on subsampling via geometric sketching. Note that if you don't provide subsampling_num_cells, 1/3 of the cells will be used. This may be a good starting point.
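As a rough starting point for choosing subsampling_num_cells, the default mentioned above can be written out explicitly. This is only a minimal sketch: the helper name default_subsampling_num_cells is hypothetical, and rounding down is an assumption rather than something taken from the CellphoneDB source.

```python
def default_subsampling_num_cells(n_cells: int) -> int:
    """Return one third of the cells (rounding down is an assumption)."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return n_cells // 3


# For a dataset of about 35k cells this suggests roughly 11.7k cells.
print(default_subsampling_num_cells(35000))  # 11666
```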
Best regards,
Robert.
Thanks for the fast and kind reply @datasome
I created a new virtual environment to run CellPhoneDB
pip freeze
anndata==0.8.0
asttokens==2.2.1
backcall==0.2.0
biopython==1.81
CellphoneDB==4.0.0
certifi==2022.12.7
charset-normalizer==3.1.0
colorama==0.4.6
comm==0.1.3
contourpy==1.0.7
cycler==0.11.0
debugpy==1.6.6
decorator==5.1.1
executing==1.2.0
fbpca==1.0
fonttools==4.39.2
geosketch==1.2
h5py==3.8.0
idna==3.4
importlib-metadata==6.1.0
importlib-resources==5.12.0
ipykernel==6.22.0
ipython==8.11.0
ipywidgets==8.0.6
jedi==0.18.2
joblib==1.2.0
jupyter_client==8.1.0
jupyter_core==5.3.0
jupyterlab-widgets==3.0.7
kiwisolver==1.4.4
ktplotspy==0.1.8
matplotlib==3.7.1
matplotlib-inline==0.1.6
mizani==0.8.1
natsort==8.3.1
nest-asyncio==1.5.6
numpy==1.24.2
numpy-groupies==0.9.20
packaging==23.0
palettable==3.3.0
pandas==1.5.3
parso==0.8.3
patsy==0.5.3
pickleshare==0.7.5
Pillow==9.4.0
platformdirs==3.2.0
plotnine==0.10.1
prompt-toolkit==3.0.38
psutil==5.9.4
pure-eval==0.2.2
Pygments==2.14.0
pyparsing==3.0.9
python-circos==0.3.0
python-dateutil==2.8.2
pytz==2023.2
pywin32==306
pyzmq==25.0.2
requests==2.28.2
scikit-learn==0.24.0
scipy==1.10.1
seaborn==0.12.2
six==1.16.0
stack-data==0.6.2
statsmodels==0.13.5
threadpoolctl==3.1.0
tornado==6.2
tqdm==4.65.0
traitlets==5.9.0
tzdata==2023.2
urllib3==1.26.15
wcwidth==0.2.6
widgetsnbextension==4.0.7
zipp==3.15.0
(CellPhoneDB)
I tried installing ipywidgets as suggested in: https://stackoverflow.com/questions/67506630/python-tqdm-progress-bar-stuck-at-0, but it is still not working
Saioa
Hi Saioa,
Hmm, I see pandas==1.5.3 above (released on 18 Jan 2023, according to https://pandas.pydata.org/docs/whatsnew/v1.5.3.html, i.e. well before we released CellphoneDB v4.0.0 - on 10 March 2023)?
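A quick way to compare the relevant package versions without reading through a full pip freeze is to query them from Python. The sketch below uses only the standard library; the helper name installed_versions and the list of packages checked are illustrative choices, not part of CellphoneDB.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages mentioned in this thread; adjust the list as needed.
for name, version in installed_versions(
    ["cellphonedb", "pandas", "tqdm", "ipywidgets"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside the same interpreter that reticulate or Jupyter uses shows which environment is actually being picked up.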
What I've just now tested is the following:
conda create -n cpdb python=3.9
source activate cpdb
pip install cellphonedb
pip install jupyter
Then I ran jupyter notebook, tested the statistical method and it worked as expected. Would you mind trying the same (i.e. from scratch) and letting us know how you got on? Thanks,
Robert.
Thanks for the kind reply Robert,
I am using your package through reticulate in RStudio. For this, I have created a virtual environment from scratch and installed the package. For this reason, I cannot use Jupyter; maybe this is what prevents it from working correctly? I have used this approach with other packages and I have never had any problems. I don't know if there is any way to solve it.
To be more specific, I used this same approach with the previous version of CellPhoneDB and had no problem using it.
Thanks!
Hi Saioa,
It's hard for me to comment as I'm not familiar with reticulate. It would make sense to ascertain if there was some conflict between reticulate and the python module called tqdm that we used to implement the progress bar during the statistical analysis run. To test this, could I please ask that you run:
pip install --force-reinstall "git+https://github.com/ventolab/CellphoneDB.git@reticulate"
and in cpdb_statistical_analysis_method.call() provide one additional argument:
progress_bar = False
and then let us know how you got on?
Best regards, Robert.
Hi Robert,
Thank you very much for your efforts to help me.
I have updated the package as you have told me:
$ pip freeze
anndata==0.8.0
asttokens==2.2.1
backcall==0.2.0
biopython==1.81
cellphonedb @ git+https://github.com/ventolab/CellphoneDB.git@3b306f7e4b369889763a597c85bc1ad7c3b4ecb6
certifi==2022.12.7
charset-normalizer==3.1.0
colorama==0.4.6
comm==0.1.3
contourpy==1.0.7
cycler==0.11.0
debugpy==1.6.6
decorator==5.1.1
executing==1.2.0
fbpca==1.0
fonttools==4.39.3
geosketch==1.2
h5py==3.8.0
idna==3.4
importlib-metadata==6.1.0
importlib-resources==5.12.0
ipykernel==6.22.0
ipython==8.11.0
ipywidgets==8.0.6
jedi==0.18.2
joblib==1.2.0
jupyter_client==8.1.0
jupyter_core==5.3.0
jupyterlab-widgets==3.0.7
kiwisolver==1.4.4
ktplotspy==0.1.9
matplotlib==3.7.1
matplotlib-inline==0.1.6
mizani==0.8.1
natsort==8.3.1
nest-asyncio==1.5.6
numpy==1.24.2
numpy-groupies==0.9.20
packaging==23.0
palettable==3.3.1
pandas==2.0.0
parso==0.8.3
patsy==0.5.3
pickleshare==0.7.5
Pillow==9.5.0
platformdirs==3.2.0
plotnine==0.10.1
prompt-toolkit==3.0.38
psutil==5.9.4
pure-eval==0.2.2
Pygments==2.14.0
pyparsing==3.0.9
python-circos==0.3.0
python-dateutil==2.8.2
pytz==2023.3
pywin32==306
pyzmq==25.0.2
requests==2.28.2
scikit-learn==0.24.0
scipy==1.10.1
seaborn==0.12.2
six==1.16.0
stack-data==0.6.2
statsmodels==0.13.5
threadpoolctl==3.1.0
tornado==6.2
tqdm==4.65.0
traitlets==5.9.0
tzdata==2023.3
urllib3==1.26.15
wcwidth==0.2.6
widgetsnbextension==4.0.7
zipp==3.15.0
(CellPhoneDB)
I have used the notebook example but it still does not work. The progress bar no longer appears, but the analysis still does not finish:
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
cpdb_file_path = cpdb_file_path,
meta_file_path = meta_file_path,
counts_file_path = counts_file_path,
counts_data = 'hgnc_symbol',
microenvs_file_path = microenvs_file_path,
iterations = 1000,
threshold = 0.1,
threads = 4,
debug_seed = 42,
result_precision = 3,
pvalue = 0.05,
subsampling = True,
subsampling_log = False,
subsampling_num_pc = 100,
subsampling_num_cells = 3312,
separator = '|',
debug = False,
output_path = out_path,
output_suffix = None,
progress_bar = False
)
Thank you very much for your help! Saioa
Hi Saioa, Could you please try and run the package within Jupyter notebook but outside of reticulate, so that we can establish if the issue is somehow to do with your machine or your reticulate? Best wishes, Robert.
Hi Robert,
I have tried to run it in Jupyter notebook and both the original version and the version you have made for reticulate work, so I assume that it is not a problem with my machine, but with the connection to reticulate.
On the other hand, when the "Building results" step begins, I get the following error with both options (original and reticulate versions):
pip freeze
aiofiles==22.1.0
aiosqlite==0.18.0
anndata==0.9.0
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
attrs==22.2.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
biopython==1.81
bleach==6.0.0
CellphoneDB==4.0.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
colorama==0.4.6
comm==0.1.3
contourpy==1.0.7
cycler==0.11.0
debugpy==1.6.7
decorator==5.1.1
defusedxml==0.7.1
executing==1.2.0
fastjsonschema==2.16.3
fbpca==1.0
fonttools==4.39.3
fqdn==1.5.1
geosketch==1.2
h5py==3.8.0
idna==3.4
importlib-metadata==6.3.0
importlib-resources==5.12.0
ipykernel==6.22.0
ipython==8.12.0
ipython-genutils==0.2.0
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
joblib==1.2.0
json5==0.9.11
jsonpointer==2.3
jsonschema==4.17.3
jupyter-events==0.6.3
jupyter-ydoc==0.2.3
jupyter_client==8.1.0
jupyter_core==5.3.0
jupyter_server==2.5.0
jupyter_server_fileid==0.9.0
jupyter_server_terminals==0.4.4
jupyter_server_ydoc==0.8.0
jupyterlab==3.6.3
jupyterlab-pygments==0.2.2
jupyterlab_server==2.22.0
kiwisolver==1.4.4
ktplotspy==0.1.9
MarkupSafe==2.1.2
matplotlib==3.7.1
matplotlib-inline==0.1.6
mistune==2.0.5
mizani==0.8.1
natsort==8.3.1
nbclassic==0.5.5
nbclient==0.7.3
nbconvert==7.3.1
nbformat==5.8.0
nest-asyncio==1.5.6
notebook==6.5.4
notebook_shim==0.2.2
numpy==1.24.2
numpy-groupies==0.9.20
packaging==23.0
palettable==3.3.1
pandas==1.5.0
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.3
pickleshare==0.7.5
Pillow==9.5.0
platformdirs==3.2.0
plotnine==0.10.1
prometheus-client==0.16.0
prompt-toolkit==3.0.38
psutil==5.9.4
pure-eval==0.2.2
pycparser==2.21
Pygments==2.15.0
pyparsing==3.0.9
pyrsistent==0.19.3
python-circos==0.3.0
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3
pywin32==306
pywinpty==2.0.10
PyYAML==6.0
pyzmq==25.0.2
requests==2.28.2
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
scikit-learn==0.24.0
scipy==1.10.1
seaborn==0.12.2
Send2Trash==1.8.0
six==1.16.0
sniffio==1.3.0
soupsieve==2.4
stack-data==0.6.2
statsmodels==0.13.5
terminado==0.17.1
threadpoolctl==3.1.0
tinycss2==1.2.1
tomli==2.0.1
tornado==6.2
tqdm==4.65.0
traitlets==5.9.0
typing_extensions==4.5.0
tzdata==2023.3
uri-template==1.2.0
urllib3==1.26.15
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.5.1
y-py==0.5.9
ypy-websocket==0.8.2
zipp==3.15.0
import os
import importlib
import warnings
warnings.filterwarnings("ignore")
import glob
import anndata as ad
import pandas as pd
import pickle as pkl
import IPython
import cellphonedb
from cellphonedb.utils import db_utils
# For section 3 & 4
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
from cellphonedb.utils import search_utils
import ktplotspy as kpy
# Get the current working directory
cwd = os.getcwd()
# Print the current working directory
print("Current working directory: {0}".format(cwd))
Current working directory: C:\Users\sgarcia\Documents
os.chdir("".join([cwd, "/data_tutorial/"]))
os.getcwd()
'C:\\Users\\sgarcia\\Documents\\data_tutorial'
# Inspect input files
cpdb_file_path = 'db/cellphonedb.zip'
meta_file_path = 'data/metadata.tsv'
counts_file_path = 'data/normalised_log_counts.h5ad'
microenvs_file_path = 'data/microenvironment.tsv'
out_path = 'method2_with_subsampling'
metadata = pd.read_csv(meta_file_path, sep = '\t')
metadata.head(3)
barcode_sample | cell_type | |
---|---|---|
0 | AGCGATTAGTCTAACC-1_Pla_HDBR10917733 | B_cells |
1 | ATCCGTGAGGCTAGAA-1_Pla_Camb10714918 | B_cells |
2 | AGTAACCCATTAAAGG-1_Pla_HDBR10917733 | B_cells |
import anndata
adata = anndata.read_h5ad(counts_file_path)
adata.shape
sorted(adata.obs.index) == sorted(metadata['barcode_sample'])
microenv = pd.read_csv(microenvs_file_path, sep = '\t')
microenv.head(3)
microenv.groupby('microenvironment', group_keys = False)['cell_type'].apply(lambda x : list(x.value_counts().index))
microenvironment
Env1 [PV MMP11, PV MYH11, PV STEAP4, EVT_1, EVT_2, ...
Name: cell_type, dtype: object
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
cpdb_file_path = cpdb_file_path, # mandatory: CellPhoneDB database zip file.
meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
counts_file_path = counts_file_path, # mandatory: normalized count matrix.
counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
microenvs_file_path = microenvs_file_path, # optional (default: None): defines cells per microenvironment.
iterations = 1000, # denotes the number of shufflings performed in the analysis.
threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
threads = 4, # number of threads to use in the analysis.
debug_seed = 42, # debug random seed. To disable >=0.
result_precision = 3, # Sets the rounding for the mean values in significant_means.
pvalue = 0.05, # P-value threshold to employ for significance.
subsampling = True, # To enable subsampling the data (geometric sketching).
subsampling_log = False, # (mandatory) enable subsampling log1p for non log-transformed data inputs.
subsampling_num_pc = 100, # Number of components to subsample via geometric sketching (default: 100).
subsampling_num_cells = 3312, # Number of cells to subsample (integer) (default: 1/3 of the dataset).
separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
debug = False, # Saves all intermediate tables employed during the analysis in pkl format.
output_path = out_path, # Path to save results.
output_suffix = None, # Replaces the timestamp in the output files by a user-defined string (default: None).
)
Reading user files...
The following user files were loaded successfully:
data/normalised_log_counts.h5ad
data/metadata.tsv
data/microenvironment.tsv
[ ][CORE][12/04/23-12:34:26][INFO] Subsampling 3312 to 3312
[ ][CORE][12/04/23-12:34:28][INFO] Done subsampling 3312 to 3312
[ ][CORE][12/04/23-12:34:29][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:4 Precision:3
[ ][CORE][12/04/23-12:34:29][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][12/04/23-12:34:29][INFO] Running Real Analysis
[ ][CORE][12/04/23-12:34:29][INFO] Limiting cluster combinations using microenvironments
[ ][CORE][12/04/23-12:34:29][INFO] Running Statistical Analysis
100%|███████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:55<00:00, 17.90it/s]
[ ][CORE][12/04/23-12:35:25][INFO] Building Pvalues result
[ ][CORE][12/04/23-12:35:25][INFO] Building results
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[7], line 3
1 from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
----> 3 deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
4 cpdb_file_path = cpdb_file_path, # mandatory: CellPhoneDB database zip file.
5 meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
6 counts_file_path = counts_file_path, # mandatory: normalized count matrix.
7 counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
8 microenvs_file_path = microenvs_file_path, # optional (default: None): defines cells per microenvironment.
9 iterations = 1000, # denotes the number of shufflings performed in the analysis.
10 threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
11 threads = 4, # number of threads to use in the analysis.
12 debug_seed = 42, # debug randome seed. To disable >=0.
13 result_precision = 3, # Sets the rounding for the mean values in significan_means.
14 pvalue = 0.05, # P-value threshold to employ for significance.
15 subsampling = True, # To enable subsampling the data (geometri sketching).
16 subsampling_log = False, # (mandatory) enable subsampling log1p for non log-transformed data inputs.
17 subsampling_num_pc = 100, # Number of componets to subsample via geometric skectching (dafault: 100).
18 subsampling_num_cells = 3312, # Number of cells to subsample (integer) (default: 1/3 of the dataset).
19 separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
20 debug = False, # Saves all intermediate tables employed during the analysis in pkl format.
21 output_path = out_path, # Path to save results.
22 output_suffix = None, # Replaces the timestamp in the output files by a user defined string in the (default: None).
23 )
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\cellphonedb\src\core\methods\cpdb_statistical_analysis_method.py:132, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, iterations, threshold, threads, debug_seed, result_precision, pvalue, subsampling, subsampling_log, subsampling_num_pc, subsampling_num_cells, separator, debug, output_suffix)
129 significant_means['rank'] = significant_means['rank'].apply(lambda rank: rank if rank != 0 else (1 + max_rank))
130 significant_means.sort_values('rank', inplace=True)
--> 132 file_utils.save_dfs_as_tsv(output_path, output_suffix, "statistical_analysis", \
133 {"deconvoluted" : deconvoluted, \
134 "means" : means, \
135 "pvalues" : pvalues, \
136 "significant_means" : significant_means} )
138 return deconvoluted, means, pvalues, significant_means
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\cellphonedb\utils\file_utils.py:212, in save_dfs_as_tsv(out, suffix, analysis_name, name2df)
210 for name, df in name2df.items():
211 file_path = os.path.join(out, "{}_{}_{}.{}".format(analysis_name, name, suffix, "txt"))
--> 212 df.to_csv(file_path, sep = '\t', index=False)
213 print("Saved {} to {}".format(name, file_path))
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
209 else:
210 kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\generic.py:3721, in NDFrame.to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, lineterminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)
3710 df = self if isinstance(self, ABCDataFrame) else self.to_frame()
3712 formatter = DataFrameFormatter(
3713 frame=df,
3714 header=header,
(...)
3718 decimal=decimal,
3719 )
-> 3721 return DataFrameRenderer(formatter).to_csv(
3722 path_or_buf,
3723 lineterminator=lineterminator,
3724 sep=sep,
3725 encoding=encoding,
3726 errors=errors,
3727 compression=compression,
3728 quoting=quoting,
3729 columns=columns,
3730 index_label=index_label,
3731 mode=mode,
3732 chunksize=chunksize,
3733 quotechar=quotechar,
3734 date_format=date_format,
3735 doublequote=doublequote,
3736 escapechar=escapechar,
3737 storage_options=storage_options,
3738 )
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
209 else:
210 kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\formats\format.py:1189, in DataFrameRenderer.to_csv(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, lineterminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)
1168 created_buffer = False
1170 csv_formatter = CSVFormatter(
1171 path_or_buf=path_or_buf,
1172 lineterminator=lineterminator,
(...)
1187 formatter=self.fmt,
1188 )
-> 1189 csv_formatter.save()
1191 if created_buffer:
1192 assert isinstance(path_or_buf, StringIO)
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\formats\csvs.py:241, in CSVFormatter.save(self)
237 """
238 Create the writer & save.
239 """
240 # apply compression and byte/text conversion
--> 241 with get_handle(
242 self.filepath_or_buffer,
243 self.mode,
244 encoding=self.encoding,
245 errors=self.errors,
246 compression=self.compression,
247 storage_options=self.storage_options,
248 ) as handles:
249
250 # Note: self.encoding is irrelevant here
251 self.writer = csvlib.writer(
252 handles.handle,
253 lineterminator=self.lineterminator,
(...)
258 quotechar=self.quotechar,
259 )
261 self._save()
File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\common.py:857, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
852 elif isinstance(handle, str):
853 # Check whether the filename is to be opened in binary mode.
854 # Binary mode does not support 'encoding' and 'newline'.
855 if ioargs.encoding and "b" not in ioargs.mode:
856 # Encoding
--> 857 handle = open(
858 handle,
859 ioargs.mode,
860 encoding=ioargs.encoding,
861 errors=errors,
862 newline="",
863 )
864 else:
865 # Binary mode
866 handle = open(handle, ioargs.mode)
OSError: [Errno 22] Invalid argument: 'method2_with_subsampling\\statistical_analysis_deconvoluted_04_12_2023_12:35:26.txt'
Thanks
Hi Saioa,
Thanks for all the info. To summarize, the following is currently the case:
Could I please ask you to perform the following tests:
1. As you appear to use Windows, in Jupyter notebook outside of reticulate:
   a. Using the original version, try to set out_path to either 'C:/Users/sgarcia/Documents/data_tutorial/method2_with_subsampling' or './method2_with_subsampling' and see if that makes a difference (c.f. https://stackoverflow.com/questions/57673922/writing-a-pandas-dataframe-to-csv).
   b. Get the latest reticulate version and then leave out_path as 'method2_with_subsampling' and see if this works.
2. In the notebook run via reticulate, using the original version, try to run the statistical analysis using files in https://github.com/ventolab/CellphoneDB/tree/master/example_data . I just wanted to eliminate the possibility that, say, available memory is an issue when you use reticulate to run the notebook.
Best wishes,
Robert.
Hi Robert,
Regarding what you asked: 1.a. I used jupyter notebook with:
out_path = './method2_with_subsampling'
and
out_path = 'C:/Users/sgarcia/Documents/data_tutorial/method2_with_subsampling'
and none worked, I keep having same issue:
OSError: [Errno 22] Invalid argument:
1.b. Using it inside reticulate with those changes does not work either: the original issue is not fixed and the statistical analysis does not complete.
Therefore, and as a summary:
I have tested both with my data and also with your data in the examples.
I have also tested this option with my own data as well as with yours from the example and also by changing the output paths as you suggested in the previous message.
Thank you very much Saioa
Hi Saioa, Thanks for the update - 1b needed to be run in jupyter notebook outside of reticulate but using the latest code from reticulate branch of CellphoneDB package. Apologies - I appreciate this is getting confusing.. Could you please try that? Thanks! Robert.
Hello!
I'm sorry for the confusion. I have tried to use it in Jupyter notebook with CellPhoneDB@reticulate in the following ways:
out_path = 'method2_with_subsampling'
out_path = './method2_with_subsampling'
out_path = 'C:/Users/sgarcia/Documents/data_tutorial/method2_with_subsampling'
and I keep getting the same error as with CellPhoneDB (original):
OSError: [Errno 22] Invalid argument: 'method2_with_subsampling\statistical_analysis_deconvoluted_04_12_2023_12:35:26.txt'
Thanks! Saioa
Hi Saioa, Thanks for the feedback. To progress on running Jupyter notebook with CellPhoneDB@reticulate, I've commented out temporarily the code to save resulting DataFrames to files. Hence when you run the analysis (having pulled the latest CellPhoneDB@reticulate), hopefully the analysis succeeds and you will have the DataFrames available to you in the notebook. Would you mind experimenting with saving the DataFrames using save_dfs_as_tsv function in https://github.com/ventolab/CellphoneDB/blob/reticulate/cellphonedb/utils/file_utils.py as a starting point? Perhaps there's some way of using os.path.join and/or os.path.abspath that works on Windows? It's difficult for me to test this locally as I don't have access to a Windows machine. Good luck and thanks! Robert.
Hi Robert,
In the end I was able to fix it.
The problem was not in the paths, I was able to keep:
out_path = 'method2_with_subsampling'
The error comes from the way the timestamp is formatted in the function get_timestamp_suffix():
I have changed ("%m_%d_%Y_%H:%M:%S") to ("%m_%d_%Y_%H%M%S"), and now the files are saved without problem.
https://stackoverflow.com/a/75650000
https://www.pythonpool.com/oserror-errno22-invalid-argument-solved/
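The reason this works is that ':' is not allowed in Windows file names, and the failing file name in the traceback contains the time as 12:35:26. The sketch below shows a colon-free timestamp plus a simple check for characters that Windows rejects. The function names timestamp_suffix and is_windows_safe_filename are hypothetical helpers for illustration, not the CellphoneDB implementation.

```python
import re
from datetime import datetime

# Characters that Windows does not allow in file names.
WINDOWS_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def timestamp_suffix(now=None):
    """Build a timestamp such as 04_12_2023_123526, with no ':' in it."""
    now = now or datetime.now()
    return now.strftime("%m_%d_%Y_%H%M%S")


def is_windows_safe_filename(name):
    """Return True if the file name contains no Windows-reserved characters."""
    return WINDOWS_INVALID_CHARS.search(name) is None


stamp = timestamp_suffix(datetime(2023, 4, 12, 12, 35, 26))
print(stamp)  # 04_12_2023_123526
print(is_windows_safe_filename(f"statistical_analysis_means_{stamp}.txt"))  # True
```

Note that the check applies to a bare file name, not a full path, since path separators are among the rejected characters.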
However, it still does not work in RStudio with the reticulate library.
Thanks Saioa
Hi Saioa,
That's great - well done for finding the cause! I've just made that fix in our master branch. Would you mind doing pip install --force-reinstall "git+https://github.com/ventolab/CellphoneDB.git" and testing that it works also?
On reticulate, I've just installed R studio, and did the following:
install.packages("reticulate")
library(reticulate)
use_condaenv(condaenv = 'cpdb_reticulate', required = TRUE)
repl_python()
where cpdb_reticulate is my clean venv with https://github.com/ventolab/CellphoneDB.git@reticulate installed in it. Then I was able to run both basic and statistical analyses and they ran fine (no Jupyter notebook involved). Could you please confirm how you run CellphoneDB from reticulate so that I can try and replicate it locally?
Best wishes,
Robert.
Hello Robert,
I have tried to redo the analysis using the correction in the master branch and now it works perfectly fine.
Regarding running it in R, it still doesn't work for me, let me tell you what I have done:
In the cpdb_statistical_analysis_method.call() function I used progress_bar = False, but it still does not work and gets stuck at the same point:
Reading user files...
The following user files were loaded successfully:
data/normalised_log_counts.h5ad
data/metadata.tsv
data/microenvironment.tsv
[ ][CORE][14/04/23-09:21:42][INFO] Subsampling 3312 to 3312
[ ][CORE][14/04/23-09:21:44][INFO] Done subsampling 3312 to 3312
[ ][CORE][14/04/23-09:21:45][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:4 Precision:3
[ ][CORE][14/04/23-09:21:45][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][14/04/23-09:21:45][INFO] Running Real Analysis
[ ][CORE][14/04/23-09:21:45][INFO] Limiting cluster combinations using microenvironments
[ ][CORE][14/04/23-09:21:45][INFO] Running Statistical Analysis
Thanks
Hi Saioa, Thanks for the update - I've used RStudio -> Tools -> Global Options -> Python rather than use_condaenv but it's still working for me. I'm now suspecting running python via reticulate on Windows has a problem with the module we use for parallelising our statistical analysis (see: https://github.com/rstudio/reticulate/issues/1353). I've just made a change to https://github.com/ventolab/CellphoneDB.git@reticulate to not use that module when threads is set to 1. Would you mind pulling the latest from the reticulate branch, setting threads=1 and seeing if it works? I've added a basic progress display to ease the pain of waiting for the result in single-threaded mode.
If the above works, it may be that the trade-off for running via reticulate will be having to wait a little longer for the results.
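The idea of skipping the parallelisation module when threads is 1 can be sketched as follows. This is an illustration under stated assumptions, not the actual change made in the reticulate branch: one_iteration is a hypothetical stand-in for a single shuffling iteration, and the progress message format is invented for the example.

```python
import sys
from multiprocessing import Pool


def one_iteration(i):
    """Stand-in for one shuffling iteration (hypothetical workload)."""
    return i * i


def run_iterations(n_iterations, threads=1, progress=True):
    """Use a process pool only when threads > 1; otherwise loop in-process."""
    if threads > 1:
        with Pool(processes=threads) as pool:
            return pool.map(one_iteration, range(n_iterations))
    results = []
    step = max(1, n_iterations // 10)  # report roughly every 10%
    for i in range(n_iterations):
        results.append(one_iteration(i))
        if progress and (i + 1) % step == 0:
            print(f"{i + 1}/{n_iterations} iterations done", file=sys.stderr)
    return results


if __name__ == "__main__":
    print(run_iterations(5, threads=1, progress=False))  # [0, 1, 4, 9, 16]
```

With threads=1 no worker processes are spawned, which avoids the code path suspected of hanging under reticulate on Windows, at the cost of a longer run time.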
Let me know how you get on.
Best wishes,
Robert.
Hi Robert!
It worked!!
Thank you very much for your help, I really appreciate it.
Saioa
Good morning,
I am trying to run statistical_analysis_method on a large dataset, more than 35k cells, and when running the Statistical Analysis it gets stuck.
[ ][CORE][04/04/23-12:21:27][INFO] Running Statistical Analysis
0%| | 0/1000 [00:00<?, ?it/s]
I have tried it with the example in the notebooks, and the same thing happens. Could somebody please give a hint on what is going on?
On the other hand, when dealing with such a big number of cells, what's the best "subsampling_num_cells" to use?
Thanks!