ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

UFuncTypeError: when running cpdb analysis #148

Open kpeeles01 opened 11 months ago

kpeeles01 commented 11 months ago

I am new to Single-Cell Sequencing Analysis and coding, but I am working on analyzing some sequencing data and comparing interactions between cells.

I followed the CellPhoneDB tutorials to prepare my data in both Jupyter Notebook and RStudio, and I get the same error with the data set prepared through either program. I am also on Windows. I set up the paths first: [image]

And I made sure that I can read in my data and that the dimensions match up: [image]

My error occurs when I try to run the analysis: [image]

I receive this error:

    UFuncTypeError                            Traceback (most recent call last)
    Cell In[63], line 8
          1 ##### Step 3: Run basic analysis
          2
          3 # copying code from: https://github.com/ventolab/CellphoneDB/blob/master/notebooks/T01_Method1.ipynb
          4 # tutorial for CellPhoneDB
          6 from cellphonedb.src.core.methods import cpdb_analysis_method
    ----> 8 means, deconvoluted = cpdb_analysis_method.call(
          9     cpdb_file_path = cpdb_file_path, # mandatory: CellPhoneDB database zip file.
         10     meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
         11     counts_file_path = counts_file_path, # mandatory: normalized count matrix.
         12     counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
         13     output_path = out_path, # Path to save results microenvs_file_path = None,
         14     separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
         15     threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
         16     result_precision = 3, # Sets the rounding for the mean values in significan_means.
         17     debug = True, # Saves all intermediate tables emplyed during the analysis in pkl format.
         18     output_suffix = None # Replaces the timestamp in the output files by a user defined string in the (default: None)
         19 )

    File ~\anaconda3\envs\cpdb\lib\site-packages\cellphonedb\src\core\methods\cpdb_analysis_method.py:116, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, separator, threshold, result_precision, debug, output_suffix)
        110 cluster_interactions = cpdb_statistical_analysis_helper.get_cluster_combinations(clusters['names'], microenvs)
        112 base_result = cpdb_statistical_analysis_helper.build_result_matrix(interactions_filtered,
        113                                                                    cluster_interactions,
        114                                                                    separator)
    --> 116 mean_analysis = cpdb_statistical_analysis_helper.mean_analysis(interactions_filtered,
        117                                                                clusters,
        118                                                                cluster_interactions,
        119                                                                separator)
        121 percent_analysis = cpdb_statistical_analysis_helper.percent_analysis(clusters,
        122                                                                      threshold,
        123                                                                      interactions_filtered,
        124                                                                      cluster_interactions,
        125                                                                      separator)
        127 if debug:

    File ~\anaconda3\envs\cpdb\lib\site-packages\cellphonedb\src\core\methods\cpdb_statistical_analysis_helper.py:359, in mean_analysis(interactions, clusters, cluster_combinations, separator)
        353 x = clusters['means'].loc[gene1_ids, cluster1_names].values
        354 y = clusters['means'].loc[gene2_ids, cluster2_names].values
        356 result = pd.DataFrame(
        357     (x > 0) * (y > 0) * (x + y) / 2,
        358     index=interactions.index,
    --> 359     columns=(pd.Series(cluster1_names) + separator + pd.Series(cluster2_names)).values)
        361 return result

    File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\ops\common.py:81, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
         77         return NotImplemented
         79 other = item_from_zerodim(other)
    ---> 81 return method(self, other)

    File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\arraylike.py:186, in OpsMixin.__add__(self, other)
         98 @unpack_zerodim_and_defer("__add__")
         99 def __add__(self, other):
        100     """
        101     Get Addition of DataFrame and other, column-wise.
        102
       (...)
        184     moose     3.0     NaN
        185     """
    --> 186     return self._arith_method(other, operator.add)

    File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\series.py:6112, in Series._arith_method(self, other, op)
       6110 def _arith_method(self, other, op):
       6111     self, other = ops.align_method_SERIES(self, other)
    -> 6112     return base.IndexOpsMixin._arith_method(self, other, op)

    File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\base.py:1348, in IndexOpsMixin._arith_method(self, other, op)
       1345 rvalues = ensure_wrapped_if_datetimelike(rvalues)
       1347 with np.errstate(all="ignore"):
    -> 1348     result = ops.arithmetic_op(lvalues, rvalues, op)
       1350 return self._construct_result(result, name=res_name)

    File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\ops\array_ops.py:232, in arithmetic_op(left, right, op)
        228     _bool_arith_check(op, left, right)
        230     # error: Argument 1 to "_na_arithmetic_op" has incompatible type
        231     # "Union[ExtensionArray, ndarray[Any, Any]]"; expected "ndarray[Any, Any]"
    --> 232     res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
        234 return res_values

    File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\ops\array_ops.py:171, in _na_arithmetic_op(left, right, op, is_cmp)
        168     func = partial(expressions.evaluate, op)
        170 try:
    --> 171     result = func(left, right)
        172 except TypeError:
        173     if not is_cmp and (is_object_dtype(left.dtype) or is_object_dtype(right)):
        174         # For object dtype, fallback to a masked operation (only operating
        175         #  on the non-missing values)
        176         # Don't do this for comparisons, as that will handle complex numbers
        177         #  incorrectly, see GH#32047

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U1')) -> None

I believe there is an issue with how my data was processed or saved, but I do not know where the issue is or how to fix it.

datasome commented 11 months ago

Hi kpeeles01,

My apologies for not replying earlier. Please use strings rather than numbers to represent cell types in all your input files (meta, counts). Then the above issue should go away. I will amend CellphoneDB documentation to advise users against using numeric cell type identifiers.

Best wishes,

Robert.
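
As a rough illustration of the advice above, the sketch below rewrites a tab-separated meta file so that purely numeric cell-type labels gain a text prefix. It uses only the Python standard library. The function name `prefix_numeric_labels`, the default column name `cell_type`, the prefix `c`, and the file names in the usage note are assumptions made for this example; they are not part of CellphoneDB.

```python
import csv


def prefix_numeric_labels(src_path, dst_path, column="cell_type", prefix="c"):
    """Copy a tab-separated meta file, prefixing labels made only of digits.

    Hypothetical helper for illustration; adjust `column` to match the
    header of your own meta file.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in reader:
            label = row[column]
            # "0" becomes "c0"; labels that already contain text are left alone.
            if label.isdigit():
                row[column] = prefix + label
            writer.writerow(row)
```

Calling `prefix_numeric_labels("meta.tsv", "meta_fixed.tsv")` and then pointing `meta_file_path` at the new file would be one way to apply it. Only the meta file is handled here; any other input that refers to cell types by number would need the same relabelling.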

kpeeles01 commented 11 months ago

Thank you for your response,

I have tried to convert all of the files to str, but it is still giving me the same error. I did find that if certain files are opened with pandas it will convert numbers to integers instead of leaving them as str. I'm not sure how to work around that.
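
The behaviour described here can be reproduced in a few lines. The sketch below is a minimal illustration with made-up data (the barcodes `cellA` and `cellB` and the `c` prefix are assumptions): converting a column to `str` in memory does not survive a round trip through a text file, because `pd.read_csv` infers the type again from the file contents, whereas a label containing a letter is read back as text.

```python
import io

import pandas as pd

# Made-up meta table with numeric cluster labels.
meta = pd.DataFrame({"Cell": ["cellA", "cellB"], "cell_type": [0, 1]})

# Converting to str works in memory, but the file only stores the characters 0 and 1.
meta["cell_type"] = meta["cell_type"].astype(str)
buffer = io.StringIO()
meta.to_csv(buffer, sep="\t", index=False)
reloaded = pd.read_csv(io.StringIO(buffer.getvalue()), sep="\t")
roundtrip_is_numeric = pd.api.types.is_numeric_dtype(reloaded["cell_type"])

# Adding a non-numeric prefix changes the file contents, so the labels stay text.
meta["cell_type"] = "c" + meta["cell_type"]
buffer = io.StringIO()
meta.to_csv(buffer, sep="\t", index=False)
reloaded_prefixed = pd.read_csv(io.StringIO(buffer.getvalue()), sep="\t")
prefixed_is_numeric = pd.api.types.is_numeric_dtype(reloaded_prefixed["cell_type"])

print("plain str round trip read as numbers:", roundtrip_is_numeric)
print("prefixed labels read as numbers:", prefixed_is_numeric)
```

Passing `dtype=str` to your own `pd.read_csv` call keeps the labels as text in your session, but it has no effect on how another program reads the saved file, so changing the labels themselves is the more robust workaround.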

datasome commented 11 months ago

Hi kpeeles01,

In your original meta file (with numeric cell types), if you rename at least one of your cell types to text - e.g. 1 to c1 - it will force pandas to treat that column as text and the code should then work. I have tested a simple test_meta.txt file:

    Cell                      cell_type
    d-pos_AAACCTGAGCAGGTCA    c1
    d-pos_AAACCTGGTACCGAGA    0
    d-pos_AAACCTGTCGCCATAA    c1
    d-pos_AAACGGGTCAGTTGAC    2
    d-pos_AAAGATGCATTGAGCT    0
    d-pos_AAAGATGTCCAAAGTC    0
    d-pos_AAAGCAAAGAGGACGG    3
    d-pos_AAAGCAACACATTCGA    c1
    d-pos_AAAGTAGAGAGCCCAA    0
    d-pos_AAAGTAGCAAGCTGAG    0

with the following simple python code:

    import pandas as pd
    import numpy as np

    # Read the tab-separated meta file; pandas infers the column types here.
    with open("test_meta.txt") as f:
        meta = pd.read_csv(f, sep='\t')

    CELL_TYPE = 'cell_type'
    meta[CELL_TYPE] = meta[CELL_TYPE].astype('category')
    cluster_names = meta[CELL_TYPE].cat.categories

    # Build every ordered pair of cluster names.
    cluster_combinations = np.array(np.meshgrid(cluster_names.values, cluster_names.values)).T.reshape(-1, 2)
    cluster1_names = cluster_combinations[:, 0]
    cluster2_names = cluster_combinations[:, 1]

    # Fails with UFuncTypeError when the cluster names are integers.
    separator = '\t'
    pd.Series(cluster1_names) + separator + pd.Series(cluster2_names)

Before I changed 1 to c1 in the file, I got the same error as you; after the change, the above code worked. The code mimics what happens in CellphoneDB, so your CellphoneDB analysis should work as well.

Could you give it a go and let me know how you get on?

Best,

Robert.