nealpsmith / neals_python_functions

Random useful python functions
MIT License

UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128) #13

Open slowkow opened 3 years ago

slowkow commented 3 years ago

This line might need to be updated to handle Unicode characters in feature names (e.g. 0352 anti-human FcεRIα):

https://github.com/nealpsmith/neals_python_functions/blob/88c8f419a8277ef24308d9c7f47d6c8ba2303f2c/neals_python_functions/analysis/cellbrowser.py#L9

Maybe this will work? I haven't tested it.

expr_mtx["gene"] = [x.encode('utf8') for x in adata.var_names.values]
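One caveat with the suggestion above: under Python 3, `str.encode('utf8')` returns `bytes` objects, which would likely be written to the CSV as `b'...'` literals. Also, the traceback in the log shows cbBuild encoding gene names as ASCII, so UTF-8 names in the matrix would probably still fail. An alternative is to convert the names to ASCII-only strings before writing the matrix. The following is a minimal, untested-against-cbBuild sketch; `to_ascii_name` and the `GREEK_TO_ASCII` table are hypothetical names introduced here, not part of neals_python_functions or cellbrowser.

```python
import unicodedata

# Hypothetical mapping: extend with whatever symbols occur in your feature names.
GREEK_TO_ASCII = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "ε": "epsilon",
    "ζ": "zeta",
    "κ": "kappa",
    "λ": "lambda",
    "μ": "mu",
}


def to_ascii_name(name):
    """Return an ASCII-only version of a feature name (hypothetical helper)."""
    # Decompose compatibility and accented forms first (e.g. "é" -> "e" + accent).
    text = unicodedata.normalize("NFKD", name)
    # Spell out the Greek letters we know about.
    for greek, spelled in GREEK_TO_ASCII.items():
        text = text.replace(greek, spelled)
    # Drop anything that is still outside ASCII, such as combining accents.
    return text.encode("ascii", "ignore").decode("ascii")
```

With this helper, the line in question could become `expr_mtx["gene"] = [to_ascii_name(x) for x in adata.var_names.values]`. Two different names could collapse to the same string after conversion, so it is worth checking that the result is still unique.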

@RachellyN told me about this error. Here's the log:

Running cbBuild
WARNING:root:The directory CB_CD4_protein_res0.4/browser does not exist. Making a new directory now.
INFO:root:dataRoot is not set in ~/.cellbrowser.conf. Dataset hierarchies are not supported.
INFO:root:Creating CB_CD4_protein_res0.4/browser/cell_browser
INFO:root:Determining if CB_CD4_protein_res0.4/browser/cell_browser/exprMatrix.tsv.gz needs to be created
INFO:root:CB_CD4_protein_res0.4/browser/cell_browser/exprMatrix.tsv.gz does not exist. Must build matrix now.
INFO:root:Creating CB_CD4_protein_res0.4/browser/cell_browser/metaFields
INFO:root:Checking and reordering meta data to CB_CD4_protein_res0.4/browser/cell_browser/meta.tsv
INFO:root:Reading sample names from /projects/covid/rachelly/CD4_Tcells/Subclustering/CITE_integration/CB_CD4_protein_res0.4/meta_data.csv
INFO:root:Reading headers from file /projects/covid/rachelly/CD4_Tcells/Subclustering/CITE_integration/CB_CD4_protein_res0.4/expr_mtx.csv.gz
WARNING:root:1 sample names are in the expression matrix, but not in the meta data. Examples: ['cellName']
WARNING:root:These samples will be removed from the expression matrix. The matrix will need to be filtered.
INFO:root:Data contains 247260 samples/cells
INFO:root:Converting to numbers and compressing meta data fields
INFO:root:Field cellName: type uniqueString, 247260 different values
INFO:root:Field batch: type enum, 94 different values
INFO:root:Field prot_CD45: type float, 73 different values
INFO:root:Field leiden: type int, 23 different values
INFO:root:Field cluster: type enum, 10 different values
INFO:root:Field RNA_leiden_08: type int, 9 different values
INFO:root:Field donor: type enum, 692 different values
INFO:root:Field age_cat: type enum, 5 different values
INFO:root:Field health: type enum, 5 different values
INFO:root:Indexing meta file CB_CD4_protein_res0.4/browser/cell_browser/meta.tsv to CB_CD4_protein_res0.4/browser/cell_browser/meta.index
INFO:root:Kept 247260 cells present in both meta data file and expression matrix
INFO:root:Auto-detecting number type of /projects/covid/rachelly/CD4_Tcells/Subclustering/CITE_integration/CB_CD4_protein_res0.4/expr_mtx.csv.gz
INFO:root:Auto-detect: Numbers in matrix are of type 'float'
INFO:root:Auto-detected gene IDs type: symbols
INFO:root:Copying+reordering+trimming /projects/covid/rachelly/CD4_Tcells/Subclustering/CITE_integration/CB_CD4_protein_res0.4/expr_mtx.csv.gz to CB_CD4_protein_res0.4/browser/cell_browser/exprMatrix.tsv.gz, keeping 247260 columns with sample ID in meta
INFO:root:Auto-detecting number type of /projects/covid/rachelly/CD4_Tcells/Subclustering/CITE_integration/CB_CD4_protein_res0.4/expr_mtx.csv.gz
INFO:root:Auto-detect: Numbers in matrix are of type 'float'
INFO:root:converting CB_CD4_protein_res0.4/browser/cell_browser/exprMatrix.tsv.gz to CB_CD4_protein_res0.4/browser/cell_browser/exprMatrix.bin and writing index to CB_CD4_protein_res0.4/browser/cell_browser/exprMatrix.json, type float
INFO:root:Compressing gene expression vectors...
Traceback (most recent call last):
  File "/home/rnormand/.conda/envs/CellBrowser/bin/cbBuild", line 10, in <module>
    sys.exit(cbBuildCli())
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/cellbrowser/cellbrowser.py", line 4419, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/cellbrowser/cellbrowser.py", line 4237, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/cellbrowser/cellbrowser.py", line 3620, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/cellbrowser/cellbrowser.py", line 3040, in convertExprMatrix
    matType = matrixToBin(outMatrixFname, geneToSym, binMat, binMatIndex, discretBinMat, discretMatrixIndex, metaSampleNames, matType=matType)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/cellbrowser/cellbrowser.py", line 1910, in matrixToBin
    exprStr, minVal = exprEncode(geneId, exprArr, matType)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/cellbrowser/cellbrowser.py", line 1828, in exprEncode
    geneStr = geneIdLen+bytes(geneDesc, encoding="ascii")+exprStr
UnicodeEncodeError: 'ascii' codec can't encode characters in position 7-8: ordinal not in range(128)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<string>", line 29, in CellBrowser_wrapper
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/neals_python_functions/analysis/cellbrowser.py", line 231, in make_kamil_browser
    run_cbBuild(browser_filepath, browser_name, run_browser, **kwargs)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/neals_python_functions/analysis/cellbrowser.py", line 294, in run_cbBuild
    run=run_browser)
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/site-packages/neals_python_functions/analysis/cellbrowser.py", line 135, in _make_browser
    completed_process.check_returncode()
  File "/home/rnormand/.conda/envs/CellBrowser/lib/python3.7/subprocess.py", line 444, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['cbBuild', '-o', 'CB_CD4_protein_res0.4/browser']' returned non-zero exit status 1.
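Reading the two tracebacks together: the first is the root cause (cbBuild calling `bytes(geneDesc, encoding="ascii")` on a name containing non-ASCII characters), and the second is only the wrapper's `check_returncode()` reporting that cbBuild exited with status 1. The sketch below reproduces the same kind of encoding failure in isolation and lists which names would trigger it. `find_non_ascii` is a hypothetical helper introduced here for illustration, and the example name is the one quoted earlier in this issue, which may not be the exact name that failed in this run.

```python
def find_non_ascii(names):
    """Map each offending name to its (index, character) pairs (hypothetical helper)."""
    offenders = {}
    for name in names:
        hits = [(i, ch) for i, ch in enumerate(name) if ord(ch) > 127]
        if hits:
            offenders[name] = hits
    return offenders


example = "0352 anti-human FcεRIα"  # feature name quoted earlier in this issue

try:
    # Same kind of call as in the exprEncode frame of the traceback.
    bytes(example, encoding="ascii")
except UnicodeEncodeError as err:
    print("encoding failed:", err.reason)

# Names made up only of ASCII characters are not reported.
print(find_non_ascii(["CD4", example]))
```

Running something like `find_non_ascii(adata.var_names.values)` before exporting the matrix would show every feature name that needs attention, rather than discovering them one cbBuild run at a time.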