maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

Concatenation of int with str throws error for gene symbols and ids #177

Closed pcm32 closed 4 years ago

pcm32 commented 4 years ago

Apparently I have run into a case where a gene symbol or ID is being interpreted as a numpy int, so I guess we need to be more defensive on that concatenation line:

INFO:root:Writing gene-by-gene, without using pandas
Traceback (most recent call last):
  File "/Users/pmoreno/miniconda3/envs/__ucsc-cell-browser@0.7.10/bin/cbImportScanpy", line 10, in <module>
    sys.exit(cbImportScanpyCli())
  File "/Users/pmoreno/miniconda3/envs/__ucsc-cell-browser@0.7.10/lib/python3.8/site-packages/cellbrowser/convert.py", line 565, in cbImportScanpyCli
    scanpyToCellbrowser(ad, outDir, datasetName, skipMatrix=options.skipMatrix, useRaw=(not options.useProc),
  File "/Users/pmoreno/miniconda3/envs/__ucsc-cell-browser@0.7.10/lib/python3.8/site-packages/cellbrowser/cellbrowser.py", line 3869, in scanpyToCellbrowser
    anndataMatrixToTsv(adata, matFname, useRaw=useRaw)
  File "/Users/pmoreno/miniconda3/envs/__ucsc-cell-browser@0.7.10/lib/python3.8/site-packages/cellbrowser/cellbrowser.py", line 3778, in anndataMatrixToTsv
    genes = [x+"|"+y for (x,y) in geneIdAndSyms]
  File "/Users/pmoreno/miniconda3/envs/__ucsc-cell-browser@0.7.10/lib/python3.8/site-packages/cellbrowser/cellbrowser.py", line 3778, in <listcomp>
    genes = [x+"|"+y for (x,y) in geneIdAndSyms]
TypeError: can only concatenate str (not "numpy.int16") to str

I have seen this in 0.7.9 as well.

Thanks!

maximilianh commented 4 years ago

Thanks! Fixed on the develop branch. I knew about this, but didn't apply it to the symbol case.

Note that this should not happen with real datasets. If you define genes to be symbols, they cannot be numbers. If they are numbers, then they are Entrez gene IDs. But I've changed it anyway, just to avoid the crash.
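
For readers who hit the same traceback, here is a minimal sketch of the kind of defensive concatenation discussed in this thread. It is not the actual change made on the develop branch. The function name `join_gene_ids_and_symbols` and the sample pairs are hypothetical, and plain Python ints stand in for `numpy.int16` values, which `str()` converts in the same way.

```python
def join_gene_ids_and_symbols(gene_id_and_syms, sep="|"):
    """Build 'id|symbol' labels, converting both parts to str first.

    Hypothetical helper for illustration only; it mirrors the failing
    list comprehension from the traceback but wraps each part in str().
    """
    return [str(gene_id) + sep + str(symbol)
            for gene_id, symbol in gene_id_and_syms]


# Illustrative input: either element of a pair may arrive as a number.
# Plain ints are used here as a stand-in for numpy integer types.
pairs = [
    ("ENSG00000141510", "TP53"),  # both parts are already strings
    (7157, "TP53"),               # numeric ID
    ("ENSG00000012048", 672),     # numeric "symbol", the case from the traceback
]

labels = join_gene_ids_and_symbols(pairs)
print(labels)
```

Without the `str()` calls, the third pair raises the same kind of `TypeError` shown in the traceback, because a `str` cannot be concatenated with an integer type.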