maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

cbBuild v0.2 fails with sample1 dataset #24

Closed by pcm32 5 years ago

pcm32 commented 5 years ago

Hi there,

For some reason, the datasets I have been using with cellbrowser for the past month or so have stopped working. Running cbBuild (v0.2) now fails with the following error:

2018-09-17T16:34:56.524288956Z INFO:root:Creating /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample
2018-09-17T16:34:56.525043483Z INFO:root:/export/database/job_working_directory/000/31/working/summary.html does not exist
2018-09-17T16:34:56.525102426Z INFO:root:/export/database/job_working_directory/000/31/working/methods.html does not exist
2018-09-17T16:34:56.525220367Z INFO:root:/export/database/job_working_directory/000/31/working/downloads.html does not exist
2018-09-17T16:34:56.525453936Z INFO:root:/export/database/job_working_directory/000/31/working/thumbnail.png does not exist
2018-09-17T16:34:56.525650515Z INFO:root:Getting md5 of /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T16:34:56.589874633Z INFO:root:Creating /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/metaFields
2018-09-17T16:34:56.589895612Z INFO:root:Checking and reordering meta data to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.tsv
2018-09-17T16:34:56.589899082Z INFO:root:Reading sample names from /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T16:34:56.602534967Z INFO:root:Reading headers of file /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T16:34:56.613669673Z WARNING:root:2902 samples names are in the meta data, but not in the expression matrix. Examples: ['S36.C3', 'S92.D5', 'S2.D10', 'S114.H11', 'S114.B6', 'S131.C5', 'S184.G8', 'S94.F8', 'S32.A6', 'S222.B4']
2018-09-17T16:34:56.614792349Z WARNING:root:These samples will be removed from the meta data
2018-09-17T16:34:56.615269706Z INFO:root:Data contains 4261 samples/cells
2018-09-17T16:34:56.633090318Z INFO:root:Converting to numbers and compressing meta data fields
2018-09-17T16:34:56.633108027Z INFO:root:Meta data field index 0: 'Cell'
2018-09-17T16:34:56.69784884Z INFO:root:Type: uniqueString, 4261 different values
2018-09-17T16:34:56.698013476Z INFO:root:Meta data field index 1: 'WGCNAcluster'
2018-09-17T16:34:56.733551043Z INFO:root:Type: enum, 48 different values
2018-09-17T16:34:56.733598447Z INFO:root:Meta data field index 2: 'Name'
2018-09-17T16:34:56.782910478Z INFO:root:Type: enum, 48 different values
2018-09-17T16:34:56.782929858Z INFO:root:Meta data field index 3: 'Age'
2018-09-17T16:34:56.791846888Z INFO:root:Number of values per decile-bin: [289, 899, 720, 407, 459, 307, 289, 404, 417, 70]
2018-09-17T16:34:56.797273961Z INFO:root:Type: float, 29 different values
2018-09-17T16:34:56.797360832Z INFO:root:Meta data field index 4: 'RegionName'
2018-09-17T16:34:56.813704138Z INFO:root:Type: enum, 4 different values
2018-09-17T16:34:56.813722999Z INFO:root:Meta data field index 5: 'Laminae'
2018-09-17T16:34:56.878486659Z INFO:root:Type: enum, 7 different values
2018-09-17T16:34:56.878546838Z INFO:root:Meta data field index 6: 'Area'
2018-09-17T16:34:56.89320577Z INFO:root:Type: enum, 7 different values
2018-09-17T16:34:56.894613807Z INFO:root:Indexing meta file /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.tsv to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.index
2018-09-17T16:34:56.995743217Z INFO:root:Kept 4261 cells present in both meta data file and expression matrix
2018-09-17T16:34:56.995767735Z INFO:root:Getting md5 of /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/meta.tsv
2018-09-17T16:34:56.999302087Z INFO:root:Determining if /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz needs to be created
2018-09-17T16:34:56.99932641Z INFO:root:/export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz does not exist.
2018-09-17T16:34:56.999344021Z INFO:root:Getting md5 of /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T16:34:57.00930947Z INFO:root:Copying+reordering+trimming /export/galaxy-central/database/files/000/dataset_35.dat to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz, keeping only the 4261 columns with a sample name in the meta data
2018-09-17T16:34:57.011749543Z INFO:root:Auto-detecting number type of /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T16:34:57.013617475Z INFO:root:Numbers in matrix are of type 'float'
2018-09-17T16:34:57.979119576Z INFO:root:converting /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.bin and writing index to /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.json
2018-09-17T16:34:57.979141387Z INFO:root:Compressing gene expression vectors...
2018-09-17T16:34:57.983720239Z INFO:root:Auto-detecting number type of /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz
2018-09-17T16:34:57.984819571Z INFO:root:Numbers in matrix are of type 'float'
2018-09-17T16:34:58.211085954Z INFO:root:Getting md5 of /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/exprMatrix.tsv.gz
2018-09-17T16:34:58.270150678Z INFO:root:Wrote /export/galaxy-central/database/job_working_directory/000/31/dataset_36_files/sample/cellbrowser.json.bak
2018-09-17T16:34:58.277730905Z Traceback (most recent call last):
2018-09-17T16:34:58.277756473Z   File "/usr/local/bin/cbBuild", line 10, in <module>
2018-09-17T16:34:58.27775973Z     cellbrowser.convertAndCopyCli()
2018-09-17T16:34:58.277762262Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2246, in convertAndCopyCli
2018-09-17T16:34:58.277764817Z     convertAndCopy(confFnames, outDir, port)
2018-09-17T16:34:58.27776716Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2220, in convertAndCopy
2018-09-17T16:34:58.277769633Z     convertDataset(inConf, outConf, datasetDir)
2018-09-17T16:34:58.277772084Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2062, in convertDataset
2018-09-17T16:34:58.277784502Z     writeConfig(inConf, outConf, datasetDir)
2018-09-17T16:34:58.277787168Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2178, in writeConfig
2018-09-17T16:34:58.277789623Z     json.dump(outConf, descJsonFh, indent=2)
2018-09-17T16:34:58.277791932Z   File "/usr/local/lib/python3.6/json/__init__.py", line 179, in dump
2018-09-17T16:34:58.277794342Z     for chunk in iterable:
2018-09-17T16:34:58.27779669Z   File "/usr/local/lib/python3.6/json/encoder.py", line 430, in _iterencode
2018-09-17T16:34:58.277799165Z     yield from _iterencode_dict(o, _current_indent_level)
2018-09-17T16:34:58.27780191Z   File "/usr/local/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
2018-09-17T16:34:58.277804503Z     yield from chunks
2018-09-17T16:34:58.277806857Z   File "/usr/local/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
2018-09-17T16:34:58.277809216Z     yield from chunks
2018-09-17T16:34:58.277811512Z   File "/usr/local/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
2018-09-17T16:34:58.277813907Z     yield from chunks
2018-09-17T16:34:58.277816211Z   File "/usr/local/lib/python3.6/json/encoder.py", line 437, in _iterencode
2018-09-17T16:34:58.27781863Z     o = _default(o)
2018-09-17T16:34:58.277820957Z   File "/usr/local/lib/python3.6/json/encoder.py", line 180, in default
2018-09-17T16:34:58.277823361Z     o.__class__.__name__)
2018-09-17T16:34:58.277825713Z TypeError: Object of type 'bytes' is not JSON serializable
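
The final line of the traceback is the standard Python 3 failure when `json.dump` meets a `bytes` value; under Python 2 the same value would have been a plain `str` and would have serialized. The repeated `_iterencode_dict` frames suggest the offending value sits a few dictionaries deep in `outConf`. The following is a minimal sketch of the failure and one possible workaround, not the fix that was applied in cellBrowser: the `decode_bytes` helper and the contents of `out_conf` are hypothetical, since the log does not show which key held the bytes.

```python
import io
import json


def decode_bytes(obj, encoding="utf-8"):
    """Recursively turn bytes into str so the json module can serialize obj.

    Hypothetical helper for illustration; it is not part of cellBrowser.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_bytes(v, encoding) for v in obj]
    return obj


# Assumed shape: a bytes value nested inside dictionaries. The real outConf
# is not shown in the log, so these keys and values are made up.
out_conf = {"name": "sample", "fileVersions": {"inMeta": {"md5": b"abc123"}}}

# Reproduce the error from the traceback.
try:
    json.dumps(out_conf)
    failed = False
except TypeError as err:
    failed = True
    print("json.dumps failed:", err)

# Decode first, then dump with the same indent as the call in the traceback.
buf = io.StringIO()
json.dump(decode_bytes(out_conf), buf, indent=2)
print(buf.getvalue())
```

Decoding at the point where the bytes are produced would usually be cleaner than walking the whole structure, but the recursive version works without knowing which field is affected.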

I made sure the inputs were fine by re-downloading the dataset files (exprMatrix.tsv, meta.tsv, tsne.coords.tsv), but the issue persists. Thanks!

For our current functionality (we would like to show this working to someone on 20/09), would it be possible to tag a20c4d0533a623f6d2b2b357ef94d75a2b87c569 as v0.1.9, or something else prior to v0.2? I think that was the last version that worked for us. We could then build a container from that tag while the issues are sorted out.

Thanks! Pablo

maximilianh commented 5 years ago

Oh darn, sorry! Did you try the master branch? I am using the master branch myself and that seems to work.

I've just tagged the current master as 0.25, I hope that helps!

pcm32 commented 5 years ago

Thanks, 0.25 fixes this. I now get a different error, though. Will open a new issue.