maximilianh / cellBrowser

Main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and JavaScript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0
104 stars, 40 forks

cbBuild 0.25 fails with sample 1 data #25

Closed: pcm32 closed this issue 5 years ago

pcm32 commented 5 years ago

Execution as described in #24 fails with the following error:

2018-09-17T20:44:07.682197743Z INFO:root:/export/database/job_working_directory/000/32/working/summary.html does not exist
2018-09-17T20:44:07.682315009Z INFO:root:/export/database/job_working_directory/000/32/working/methods.html does not exist
2018-09-17T20:44:07.682319046Z INFO:root:/export/database/job_working_directory/000/32/working/downloads.html does not exist
2018-09-17T20:44:07.68232157Z INFO:root:/export/database/job_working_directory/000/32/working/thumb.png does not exist
2018-09-17T20:44:07.682325747Z INFO:root:Getting md5 of /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T20:44:07.686529535Z INFO:root:Checking and reordering meta data to /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.tsv
2018-09-17T20:44:07.686593111Z INFO:root:Reading sample names from /export/galaxy-central/database/files/000/dataset_33.dat
2018-09-17T20:44:07.695780892Z INFO:root:Reading headers of file /export/galaxy-central/database/files/000/dataset_35.dat
2018-09-17T20:44:07.699772794Z WARNING:root:2902 samples names are in the meta data, but not in the expression matrix. Examples: ['S49.G5', 'S35.H2', 'S23.C10', 'S49.F10', 'S27.E6', 'S162.C2', 'S21.H2', 'S44.C10', 'S57.H3', 'S49.D11']
2018-09-17T20:44:07.699798257Z WARNING:root:These samples will be removed from the meta data
2018-09-17T20:44:07.700231588Z INFO:root:Data contains 4261 samples/cells
2018-09-17T20:44:07.758384687Z INFO:root:Converting to numbers and compressing meta data fields
2018-09-17T20:44:07.765759419Z INFO:root:Meta data field index 0: 'Cell'
2018-09-17T20:44:07.788543139Z INFO:root:Type: uniqueString, 4261 different values
2018-09-17T20:44:07.788562019Z INFO:root:Meta data field index 1: 'WGCNAcluster'
2018-09-17T20:44:07.804685374Z INFO:root:Type: enum, 48 different values
2018-09-17T20:44:07.804702719Z INFO:root:Meta data field index 2: 'Name'
2018-09-17T20:44:07.87122994Z INFO:root:Type: enum, 48 different values
2018-09-17T20:44:07.871249785Z INFO:root:Meta data field index 3: 'Age'
2018-09-17T20:44:07.87935809Z INFO:root:Number of values per decile-bin: [289, 899, 720, 407, 459, 307, 289, 404, 417, 70]
2018-09-17T20:44:07.884437089Z INFO:root:Type: float, 29 different values
2018-09-17T20:44:07.884466143Z INFO:root:Meta data field index 4: 'RegionName'
2018-09-17T20:44:07.900016901Z INFO:root:Type: enum, 4 different values
2018-09-17T20:44:07.900041781Z INFO:root:Meta data field index 5: 'Laminae'
2018-09-17T20:44:07.967187537Z INFO:root:Type: enum, 7 different values
2018-09-17T20:44:07.967389047Z INFO:root:Meta data field index 6: 'Area'
2018-09-17T20:44:07.982237946Z INFO:root:Type: enum, 7 different values
2018-09-17T20:44:07.98376565Z INFO:root:Indexing meta file /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.tsv to /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.index
2018-09-17T20:44:08.084167907Z INFO:root:Kept 4261 cells present in both meta data file and expression matrix
2018-09-17T20:44:08.084201346Z INFO:root:Getting md5 of /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/meta.tsv
2018-09-17T20:44:08.08741232Z INFO:root:Determining if /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/exprMatrix.tsv.gz needs to be created
2018-09-17T20:44:08.094838318Z INFO:root:Reading headers of file /export/galaxy-central/database/job_working_directory/000/32/dataset_37_files/sample/exprMatrix.tsv.gz
2018-09-17T20:44:08.097394389Z INFO:root:current input matrix looks identical to previously processed matrix, same file size, same sample names
2018-09-17T20:44:08.097417985Z INFO:root:Matrix and meta sample names have not changed, not indexing matrix again
2018-09-17T20:44:08.097421065Z INFO:root:Parsing coordinates from /export/galaxy-central/database/files/000/dataset_35.dat. FlipY=False, useTwoBytes=False
2018-09-17T20:44:08.385412545Z Traceback (most recent call last):
2018-09-17T20:44:08.38544231Z   File "/usr/local/bin/cbBuild", line 10, in <module>
2018-09-17T20:44:08.385447578Z     cellbrowser.convertAndCopyCli()
2018-09-17T20:44:08.385451124Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2341, in convertAndCopyCli
2018-09-17T20:44:08.385454678Z     convertAndCopy(confFnames, outDir, port)
2018-09-17T20:44:08.385457992Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2315, in convertAndCopy
2018-09-17T20:44:08.385461543Z     convertDataset(inConf, outConf, datasetDir)
2018-09-17T20:44:08.385464798Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2090, in convertDataset
2018-09-17T20:44:08.385468118Z     convertCoords(inConf, outConf, sampleNames, outMeta, datasetDir)
2018-09-17T20:44:08.385471601Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1828, in convertCoords
2018-09-17T20:44:08.385474808Z     coords = parseScaleCoordsAsDict(coordFname, useTwoBytes, flipY)
2018-09-17T20:44:08.385478054Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1216, in parseScaleCoordsAsDict
2018-09-17T20:44:08.385481712Z     for row in lineFileNextRow(fname):
2018-09-17T20:44:08.385484934Z   File "/usr/local/bin/cbPyLib/cellbrowser.py", line 181, in lineFileNextRow
2018-09-17T20:44:08.385488389Z     Record = namedtuple('tsvRec', headers)
2018-09-17T20:44:08.385491697Z   File "/usr/local/lib/python3.6/collections/__init__.py", line 429, in namedtuple
2018-09-17T20:44:08.385496089Z     exec(class_definition, namespace)
2018-09-17T20:44:08.38549937Z   File "<string>", line 12
2018-09-17T20:44:08.385502941Z SyntaxError: more than 255 arguments
pcm32 commented 5 years ago

To follow up on this, @maximilianh: I forked and tagged the last commit that was working fine (a20c4d0533a623f6d2b2b357ef94d75a2b87c569) as 0.1.9, but I still get other errors. The only difference between this build and the previous time the container was built with that version (unfortunately that container was lost due to some changes in our CI) seems to be the versions of some dependencies pulled in by conda: before it was numpy 1.14.3, and now it is 1.15.x. I pinned numpy back to 1.14.3, but the errors I get with 0.1.9 are:

INFO:root:Copying+reordering+trimming /export/galaxy-central/database/files/000/dataset_42.dat to /export/galaxy-central/database/job_working_directory/000/40/dataset_45_files/sample/exprMatrix.tsv, keeping only the 4261 columns with a sample name in the meta data
INFO:root:converting /export/galaxy-central/database/job_working_directory/000/40/dataset_45_files/sample/exprMatrix.tsv.gz to /export/galaxy-central/database/job_working_directory/000/40/dataset_45_files/sample/exprMatrix.bin and writing index to /export/galaxy-central/database/job_working_directory/000/40/dataset_45_files/sample/exprMatrix.json
INFO:root:Compressing gene expression vectors...
INFO:root:Auto-detecting number type of /export/galaxy-central/database/job_working_directory/000/40/dataset_45_files/sample/exprMatrix.tsv.gz
INFO:root:Numbers in matrix are of type 'float'
Traceback (most recent call last):
  File "/usr/local/bin/cbBuild", line 10, in <module>
    cellbrowser.convertAndCopyCli()
  File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2145, in convertAndCopyCli
    convertAndCopy(confFnames, outDir, port)
  File "/usr/local/bin/cbPyLib/cellbrowser.py", line 2119, in convertAndCopy
    convertDataset(inConf, outConf, datasetDir)
  File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1962, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1696, in convertExprMatrix
    matType = matrixToBin(outMatrixFname, geneToSym, binMat, binMatIndex, discretBinMat, discretMatrixIndex)
  File "/usr/local/bin/cbPyLib/cellbrowser.py", line 1035, in matrixToBin
    matType, sampleNames = matReader.open(fname)
TypeError: 'NoneType' object is not iterable

Any ideas? I'm happy to move to something above 0.25 again, but currently 0.25's cbBuild is not working either (see first comment above) :-(. Thanks!
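The thread never diagnoses this second traceback. As general Python background only: `matType, sampleNames = matReader.open(fname)` raises this TypeError whenever the called function returns `None`, for example when it falls off its end without reaching a `return`. The sketch below uses a hypothetical `read_header()` (not cellBrowser's `matReader.open()`) to show that failure mode and a guard that turns it into a clearer error. The exact TypeError wording differs by Python version ("'NoneType' object is not iterable" on 3.6, "cannot unpack non-iterable NoneType object" on newer releases).

```python
def read_header(lines):
    """Hypothetical reader: returns (number type, sample names) from the first
    line it sees. With no lines, the loop never runs and the function falls
    off its end, implicitly returning None. The number type is hard-coded
    purely for illustration."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        return "float", fields[1:]


def read_header_checked(lines):
    """Same call, but convert a None result into an explicit error instead of
    letting the tuple unpacking fail with a TypeError."""
    result = read_header(lines)
    if result is None:
        raise ValueError("matrix reader returned nothing; is the file empty "
                         "or in an unexpected format?")
    matType, sampleNames = result
    return matType, sampleNames


try:
    matType, sampleNames = read_header([])  # unpacking None -> TypeError
except TypeError as err:
    print("TypeError:", err)
```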

maximilianh commented 5 years ago

I can fix this, just give me a day or so...


maximilianh commented 5 years ago

Does this really fail on the sample file?

What is dataset_35 ?

2018-09-17T20:44:08.097421065Z INFO:root:Parsing coordinates from /export/galaxy-central/database/files/000/dataset_35.dat. FlipY=False, useTwoBytes=False

Can you send me the full input that you're running on?

The error means that dataset_35 has too many columns. It should only have three. I can add a better error message, but I need to reproduce it myself (also to understand what the misunderstanding is, so I can properly document it and have better error messages).
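Since a coordinate file should only have three columns, a pre-flight check on its header catches the mix-up before any parsing happens. This is a minimal sketch, assuming a tab-separated file whose columns are a cell ID followed by x and y; `check_coord_header()` and the column names in the example are hypothetical and not part of cellBrowser.

```python
import csv
import io


def check_coord_header(lines, expected=3):
    """Hypothetical check: return the header row of a tab-separated
    coordinate file, raising ValueError unless it has `expected` columns
    (assumed here to be cell ID, x, y)."""
    header = next(csv.reader(lines, delimiter="\t"))
    if len(header) != expected:
        raise ValueError(
            "coordinate file has %d columns, expected %d (cell ID, x, y); "
            "was an expression matrix given instead?" % (len(header), expected))
    return header


# A well-formed coordinate file passes...
print(check_coord_header(io.StringIO("cellId\tx\ty\nS49.G5\t1.5\t-2.0\n")))
# prints ['cellId', 'x', 'y']

# ...while a matrix-shaped header (gene + one column per cell) is rejected.
matrix = io.StringIO("gene\t" + "\t".join("S%d" % i for i in range(300)) + "\n")
try:
    check_coord_header(matrix)
except ValueError as err:
    print(err)
```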

pcm32 commented 5 years ago

Sorry, I just noticed that when I tried 0.25 I mistakenly linked the matrix in two different slots... let me retry; it might be that 0.25 is OK!

pcm32 commented 5 years ago

OK, 0.25 works; this was a late-night mistake on my part. Apologies for the extra issue! Thanks!

maximilianh commented 5 years ago

Phew! Thanks!

I imagine you specified the expression matrix where it wanted the coordinates. Good to know. Maybe I should add a more obvious error message here.


maximilianh commented 5 years ago

Added:

# On Python 3.6, namedtuple() fails with "SyntaxError: more than 255 arguments"
# when given 255 or more fields, so reject such headers with a clear message.
if len(headers) >= 255:
    errAbort("Cannot read more than 255 columns. Are you sure that this file is in the correct format?"
        " It may have the wrong line endings and may require treatment with dos2unix or mac2unix."
        " Or it may be the wrong file type for this input, e.g. an expression matrix instead of a"
        " coordinate file.")

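The check above guards the `namedtuple('tsvRec', headers)` call that failed in the first traceback (the 255-argument limit behind that SyntaxError was lifted in Python 3.7). The sketch below shows the same idea in self-contained form and also illustrates the line-ending case the message mentions: if a file uses bare `\r` line breaks and is split only on `\n`, the whole file ends up in the "header" line. `record_type_from_header()`, `MAX_COLUMNS` and the error texts are hypothetical; this is not cellBrowser's `lineFileNextRow()`.

```python
from collections import namedtuple

# Hypothetical constant: Python < 3.7 cannot build a namedtuple with 255+ fields.
MAX_COLUMNS = 255


def record_type_from_header(raw: bytes):
    """Build a namedtuple type from the first line of tab-separated bytes,
    rejecting headers that look like classic-Mac line endings or that have
    too many columns for a coordinate file."""
    first_line = raw.split(b"\n", 1)[0]
    # A trailing '\r' is just a DOS (CRLF) ending; an inner '\r' means the
    # file probably uses bare '\r' breaks, so all rows landed on one line.
    if b"\r" in first_line.rstrip(b"\r"):
        raise ValueError("line breaks look like bare '\\r' (classic Mac); "
                         "try converting the file with mac2unix")
    headers = first_line.rstrip(b"\r").decode("utf8").split("\t")
    if len(headers) >= MAX_COLUMNS:
        raise ValueError("%d columns in header; is this an expression matrix "
                         "instead of a coordinate file?" % len(headers))
    # rename=True replaces invalid identifiers (e.g. 'S49.G5') with
    # positional names such as '_1' instead of raising.
    return namedtuple("tsvRec", headers, rename=True)


Rec = record_type_from_header(b"cellId\tx\ty\nS49.G5\t1.5\t-2.0\n")
print(Rec._fields)  # ('cellId', 'x', 'y')
```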

pcm32 commented 5 years ago

Sounds good. There are other things we could improve for the Galaxy spin-up, but I will come back to those later. Thanks for your prompt help on all of these.