Open RoganGrant opened 1 year ago
Realizing now that the --forceMtx flag does not take a text argument, and rather is true if specified, false if not. In any case, it would be great to have an equivalent --forceTSV flag.
Hi Rogan, thanks for your feedback. This is intentional, but I'm interested in your opinion, as my knowledge of R (as you can see from my R code) is somewhat limited.
Apparently you have a very big matrix, right?
The problem with big matrices in R is that if they exceed the maximum number of elements (2^31-1), R stops when the sparse matrix is converted to a normal (dense) matrix. Most of the elements are zero, so big matrices work as long as their non-zero count is low enough and they are kept sparse. For that, writing as MTX is required (as far as I know; please correct me here, see below). I wrote writeSparseTsvChunks() to be able to write very big matrices, but then R can't read them. That is why I moved to .mtx.gz files everywhere, both for h5ad and Seurat objects: to make sure that R can always read the matrices.
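As a rough illustration of the limit described above, the following Python sketch checks whether a dense copy of a matrix would exceed 2^31-1 elements while its non-zero count stays below that number. The function name and the example dimensions are invented for illustration; they are not taken from cellBrowser or from the dataset discussed in this thread.

```python
# Element limit mentioned above: 2^31 - 1.
MAX_ELEMENTS = 2**31 - 1

def dense_conversion_fits(n_genes, n_cells, max_elements=MAX_ELEMENTS):
    """Return True if a dense n_genes x n_cells matrix stays within the limit."""
    return n_genes * n_cells <= max_elements

# Hypothetical sizes, for illustration only.
n_genes, n_cells = 30_000, 100_000
non_zero = 150_000_000  # assumed number of non-zero entries

# 30,000 * 100,000 = 3,000,000,000 elements, which is above the limit.
print("dense fits: ", dense_conversion_fits(n_genes, n_cells))
# The non-zero entries alone are far fewer than the limit.
print("sparse fits:", non_zero <= MAX_ELEMENTS)
```

With these made-up numbers the dense form is too large while the sparse form is not, which is the situation the comment describes.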
The MTX format seemed clean enough and easy to read. (BTW: what don't you like about .mtx.gz ?)
cbImportSeurat produces an .Rscript file that you can edit and run manually. You could try it now to force the .tsv.gz file and then try to read the result in R. Does that work for you? Or do you not care if others can read the resulting .tsv.gz files with R?
Or maybe I'm missing something and it's not too hard to read gigantic .tsv.gz files and there is some trick in R to convert them into sparse matrices in pieces when reading them?
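To make the "read in pieces" idea concrete, here is a minimal sketch in Python rather than R, so it does not answer whether the same is practical in R. It streams a gzipped TSV one line at a time and keeps only the non-zero entries as (row, column, value) triplets, so the dense matrix is never held in memory. The function name and the assumed file layout (header line with cell names, then one gene per line) are assumptions for illustration.

```python
import gzip

def tsv_to_triplets(path):
    """Stream a gzipped gene-by-cell TSV and keep only the non-zero entries.

    Assumes the first line is a header (gene column, then cell names) and
    every following line is a gene name followed by one number per cell.
    """
    genes, rows, cols, vals = [], [], [], []
    with gzip.open(path, "rt") as fh:
        cells = fh.readline().rstrip("\n").split("\t")[1:]
        for i, line in enumerate(fh):
            fields = line.rstrip("\n").split("\t")
            genes.append(fields[0])
            for j, text in enumerate(fields[1:]):
                value = float(text)
                if value != 0.0:
                    # Store zero-based coordinates of the non-zero entry.
                    rows.append(i)
                    cols.append(j)
                    vals.append(value)
    return genes, cells, rows, cols, vals
```

The triplets are the same information that a coordinate-format sparse matrix holds, so they could be handed to a sparse-matrix constructor in whichever library is in use.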
And, just in case: I'm not opposed to changing this, I just would like to understand if it makes sense in some context for R to write matrices that it can write, but not read.
Thank you for the quick response! I have personally converted this matrix to non-sparse in R in the course of certain function calls without issue, but the documentation agrees with you. I honestly don't know how much of a risk this poses in terms of the function failing for others.
In any case I have no issue with .mtx files, but I can't get them to work at all with cbBuild. The cellbrowser.conf file still points to a single tsv file that does not exist, and manually supplying each individual file does not seem to work (next it asks for a barcodes.tsv, which is ignored if I specify directly for each assay). My ultimate solution (which worked very well) was to run mtx2tsv on each assay before deployment.
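For readers unfamiliar with what such a conversion involves, the sketch below expands a gzipped MatrixMarket coordinate file into a gzipped dense TSV using only the Python standard library. It is not the code behind mtx2tsv. The function name, the "gene" header label, and passing gene and cell names as lists are assumptions for illustration, and it expects a coordinate file with an explicit value on every entry line.

```python
import gzip

def mtx_to_tsv(mtx_path, genes, cells, out_path):
    """Expand a gzipped MatrixMarket coordinate file into a gzipped dense TSV."""
    by_row = {}
    with gzip.open(mtx_path, "rt") as fh:
        # Skip the banner and any comment lines, which start with '%'.
        line = fh.readline()
        while line.startswith("%"):
            line = fh.readline()
        # The first non-comment line holds: rows, columns, non-zero count.
        n_rows, n_cols, _nnz = (int(x) for x in line.split())
        if (n_rows, n_cols) != (len(genes), len(cells)):
            raise ValueError("matrix size does not match gene/cell names")
        for line in fh:
            if not line.strip():
                continue
            i, j, value = line.split()
            # MatrixMarket indices are 1-based; store them 0-based.
            by_row.setdefault(int(i) - 1, {})[int(j) - 1] = value
    with gzip.open(out_path, "wt") as out:
        out.write("gene\t" + "\t".join(cells) + "\n")
        for i, gene in enumerate(genes):
            row = by_row.get(i, {})
            values = (row.get(j, "0") for j in range(n_cols))
            out.write(gene + "\t" + "\t".join(values) + "\n")
```

This keeps all non-zero entries in memory and writes every zero explicitly, so the output can be much larger than the input. That is the trade-off discussed earlier in the thread.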
Hi Rogan,
hmm... I have a few questions sorry:
> The cellbrowser.conf file still points to a single tsv file that does not exist,
Sorry, I don't understand: do you mean that the auto-generated cellbrowser.conf file does not point to the .mtx.gz file? That's probably a bug. How about changing that filename manually in cellbrowser.conf, doesn't that work?
> and manually supplying each individual file does not seem to work
Sorry, I don't know what you mean... it's sufficient to provide the matrix.mtx.gz file, cbBuild will find the other files.
> (next it asks for a barcodes.tsv, which is ignored if I specify directly for each assay)
Sorry, I don't understand this sentence.
Sorry, I should have waited to give more concrete examples. My object has three assays (counts, data, and scale). As far as I can tell cbBuild does not handle this correctly if a .mtx file is used. If I run cbBuild without any conversion, I initially get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '[path]/counts_exprMatrix.tsv.gz'
Full trace:
INFO:root:dataRoot is not set in ~/.cellbrowser.conf or via $CBDATAROOT. Dataset hierarchies are not supported.
INFO:root:Creating [path]
INFO:root:Determining if [path]/exprMatrix.tsv.gz needs to be created
INFO:root:[path]/exprMatrix.tsv.gz does not exist. Must build matrix now.
INFO:root:Creating [path]/metaFields
INFO:root:Checking and reordering meta data to [path]/meta.tsv
INFO:root:Reading sample names from [path]/meta.tsv
INFO:root:Reading headers from file [path]/counts_exprMatrix.tsv.gz
ERROR:root:Unexpected error: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7f2afdea0f48>)
Traceback (most recent call last):
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4783, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4598, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3944, in convertDataset
    sampleNames, needFilterMatrix = convertMeta(inDir, inConf, outConf, datasetDir, outMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3539, in convertMeta
    sampleNames, needFilterMatrix = metaReorder(matrixFname, metaFname, finalMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2296, in metaReorder
    matrixSampleNames = readMatrixSampleNames(matrixFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2288, in readMatrixSampleNames
    return readHeaders(fname)[1:]
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3135, in readHeaders
    ifh = openFile(fname, "rtU")
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 807, in openFile
    fh = gzip.open(fname, mode, encoding=encoding)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '[path]/counts_exprMatrix.tsv.gz'
The initial cellbrowser.conf file is as follows:
# This is a bare-bones cellbrowser config file auto-generated by the command-line tool cbImportSeurat
# or directly from R with SeuratWrappers::ExportToCellbrowser().
# Look at https://github.com/maximilianh/cellBrowser/blob/master/src/cbPyLib/cellbrowser/sampleConfig/cellbrowser.conf
# for a full file that shows all possible options
name="name"
shortLabel="name"
exprMatrix="counts_exprMatrix.tsv.gz"
matrices=[ {'label':'counts','fileName':'counts_exprMatrix.tsv.gz'},
{'label':'data','fileName':'data_exprMatrix.tsv.gz'},
{'label':'scale','fileName':'scale_exprMatrix.tsv.gz'}]
#tags = ["10x", "smartseq2"]
meta="meta.tsv"
# possible values: "gencode-human", "gencode-mouse", "symbol" or "auto"
geneIdType="auto"
# file with gene,description (one per line) with highlighted genes, called "Dataset Genes" in the user interface
# quickGenesFile="quickGenes.csv"
clusterField="typestate"
labelField="typestate"
enumFields=["orig.ident", "HTO_maxID", "HTO_secondID", "HTO_classification", "HTO_classification.global", "hash.ID", "MULTI_ID", "MULTI_classification"$
markers = [{"file": "markers.tsv", "shortLabel": "Seurat Cluster Markers"}]
coords=[{"file": "umap.coords.tsv", "shortLabel": "Seurat umap"},
{"file": "SCVI.coords.tsv", "shortLabel": "Seurat SCVI"}]
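One way to catch this kind of mismatch before running cbBuild is to check that every matrix file named in the config actually exists in the input directory. The sketch below does that for the exprMatrix and matrices settings shown above, with the config represented as a plain Python dict. The function name and the dict representation are assumptions for illustration; this is not part of cellBrowser.

```python
import os

def missing_matrix_files(conf, in_dir):
    """List matrix files named in the config that are absent from in_dir."""
    names = [conf["exprMatrix"]]
    names += [m["fileName"] for m in conf.get("matrices", [])]
    # dict.fromkeys removes duplicates while keeping the original order.
    unique = list(dict.fromkeys(names))
    return [n for n in unique if not os.path.exists(os.path.join(in_dir, n))]

# The matrix-related settings from the config above, as a dict.
example_conf = {
    "exprMatrix": "counts_exprMatrix.tsv.gz",
    "matrices": [
        {"label": "counts", "fileName": "counts_exprMatrix.tsv.gz"},
        {"label": "data", "fileName": "data_exprMatrix.tsv.gz"},
        {"label": "scale", "fileName": "scale_exprMatrix.tsv.gz"},
    ],
}

# Report which of the named files are missing from the current directory.
print(missing_matrix_files(example_conf, "."))
```

If the export wrote .mtx.gz files for the large assays, the first two names would show up as missing, which matches the FileNotFoundError reported above.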
If I modify the cellbrowser.conf matrices and exprMatrix arguments as follows (note that scale is a smaller matrix, so it still gets output as a tsv):
exprMatrix="counts_matrix.mtx.gz"
matrices=[ {'label':'counts','fileName':'counts_matrix.mtx.gz'},
{'label':'data','fileName':'data_matrix.mtx.gz'},
{'label':'scale','fileName':'scale_exprMatrix.tsv.gz'}]
I run into a new error, where it seems cbBuild does not recognize the additional assays:
FileNotFoundError: [Errno 2] No such file or directory: '[path]/barcodes.tsv.gz'
Full trace:
INFO:root:dataRoot is not set in ~/.cellbrowser.conf or via $CBDATAROOT. Dataset hierarchies are not supported.
INFO:root:Determining if /var/www/apps/test/name/matrix.mtx.gz needs to be created
INFO:root:/var/www/apps/test/name/matrix.mtx.gz does not exist. Must build matrix now.
INFO:root:Checking and reordering meta data to /var/www/apps/test/name/meta.tsv
INFO:root:Reading sample names from [path]/meta.tsv
INFO:root:Reading sample names for [path] -> [path]/barcodes.tsv.gz
ERROR:root:Unexpected error: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7fdd7ffa56c8>)
Traceback (most recent call last):
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4783, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4598, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3944, in convertDataset
    sampleNames, needFilterMatrix = convertMeta(inDir, inConf, outConf, datasetDir, outMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3539, in convertMeta
    sampleNames, needFilterMatrix = metaReorder(matrixFname, metaFname, finalMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2296, in metaReorder
    matrixSampleNames = readMatrixSampleNames(matrixFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2281, in readMatrixSampleNames
    lines = openFile(barcodePath).read().splitlines()
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 807, in openFile
    fh = gzip.open(fname, mode, encoding=encoding)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '[path]/barcodes.tsv.gz'
Finally, if I add additional fields to specify the naming structure, it seems to be ignored (but perhaps I am using the wrong arguments):
exprMatrix="counts_matrix.mtx.gz"
matrices=[ {'label':'counts','fileName':'counts_matrix.mtx.gz'},
{'label':'data','fileName':'data_matrix.mtx.gz'},
{'label':'scale','fileName':'scale_exprMatrix.tsv.gz'}]
barcodes=[ {'label':'counts','fileName':'counts_barcodes.tsv.gz'},
{'label':'data','fileName':'data_barcodes.tsv.gz'}]
features=[ {'label':'counts','fileName':'counts_features.tsv.gz'},
{'label':'data','fileName':'data_features.tsv.gz'}]
Same error:
FileNotFoundError: [Errno 2] No such file or directory: '[path]/barcodes.tsv.gz'
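The log line "Reading sample names for [path] -> [path]/barcodes.tsv.gz" suggests that, in the version used here, the barcodes file was looked up under an unprefixed name next to the matrix rather than under a per-assay name. The sketch below only contrasts the two naming schemes. Both helper functions are hypothetical, and the lookup rule is inferred from the log output in this thread, not from the cellBrowser source.

```python
import os

def unprefixed_barcodes_path(mtx_path):
    """Path the log above suggests was probed: barcodes.tsv.gz beside the matrix."""
    return os.path.join(os.path.dirname(mtx_path), "barcodes.tsv.gz")

def prefixed_barcodes_path(mtx_path):
    """Path implied by per-assay names such as counts_matrix.mtx.gz (assumed scheme)."""
    directory, base = os.path.split(mtx_path)
    suffix = "matrix.mtx.gz"
    if not base.endswith(suffix):
        raise ValueError("expected a file name ending in " + suffix)
    prefix = base[: -len(suffix)]  # e.g. "counts_"
    return os.path.join(directory, prefix + "barcodes.tsv.gz")

mtx = os.path.join("dataset", "counts_matrix.mtx.gz")
print("looked for (per the log):", unprefixed_barcodes_path(mtx))
print("written per assay:       ", prefixed_barcodes_path(mtx))
```

If the inference holds, the two paths differ for every prefixed assay, which would explain why the error persists regardless of the extra barcodes and features settings.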
Thank you for your help with this
First of all, thank you for this incredibly useful package. We get a lot of use out of it.
For large matrices (where too.big = TRUE), I've run into an issue where you can't force --useMtx to be TRUE. This is because the first line of this chunk will always read TRUE in runSeurat.R:
Would it be possible to allow the user to force a tsv instead, such as changing (use.mtx || too.big) to (use.mtx || (.Platform$OS.type=="windows" && too.big))? I ask largely because cbBuild consistently fails for me with .mtx files, and I can't figure out precisely how to configure the cellbrowser.conf to fix this issue.
Thank you!
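The difference between the current condition and the proposed one can be laid out as a truth table. The sketch below mirrors the two R expressions in Python. The function and argument names are illustrative stand-ins for use.mtx, too.big, and the .Platform$OS.type=="windows" test.

```python
from itertools import product

def current_rule(use_mtx, too_big, is_windows):
    """Mirrors (use.mtx || too.big); the platform is ignored."""
    return use_mtx or too_big

def proposed_rule(use_mtx, too_big, is_windows):
    """Mirrors (use.mtx || (.Platform$OS.type == "windows" && too.big))."""
    return use_mtx or (is_windows and too_big)

print("use_mtx too_big windows | current proposed")
for use_mtx, too_big, is_windows in product([False, True], repeat=3):
    print(
        f"{use_mtx!s:7} {too_big!s:7} {is_windows!s:7} | "
        f"{current_rule(use_mtx, too_big, is_windows)!s:7} "
        f"{proposed_rule(use_mtx, too_big, is_windows)!s}"
    )
```

The two rules differ in exactly one case: a big matrix on a non-Windows system with use.mtx unset, where the current rule writes MTX and the proposed one writes a TSV.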