maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

group related datasets into a hierarchy ending with errors of copying exprMatrix.tsv.gz to itself #237

Closed yesonse closed 2 years ago

yesonse commented 2 years ago

Dear there,

I run a local host with about 20 datasets. Now I would like to group them into collections as suggested at https://cellbrowser.readthedocs.io/en/master/collections.html. After I ran "cbBuild -r", the cell browser showed the collections well, but could not find the datasets in each collection. I found there was no cellbrowser.conf in the collection subdirectories, and made one for each. I tried to run cbBuild in the subdirectory, but it ended with an error about copying exprMatrix.tsv.gz to itself. I read some of the source code, and believe it is right to put the dataset under the collection first, then run cbBuild in the dataset's subdirectory.

I also tried to run cbBuild in a directory not under dataRoot, but that ended up deactivating the hierarchy.

Please advise on the best way to recover the datasets in each collection and to add new datasets to a collection.

I appreciate the great work of building the cell browser.

Thanks a lot.

Robin

matthewspeir commented 2 years ago

Hi, Robin.

Can you provide the full error message you are receiving when trying to run cbBuild for these datasets?

yesonse commented 2 years ago

Thanks matthew.

Here it is.

root@bioinformatics:/home/CellBrowser/OPC/otx169to176integrated# cbBuild -i cellbrowser.conf -o /home/CellBrowser
INFO:root:Determining if /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz needs to be created
INFO:root:input matrix has input file size that is different from previously processed matrix. Expression matrix must be reindexed. Old file(s): {'fname': '/home/anaconda/opc211101/otx169to176integrated/exprMatrix.tsv.gz', 'md5': '970a1f0448', 'size': 177807570, 'mtime': '2022-03-08 03:07:17'}, current file: 120814330
INFO:root:/home/CellBrowser/OPC/otx169to176integrated/meta.tsv has the same md5 as in /home/CellBrowser/OPC/otx169to176integrated/dataset.json, no need to rebuild meta data
INFO:root:Reading sample names from /home/CellBrowser/OPC/otx169to176integrated/meta.tsv
INFO:root:Checking and reordering meta data to /home/CellBrowser/OPC/otx169to176integrated/meta.tsv
INFO:root:Reading sample names from /home/CellBrowser/OPC/otx169to176integrated/meta.tsv
INFO:root:Reading headers from file /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz
INFO:root:Data contains 14244 samples/cells
INFO:root:Converting to numbers and compressing meta data fields
INFO:root:Field Cell: type uniqueString, 14244 different values
INFO:root:Field origident: type enum, 4 different values
INFO:root:Field nCount_RNA: type int, 3749 different values
INFO:root:Field nFeature_RNA: type int, 2422 different values
INFO:root:Field percentmt: type float, 7721 different values
INFO:root:Field percentribo: type float, 11120 different values
INFO:root:Field predictedsubclassscore: type float, 13235 different values
INFO:root:Field predictedsubclass: type enum, 11 different values
INFO:root:Field SScore: type float, 14244 different values
INFO:root:Field G2MScore: type float, 14244 different values
INFO:root:Field Phase: type enum, 3 different values
INFO:root:Field seurat_clusters: type enum, 11 different values
INFO:root:Field CellType: type enum, 5 different values
INFO:root:Field integrated_snn_res01: type enum, 5 different values
INFO:root:Field integrated_snn_res02: type enum, 9 different values
INFO:root:Field integrated_snn_res03: type enum, 9 different values
INFO:root:Field integrated_snn_res04: type enum, 11 different values
INFO:root:Field integrated_snn_res05: type enum, 11 different values
INFO:root:Field Cluster: type enum, 11 different values
INFO:root:Indexing meta file /home/CellBrowser/OPC/otx169to176integrated/meta.tsv to /home/CellBrowser/OPC/otx169to176integrated/meta.index
INFO:root:Kept 14244 cells present in both meta data file and expression matrix
INFO:root:Auto-detecting number type of /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz
INFO:root:Auto-detect: Numbers in matrix are of type 'float'
INFO:root:Auto-detected gene IDs type: symbols
INFO:root:Copying/compressing /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz to /home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz
cp: '/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz' and '/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz' are the same file
ERROR:root:Could not run: cp "/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz" "/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz"
ERROR:root:Unexpected error: (<class 'SystemExit'>, SystemExit(1), <traceback object at 0x7fae7e6d3340>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 4783, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 4598, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 3955, in convertDataset
    convertExprMatrix(inConf, outMatrixFname, outConf, sampleNames, geneToSym, datasetDir, needFilterMatrix)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 3285, in convertExprMatrix
    matType = copyMatrixTrim(matrixFname, outMatrixFname, metaSampleNames, needFilterMatrix, geneToSym, outConf, matType)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 2525, in copyMatrixTrim
    ret = runCommand(cmd)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 2459, in runCommand
    errAbort("Could not run: %s" % cmd)
  File "/usr/local/lib/python3.8/dist-packages/cellbrowser/cellbrowser.py", line 168, in errAbort
    sys.exit(1)
SystemExit: 1
root@bioinformatics:/home/CellBrowser/OPC/otx169to176integrated#

Last time, I ran cbBuild in "/home/anaconda/opc211101/otx169to176integrated/" successfully. After I reset the dataRoot and grouped my datasets, I copied the old cellbrowser.conf to the subdirectory and tried to rebuild everything.

I found that the inMatrix md5 is different from the outMatrix md5 in dataset.json:

{
  "fileVersions": {
    "inMeta": { "fname": "/home/anaconda/opc211101/otx169to176integrated/meta.tsv", "md5": "5d5e981856", "size": 2189471, "mtime": "2022-03-08 04:19:11" },
    "outMeta": { "fname": "/home/UCSCcellbrowser/otx169to176integrated/meta.tsv", "md5": "5d5e981856", "size": 2189471, "mtime": "2022-03-08 04:26:16" },
    "inMatrix": { "fname": "/home/anaconda/opc211101/otx169to176integrated/exprMatrix.tsv.gz", "md5": "970a1f0448", "size": 177807570, "mtime": "2022-03-08 03:07:17" },
    "outMatrix": { "fname": "/home/UCSCcellbrowser/otx169to176integrated/exprMatrix.tsv.gz", "md5": "0c55cac114", "size": 120814330, "mtime": "2022-03-08 04:29:46" },
    "conf": { "fname": "/home/anaconda/opc211101/otx169to176integrated/cellbrowser.conf", "md5": "85c7479abb", "size": 1170, "mtime": "2022-03-08 04:25:14" }
  },
  "sampleCount": 14244,
  "matrixWasFiltered": true,
  "metaFields": [
    { "name": "Cell", "label": "Cell", "type": "uniqueString", "maxSize": 20, "diffValCount": 14244, "md5": "770c0e2419" },
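A note on the numbers: the log reports a recorded inMatrix size of 177807570 and a current file size of 120814330, and the latter matches the outMatrix size in this dataset.json. That suggests the file now being read as input is the previous output. The size comparison can be sketched roughly as below. This is only an illustration that assumes the dataset.json layout shown in this thread; `input_matrix_changed` is a hypothetical helper, not a cellbrowser function, and cbBuild's own check may work differently.

```python
import json
import os


def input_matrix_changed(dataset_json_path, current_matrix_path):
    """Return True if the matrix on disk differs in size from the recorded inMatrix.

    Hypothetical helper: it assumes dataset.json has the
    fileVersions -> inMatrix -> size layout shown in this thread.
    """
    with open(dataset_json_path) as fh:
        recorded = json.load(fh)["fileVersions"]["inMatrix"]
    return os.path.getsize(current_matrix_path) != recorded["size"]
```

Calling it with the dataset.json and the exprMatrix.tsv.gz that cbBuild is about to read would show whether the input matches what was recorded at the previous build.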

matthewspeir commented 2 years ago

Robin, can you share more details about how you installed the cellBrowser package (e.g. pip, conda)? And maybe what operating system you're running on (e.g. Windows, Mac OSX, or Linux)?

@maximilianh Do you have ideas? I've never seen this error before:

ERROR:root:Could not run: cp "/home/CellBrowser/OPC/otx169to176integrated/exprMatrix.tsv.gz"

yesonse commented 2 years ago

Hi Matthew,

I installed it with pip two years ago and upgraded it recently. I run Ubuntu and have hosted a UCSC cellbrowser for about 20 datasets (20 directories) without a hierarchy at /home/UCSCcellbrowser/. Now I have to organize those datasets into a hierarchy. So I configured my cellbrowser with the dataRoot "/home/CellBrowser/", made several directories there as collections, and made a "cellbrowser.conf" there too. Then I copied the old 20 directories into these collections and tried to rebuild each dataset.

Would you mind suggesting the best way to set up the cellbrowser with a hierarchy?

Thanks

Robin

matthewspeir commented 2 years ago

Hmm, that's odd that the hierarchy stuff didn't work for you.

Just to be sure, you've removed the 'dataRoot' line from the .cellbrowser.conf file in your home directory?

yesonse commented 2 years ago

I did not have a .cellbrowser.conf before. I just made one with the line "dataRoot=/home/CellBrowser/".

yesonse commented 2 years ago

Hi Matthew,

How do you add a dataset to a collection in a hierarchy? If I run cbBuild in a directory not under dataRoot, it just deactivates the hierarchy. If I move the output folder of cbSeurat under a collection in dataRoot and run cbBuild there, I get the same errors as shown above.

Thanks

Robin

matthewspeir commented 2 years ago

Could you try setting up a .cellbrowser.conf (note the '.' at the beginning of the file name) in your home directory with the dataRoot line to see if that helps?

maximilianh commented 2 years ago

Yes, without the config file it will not work. I believe that we must know the root directory; otherwise we don't know how deep we are in the tree.
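To illustrate why the root matters: the depth of a dataset in the hierarchy is only defined relative to some root directory. The sketch below computes that depth from two paths. It is not taken from the cellbrowser source; `depth_below_root` is a hypothetical name, and the example paths are the ones mentioned in this thread.

```python
import os


def depth_below_root(data_root, dataset_dir):
    """Return how many directory levels dataset_dir sits below data_root.

    Raises ValueError if dataset_dir is not inside data_root.
    """
    root = os.path.realpath(data_root)
    target = os.path.realpath(dataset_dir)
    # commonpath compares whole path components, not string prefixes
    if os.path.commonpath([root, target]) != root:
        raise ValueError("%s is not under dataRoot %s" % (target, root))
    rel = os.path.relpath(target, root)
    return 0 if rel == "." else len(rel.split(os.sep))
```

With dataRoot set to /home/CellBrowser/, the directory /home/CellBrowser/OPC/otx169to176integrated is two levels down (collection, then dataset), while a directory outside the root has no meaningful depth at all.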


yesonse commented 2 years ago

I have a ".cellbrowser.conf" with the dataRoot line. For redundancy, I also set CBDATAROOT=/home/CellBrowser/.

My error came from here:

2515 shutil.copyfile(inFname, outFname)

When I move the default outputs of cbSeurat to a subdirectory of dataRoot and run cbBuild there, outDir == inDir and inFname == outFname, and the error occurs.

So I renamed the file 'exprMatrix.tsv.gz' to 'oldMatrix.tsv.gz' in the cbSeurat output and ran cbBuild again, and it worked.
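The rename helps because the input and output names then differ. A more general guard for the inFname == outFname case could look like the sketch below, which skips the copy when both paths refer to the same file. This is an illustration only, not the actual cellbrowser code; `copy_unless_same` is a hypothetical helper built on the standard `os.path.samefile` and `shutil.copyfile`.

```python
import os
import shutil


def copy_unless_same(in_fname, out_fname):
    """Copy in_fname to out_fname, skipping the copy when both name the same file.

    Returns True if a copy was made, False if it was skipped.
    """
    # samefile needs both paths to exist, so check the destination first
    if os.path.exists(out_fname) and os.path.samefile(in_fname, out_fname):
        return False
    shutil.copyfile(in_fname, out_fname)
    return True
```

Without such a guard, `shutil.copyfile` raises `SameFileError` when source and destination are the same file, and `cp` fails with the "are the same file" message seen in the log.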

maximilianh commented 2 years ago

I don't fully understand what your setup is, but for us, it never happens that outDir == inDir. We have a htdocs directory for the webserver and a data root directory, totally different directory trees:

My conf has these lines in it:

htmlDir = "/usr/local/apache/htdocs-cells"
dataRoot = "/hive/data/inside/cells/datasets/"
outDirs = {"alpha" : "/usr/local/apache/htdocs-cells", "beta" : "/usr/local/apache/htdocs-cells-beta/" }
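A quick way to check that two such directories really are separate trees is sketched below. It is a hypothetical helper, not part of cellbrowser, and it only compares paths; it assumes that "separate" means neither directory is the same as, or nested inside, the other.

```python
import os


def trees_overlap(dir_a, dir_b):
    """Return True if one directory is the same as, or nested inside, the other."""
    a = os.path.realpath(dir_a)
    b = os.path.realpath(dir_b)
    # commonpath works on whole path components, so "htdocs-cells" and
    # "htdocs-cells-beta" are treated as different directories
    common = os.path.commonpath([a, b])
    return common == a or common == b
```

For the values above, `trees_overlap(htmlDir, dataRoot)` is False, whereas using /home/CellBrowser/ as both the dataRoot and the `cbBuild -o` output directory gives True.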


yesonse commented 2 years ago

I manage an internal bioinformatics server and use it to host the cell browser for 30-40 colleagues. Several colleagues may add datasets independently from time to time. I did not make a specific htmlDir or dataRoot. I simply made one directory and had httpd serve that directory.

So my case could be special: I thought I could set outDir as the dataRoot. I had not realized that the hierarchy needs a specific "dataRoot" separate from the outDir given to 'cbBuild -o'. I had thought the hierarchy only needed a tree of cbBuild outputs, unrelated to where and how you run cbBuild.

I figured out that I just need to put the input files needed for cbBuild somewhere else, put their paths in the cellbrowser.conf files under a tree of directories, and run cbBuild in each subdirectory.

I also modified the code to make it work when outDir == inDir :) by not rewriting the exprMatrix, which makes it easy to rebuild the tree of outputs at any time.

Thank you very much for your great work building the cellbrowser!

matthewspeir commented 2 years ago

Hi, @yesonse. Can we close this ticket? Or are you still running into issues?

yesonse commented 2 years ago

I am fine now.

maximilianh commented 1 year ago

Hi Robin,

thanks for your question. I'm sure we can quickly help you fix this.

I tried to run cbBuild in the subdirectory end with errors of copying exprMatrix.tsv.gz to itself.

It sounds as if this is the source of the problem. Can you show us the exact error?

And show us your directory structure: I imagine there is one root directory, under it a directory for the collection, and in this directory, one directory for each dataset? And all of these directories have a cellbrowser.conf ?
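One way to answer the last question is to list the directories that lack a cellbrowser.conf. The sketch below does that with `os.walk`. It is a hypothetical helper rather than a cellbrowser tool, and it assumes that every directory under the root is either a collection or a dataset; any other directory in the tree would be reported as well.

```python
import os


def dirs_missing_conf(data_root, conf_name="cellbrowser.conf"):
    """Return sorted paths under data_root (inclusive) that lack conf_name."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        if conf_name not in filenames:
            missing.append(dirpath)
    return sorted(missing)
```

An empty result means every level of the tree, from the root down to each dataset, has its own cellbrowser.conf.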

It's possible that we'll have to improve the documentation. Few groups are using the collections yet but they have always been working well for us for three years now.


maximilianh commented 1 year ago

Oh.... now I understand! You never made an htdocs directory.

Yes, this won't work. I never thought that someone would not have two separate directories. It sounds like we should mention somewhere in the documentation that they really must be separate, as otherwise cbBuild will overwrite its own output files...
