maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

Issue importing seurat3 file #129

Closed matthewspeir closed 2 years ago

matthewspeir commented 5 years ago

So, I'm trying to import an rds file. The first errors suggested I needed to update my Seurat version, which I did using conda. The new error seems to suggest that there's some issue with my R version?

Error:

$ cbImportSeurat -i 15W_LV_coembedded.rds -o cbOut -n 15W_LV_coembedded 
INFO:root:inFname: 15W_LV_coembedded.rds, outDir: cbOut, datasetName: 15W_LV_coembedded
INFO:root:running cbOut/runSeurat.R through Rscript
Loading required package: Seurat
Registered S3 method overwritten by 'R.oo':
  method        from       
  throw.default R.methodsS3
Error: package or namespace load failed for ‘Seurat’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/cluster/home/mspeir/miniconda3/lib/R/library/stringi/libs/stringi.so':
  libicui18n.so.64: cannot open shared object file: No such file or directory
Warning message:
package ‘Seurat’ was built under R version 3.6.1 
Reading 15W_LV_coembedded.rds
Exporting Seurat data to cbOut
Loading required package: Seurat
Error: package or namespace load failed for ‘Seurat’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/cluster/home/mspeir/miniconda3/lib/R/library/stringi/libs/stringi.so':
  libicui18n.so.64: cannot open shared object file: No such file or directory
Error in ExportToCellbrowser(sobj, "cbOut", "15W_LV_coembedded", markers.file = NULL,  : 
  This script requires that Seurat (V2 or V3) is installed
In addition: Warning message:
package ‘Seurat’ was built under R version 3.6.1 
Execution halted

real    0m10.401s
user    0m12.693s
sys     0m14.272s
INFO:root:Wrote logfile of R run to cbOut/analysisLog.txt
ERROR:root:R script did not complete successfully. Check cbOut/runSeurat.R and analysisLog.txt.

When I try to update my R version (current 3.5.1) to 3.6.1 using conda, I get a different error indicating some network issues, but ping/host/curl -I don't seem to indicate any issues with the URL that the error message mentions:

$ conda install -c r r
Collecting package metadata: failed

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

If your current network has https://www.anaconda.com blocked, please file
a support request with your network engineering team.

SSLError(MaxRetryError('HTTPSConnectionPool(host=\'repo.anaconda.com\', port=443): Max retries exceeded with url: /pkgs/main/noarch/repodata.json.bz2 (Caused by SSLError("Can\'t connect to HTTPS URL because the SSL module is not available.",))',),)

@maximilianh have you ever run into this type of issue before? Maybe I should just scrap my conda install and just start again, this time actually utilizing conda envs, haha.
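The `SSLError` above says the SSL module is not available, which may point at the Python interpreter inside the conda install rather than at the network, and would be consistent with ping/host/curl looking fine. A minimal check, assuming it is run with the same Python that conda uses (the function name `check_ssl` is only illustrative):

```python
import importlib


def check_ssl():
    """Return (ok, detail) describing whether this interpreter can import ssl."""
    try:
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        # This is the situation the conda error message describes.
        return False, f"ssl import failed: {exc}"
    # OPENSSL_VERSION names the OpenSSL library the module was built against.
    return True, ssl.OPENSSL_VERSION


if __name__ == "__main__":
    ok, detail = check_ssl()
    print("ssl available:", ok, "-", detail)
```

If the import fails, reinstalling conda (as considered above) is one plausible way out, since the problem would sit in the install itself.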

Command:

cbImportSeurat -i 15W_LV_coembedded.rds -o cbOut -n 15W_LV_coembedded 

Files:

/hive/users/mspeir/cellbrowserTest/ABlair_HeartOfCells/ATAC_v0.1_15W_LV_coembedded
matthewspeir commented 5 years ago

Man, what a wild ride.

I basically broke my miniconda install, so I had to start over. After starting over, I was still getting some errors so I had to install conda-forge/icu and conda-forge/libopenblas.

I'm still seeing an error though at the 'markers' step. Here are the relevant lines from analysisLog.txt:

Found precomputed markers in obj@misc['markers']
Writing top 100, cluster markers to cbOut_2/markers.tsv
Error in split.default(x, g) : first argument must be a vector
Calls: ExportToCellbrowser -> ave -> lapply -> split -> split.default
In addition: Warning message:
In ExportToCellbrowser(sobj, "cbOut_2", "temp", markers.file = NULL,  :
  Embedding pca has more than 2 coordinates, taking only the first 2
Execution halted

Command:

cbImportSeurat -i 15W_LV_coembedded.rds -o cbOut_2 -n temp

Files:

/hive/users/mspeir/cellbrowserTest/ABlair_HeartOfCells/ATAC_v0.1_15W_LV_coembedded
maximilianh commented 5 years ago

Matt is this really a normal Seurat object? I wonder if Andrew's remark, that it contains ATAC data, could be related to this error...

maximilianh commented 5 years ago

Never mind! This must be a bug, there are no markers in this file. The auto-detection is failing... one sec...

maximilianh commented 5 years ago

OK hope that this is fixed now. Matt can you test this new version? I'm not releasing it, because the new collections support hasn't had enough testing. But you can try it:

pip install /cluster/home/max/projects/czi/cellBrowser/dist/cellbrowser-0.5.49.post0.dev7.tar.gz

matthewspeir commented 5 years ago

Thanks, Max! That seems to have fixed that issue. Everything exported correctly for that dataset.

Trying to import another of these datasets using cbImportSeurat and I get a different error:

Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: ExportToCellbrowser -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted

Command:

cbImportSeurat -i fetalCombined_v1.0.rds -o cbOut_2 -n fetalCombined_v1.0

Files:

/hive/users/mspeir/cellbrowserTest/ABlair_HeartOfCells/fetal_v1
maximilianh commented 5 years ago

This is because the matrix is pretty huge... do you know how big?

I'm running as.matrix on the matrix which has worked until now but apparently won't work anymore... darn...
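As a rough illustration of why size matters here: the 'problem too large' error is, as far as I understand it, raised when the dense result would need more entries than a signed 32-bit integer can index (2^31 - 1). The sketch below does that arithmetic. The gene and cell counts are made up for illustration and are not the dimensions of the dataset in this issue.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def dense_size(n_genes, n_cells, bytes_per_value=8):
    """Return (n_entries, fits_32bit_index, gigabytes) for a dense matrix.

    bytes_per_value=8 assumes double-precision numbers, R's default numeric type.
    """
    n_entries = n_genes * n_cells
    gigabytes = n_entries * bytes_per_value / 1e9
    return n_entries, n_entries <= INT32_MAX, gigabytes


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    for genes, cells in [(30_000, 50_000), (30_000, 100_000)]:
        entries, fits, gb = dense_size(genes, cells)
        print(f"{genes} x {cells}: {entries} entries, fits={fits}, ~{gb:.0f} GB dense")
```

A sparse representation sidesteps this because it only stores the non-zero entries, so the relevant count is the number of non-zeros rather than genes times cells.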

matthewspeir commented 5 years ago

I don't know. I can ask Andrew this afternoon when I see him.

apblair commented 5 years ago

I believe the matrix is ~12 GB. After I removed genes that have a total count of less than 10, I was able to export the matrix. I haven't been able to figure out how to increase R's memory size on Linux distros.
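The filtering step described above (dropping genes whose total count is below 10) can be sketched on a sparse matrix stored as (gene, cell, count) triplets. This is only an illustration in plain Python, not the actual R script used here; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def drop_low_count_genes(triplets, min_total=10):
    """Keep only entries whose gene has a total count of at least min_total.

    triplets is an iterable of (gene, cell, count) for the non-zero entries.
    """
    triplets = list(triplets)
    totals = defaultdict(int)
    for gene, _cell, count in triplets:
        totals[gene] += count
    return [t for t in triplets if totals[t[0]] >= min_total]


if __name__ == "__main__":
    toy = [
        ("GENE_A", "cell1", 7),
        ("GENE_A", "cell2", 5),  # GENE_A total is 12, kept
        ("GENE_B", "cell1", 3),  # GENE_B total is 3, dropped
    ]
    print(drop_low_count_genes(toy))
```

Filtering like this shrinks the number of rows, which is presumably what brought the dense matrix back under the size that as.matrix can handle.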

maximilianh commented 5 years ago

Is this the trimmed down version? Were you able to run as.matrix on your big matrix?

Yes, it's possible this is too big for R's normal matrix.

apblair commented 5 years ago

No this is not the trimmed down version and I have not been able to run as.matrix on the full matrix. I shared my cell browser export scripts with Matt which will generate all the necessary files for a session.

maximilianh commented 5 years ago

OK, if you want Matt to run the exporter script, you'd probably have to trim the matrix first, because it seems the untrimmed matrix cannot be exported with as.matrix (and I don't know how one could write the big matrix to a file without as.matrix, though there should be a way).

matthewspeir commented 5 years ago

Yeah, I was able to use Andrew's script to extract the data. If it's okay with @apblair I can share his little Rscript with you?

apblair commented 5 years ago

Yes, happy to help!

matthewspeir commented 4 years ago

@maximilianh is there anything left to do for this ticket? cbImportSeurat now uses the same methods as Andrew's R code, right?

maximilianh commented 4 years ago

This was a long time ago, and the code has been through a few iterations. In short: yes, it should work, but I'm not using his code. If the matrix is too big, I'm writing the numbers as an "mtx" file plus two small text files, one for the genes and one for the cell IDs. Weirdly enough, that's the most standard way to save it.

The long version is that we're hitting a limit here, and people who claimed before that I shouldn't use .tsv files may have been right, at least for the expression matrix. The problem is that R by default cannot handle ANY number that is longer than 32 bits. This means that even just reading such a matrix is impossible: you can read every line, but then you can't make a matrix from them anymore. The only way around it right now is to read the matrix as a sparse matrix, so just the non-zero numbers. This gets us quite a bit beyond the 32-bit limit, because 90% of the numbers in the expression matrix are zeros. It also means that the matrix has to be saved as a sparse matrix from the start, as an .mtx file.

This added many if-then-elses everywhere, as I now accept .mtx files natively in the cell browser. You can specify .mtx in cellbrowser.conf and run cbBuild, and the cell browser will not even create a .tsv but will copy the .mtx through. It has worked a couple of times, and it makes me think that I should probably make this the default matrix format and move away from .tsv. But that would mean that cbBuild requires scipy to read the .mtx file, and that's a huge dependency, too...

CZI will claim that the real solution is .hdf files or their own crazy .zarr files, but these come with lots of dependencies, and some people would need to install >100MB of libraries for them, so I don't like these at all. .mtx has the big advantage that it's at least a normal text file.
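To make the layout described above concrete, here is a minimal sketch of writing a sparse matrix as a MatrixMarket coordinate file plus two small text files. It uses only the Python standard library to show that .mtx is plain text. It is not the code cbImportSeurat uses, and the file names (`matrix.mtx`, `genes.txt`, `cells.txt`) and function names are assumptions for illustration.

```python
import os
import tempfile


def write_sparse(out_dir, genes, cells, entries):
    """Write matrix.mtx, genes.txt and cells.txt into out_dir.

    entries is a list of (gene_index, cell_index, count) with 0-based
    indices; only non-zero values are listed.
    """
    with open(os.path.join(out_dir, "matrix.mtx"), "w") as fh:
        # Header line, then "rows cols non-zeros", then 1-based triplets.
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{len(genes)} {len(cells)} {len(entries)}\n")
        for gene_idx, cell_idx, count in entries:
            fh.write(f"{gene_idx + 1} {cell_idx + 1} {count}\n")
    with open(os.path.join(out_dir, "genes.txt"), "w") as fh:
        fh.write("\n".join(genes) + "\n")
    with open(os.path.join(out_dir, "cells.txt"), "w") as fh:
        fh.write("\n".join(cells) + "\n")


def read_mtx(path):
    """Read the coordinate file back as (shape, 0-based triplets)."""
    with open(path) as fh:
        lines = [ln for ln in fh if not ln.startswith("%")]
    n_rows, n_cols, _nnz = (int(x) for x in lines[0].split())
    triplets = []
    for ln in lines[1:]:
        i, j, v = ln.split()
        triplets.append((int(i) - 1, int(j) - 1, int(v)))
    return (n_rows, n_cols), triplets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_sparse(tmp, ["GENE_A", "GENE_B"], ["cell1", "cell2", "cell3"],
                     [(0, 0, 4), (1, 2, 1)])
        print(read_mtx(os.path.join(tmp, "matrix.mtx")))
```

With scipy installed, `scipy.io.mmread` should be able to load the same file as a sparse matrix, which is where the scipy dependency mentioned above would come in.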

matthewspeir commented 2 years ago

I think this has long been solved.