samuel-marsh / scCustomize

R package with collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using R.
https://samuel-marsh.github.io/scCustomize/
GNU General Public License v3.0

Read_Metrics_10X error #115

Closed rhart604 closed 1 year ago

rhart604 commented 1 year ago

LOVE scCustomize!! However, I found something that doesn't seem to work.

This command:

raw_metrics <- Read_Metrics_10X(base_path = file.path("..","results"), default_10X = T)

Produces this error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '../results/C1/outsmetrics_summary.csv': No such file or directory

It looks like the "outs" directory isn't followed by a "/". Note that I'm on Windows so I generally use file.path to avoid issues with directory/file coding. I also tried saying default_10X = F and adding secondary_path = "outs/" or "outs" and I get the same error.
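The symptom in the warning (`outsmetrics_summary.csv`) is what plain string concatenation produces when the directory part has no trailing separator. The lines below are a minimal sketch of that effect only; they are not taken from the scCustomize source, and the variable names are illustrative.

```r
# Directory path without a trailing slash (illustrative, mirrors the paths in the report)
outs_dir <- file.path("..", "results", "C1", "outs")

# paste0() adds no separator, so the directory and file name run together
broken_path <- paste0(outs_dir, "metrics_summary.csv")

# file.path() inserts "/" between its arguments
joined_path <- file.path(outs_dir, "metrics_summary.csv")

print(broken_path)
print(joined_path)
```

Printing both values side by side makes it easy to see which construction matches the path reported in the warning.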

Thanks!

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] qs_0.25.5 viridis_0.6.3 viridisLite_0.4.2 patchwork_1.1.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[8] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0
[15] scCustomize_1.1.1 SeuratObject_4.1.3 Seurat_4.3.0.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7 shape_1.4.6 magrittr_2.0.3
[6] spatstat.utils_3.0-3 ggbeeswarm_0.7.2 farver_2.1.1 GlobalOptions_0.1.2 vctrs_0.6.3
[11] ROCR_1.0-11 spatstat.explore_3.2-1 paletteer_1.5.0 janitor_2.2.0 htmltools_0.5.5
[16] sctransform_0.3.5 parallelly_1.36.0 KernSmooth_2.23-22 htmlwidgets_1.6.2 ica_1.0-3
[21] plyr_1.8.8 plotly_4.10.2 zoo_1.8-12 igraph_1.5.0 mime_0.12
[26] lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.6-0 R6_2.5.1 fastmap_1.1.1
[31] fitdistrplus_1.1-11 future_1.33.0 shiny_1.7.4.1 snakecase_0.11.0 digest_0.6.33
[36] colorspace_2.1-0 rematch2_2.1.2 tensor_1.5 prismatic_1.1.1 irlba_2.3.5.1
[41] labeling_0.4.2 progressr_0.13.0 fansi_1.0.4 spatstat.sparse_3.0-2 timechange_0.2.0
[46] httr_1.4.6 polyclip_1.10-4 abind_1.4-5 compiler_4.3.1 withr_2.5.0
[51] MASS_7.3-60 tools_4.3.1 vipor_0.4.5 lmtest_0.9-40 beeswarm_0.4.0
[56] httpuv_1.6.11 future.apply_1.11.0 goftest_1.2-3 glue_1.6.2 nlme_3.1-162
[61] promises_1.2.0.1 grid_4.3.1 Rtsne_0.16 cluster_2.1.4 reshape2_1.4.4
[66] generics_0.1.3 gtable_0.3.3 spatstat.data_3.0-1 tzdb_0.4.0 RApiSerialize_0.1.2
[71] hms_1.1.3 data.table_1.14.8 stringfish_0.15.8 sp_2.0-0 utf8_1.2.3
[76] spatstat.geom_3.2-2 RcppAnnoy_0.0.21 ggrepel_0.9.3 RANN_2.6.1 pillar_1.9.0
[81] ggprism_1.0.4 later_1.3.1 circlize_0.4.15 splines_4.3.1 lattice_0.21-8
[86] survival_3.5-5 deldir_1.0-9 tidyselect_1.2.0 miniUI_0.1.1.1 pbapply_1.7-2
[91] gridExtra_2.3 scattermore_1.2 matrixStats_1.0.0 stringi_1.7.12 lazyeval_0.2.2
[96] codetools_0.2-19 BiocManager_1.30.21 cli_3.6.1 RcppParallel_5.1.7 uwot_0.1.16
[101] xtable_1.8-4 reticulate_1.30 munsell_0.5.0 Rcpp_1.0.11 globals_0.16.2
[106] spatstat.random_3.1-5 png_0.1-8 ggrastr_1.0.2 parallel_4.3.1 ellipsis_0.3.2
[111] listenv_0.9.0 scales_1.2.1 ggridges_0.5.4 leiden_0.4.3 rlang_1.1.1
[116] cowplot_1.1.1

samuel-marsh commented 1 year ago

Hi,

Thanks for kind words! In terms of the error can you try two things for me? First, I just released v1.1.3 today on CRAN. Can you try installing that and seeing if that fixes error? (I don't think I messed with this function at all but just want to double check it's still issue in current package release).

If not does the function succeed if you do provide full path to results folder manually instead of using file.path?

Thanks! Sam

rhart604 commented 1 year ago

Updated to v1.1.3. Tried various versions, all giving errors. Note that the first metrics_summary.csv is found in ../results/C1/outs

use scCustomize to read metrics

raw_metrics <- Read_Metrics_10X(base_path = file.path("..","results"), default_10X = T)
| | 0 % ~calculating
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '../results/C1/outsmetrics_summary.csv': No such file or directory

raw_metrics <- Read_Metrics_10X(base_path = "../results", default_10X = T)
| | 0 % ~calculating
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '../results/C1/outsmetrics_summary.csv': No such file or directory

raw_metrics <- Read_Metrics_10X(base_path = "../results/C1/outs",default_10X = F)
| | 0 % ~calculating
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '../results/C1/outs/analysismetrics_summary.csv': No such file or directory

raw_metrics <- Read_Metrics_10X(base_path = "Z:/rhart/data/COGA/scRNAseq-Jul-2023/results",default_10X = F)
| | 0 % ~calculating
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'Z:/rhart/data/COGA/scRNAseq-Jul-2023/results/C1metrics_summary.csv': No such file or directory

rhart604 commented 1 year ago

Didn't include in the output but I also tried full path to the results folder with default_10X = T. Same results.

samuel-marsh commented 1 year ago

Ok let me dig in and try and figure this out. Pretty sure it's likely a windows specific thing with file paths and I don't normally use R on windows but have access to machine I can test it on.

Also just for 100% clarity (seriously I'm not doubting you but always good to rule out all variables lol) but if you pass metrics file in C1 to read.csv it works just fine?

Best, Sam

rhart604 commented 1 year ago

Yes, read.csv loads the file correctly. I'm sure you're right about it being a Windows issue.

samuel-marsh commented 1 year ago

Edit: I think I've got it. I'm still organizing repo following CRAN submission yesterday and setting up the new develop branches. I'll try and get trial fix out today but if not will do tomorrow.

Best, Sam

rhart604 commented 1 year ago

No rush for me. Thanks.

samuel-marsh commented 1 year ago

Actually can you just try this for me whenever you have chance and let me know what the output of windows_path is?

base_path <- file.path("..","results")
secondary_path <- "outs/"
file_path <- file.path(base_path, "C1", secondary_path)

windows_path <- paste0(file_path, "/metrics_summary.csv")

rhart604 commented 1 year ago

Value of windows_path is "../results/C1/outs/metrics_summary.csv". If I read.csv(windows_path) it correctly loads the file.
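This result is consistent with the documented behaviour of `file.path()` on Windows, where a trailing `/` or `\` on the result is removed, so `"outs/"` and `"outs"` end up identical and the separator has to be added explicitly. One way to get the same path on every platform is to strip trailing separators up front and let `file.path()` do all the joining. The helper below is a hypothetical sketch under that assumption (`build_metrics_path` and its arguments are made-up names, not part of scCustomize and not necessarily how the package fix works).

```r
# Hypothetical helper, not part of scCustomize: build the path to one
# sample's metrics file so that "outs" and "outs/" give the same result.
build_metrics_path <- function(base_path, sample, secondary_path = "outs",
                               file_name = "metrics_summary.csv") {
  # Remove any trailing "/" or "\" so the separator is never doubled or lost
  secondary_path <- sub("[/\\\\]+$", "", secondary_path)

  # file.path() supplies the "/" between every component
  path <- file.path(base_path, sample, secondary_path, file_name)

  # Warn early with the full path instead of failing inside read.csv()
  if (!file.exists(path)) {
    warning("Metrics file not found: ", path, call. = FALSE)
  }
  path
}
```

A call such as `build_metrics_path(file.path("..", "results"), "C1", "outs/")` would then return the same string with or without the trailing slash, and the warning names the exact file that is missing.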

samuel-marsh commented 1 year ago

Great!! Ok so fix will work and I'll get it out today or tomorrow.

rhart604 commented 1 year ago

By the way, this will find all the metrics summary files:

metrics_files <- dir(file.path("..","results"),pattern = "metrics_summary.csv$",recursive = T)
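A slightly stricter variant of that search is sketched below. It uses `list.files()` (of which `dir()` is an alias), escapes the dot and anchors the pattern so only files named exactly `metrics_summary.csv` match, and returns full paths. Naming each result by sample assumes the layout `<base>/<sample>/outs/metrics_summary.csv` seen in this thread; `find_metrics_files` is a hypothetical name, not a scCustomize function.

```r
# Hypothetical helper: locate every metrics_summary.csv under base_path.
find_metrics_files <- function(base_path) {
  files <- list.files(
    path = base_path,
    pattern = "^metrics_summary\\.csv$",  # exact file name, literal dot
    recursive = TRUE,
    full.names = TRUE
  )
  # Assumed layout: <base>/<sample>/outs/metrics_summary.csv,
  # so the sample name is two directory levels above the file.
  stats::setNames(files, basename(dirname(dirname(files))))
}
```

The named vector can be passed straight to `lapply(found, read.csv)` to get one data frame per sample, and a sample with no metrics file simply does not appear in the result.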

samuel-marsh commented 1 year ago

Ok I think we should be all set! When you have a chance please try updating to the develop branch (v1.1.3.9001) and see if things are working for you?

Thanks for suggestion! I’ve implemented slightly differently in part to maintain continuity with non-windows based machines but also so that the right warnings/errors are triggered if a sample directory or metrics file for particular sample are missing.

Also in case of interest the function now also supports reading metrics from cellranger multi pipeline to return GEX and TCR (and soon BCR metrics)

Best, Sam

rhart604 commented 1 year ago

I'm getting an error installing the new version: Error: object 'set_diff' is not exported by 'namespace:dplyr'

I'm using install_dev("scCustomize",ref="develop")

samuel-marsh commented 1 year ago

Whoops typo coming from another in development function. Just fixed that. New v1.1.3.9002.

Give it go again.

rhart604 commented 1 year ago

Works, thanks!

However, I had a problem with Seq_QC_Plot_Basic_Combined since my dataset uses the combined GRCh38 and mm10 reference index. So here's what I had to do to hack a plot of Basic Metrics:

main_metrics <- metrics_final |>
  select(-starts_with("mm10")) |>
  select(-starts_with("GRCh38_Reads")) |>
  select(-starts_with("GRCh38_Fraction")) |>
  rename_with(~ gsub("GRCh38_","",.x,fixed=T))

Seq_QC_Plot_Basic_Combined(metrics_dataframe = main_metrics, plot_by = "Full_Batch")

I realize this is probably a very rare case but I thought you'd like to know. I wouldn't think it's worth doing anything to fix it. I'm enclosing a zipped rds of the metrics_final object in case you want to see it. (Note that this was a MiSeq evaluation of some test libraries, so the depth is very, very low.)
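For anyone wanting to try the same column cleanup without the attached object, here is a small base-R sketch on toy data. The column names and values are invented for illustration (the real `metrics_final` columns may differ); only the `mm10` and `GRCh38_` prefixes are taken from the workaround above.

```r
# Toy stand-in for a mixed-genome metrics table; names and values are hypothetical
toy_metrics <- data.frame(
  sample_id = c("C1", "C2"),
  GRCh38_Estimated_Number_of_Cells = c(950, 1020),
  GRCh38_Reads_Mapped_Confidently_to_Genome = c(0.91, 0.93),
  mm10_Estimated_Number_of_Cells = c(12, 8)
)

cols <- names(toy_metrics)

# Drop all mouse columns plus the human per-genome read and fraction columns
drop <- startsWith(cols, "mm10") |
  startsWith(cols, "GRCh38_Reads") |
  startsWith(cols, "GRCh38_Fraction")
main_toy <- toy_metrics[, !drop, drop = FALSE]

# Strip the genome prefix so the remaining names look like single-genome output
names(main_toy) <- sub("GRCh38_", "", names(main_toy), fixed = TRUE)

print(names(main_toy))
```

Note that `startsWith()` is case-sensitive, whereas tidyselect's `starts_with()` ignores case by default, so the two approaches can differ if column capitalisation varies.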

metrics_final.rds.zip

samuel-marsh commented 1 year ago

Great!

Ya that makes sense but I agree with you that's gonna be edge case and could be variable if people make their own custom mixed genomes. If I have time I'll think about working on post-read helper function but priority is gonna be shifting in near term working to ensure Seurat V5 compatibility.

Best, Sam