vallotlab / ChromSCape

ChromSCape
https://vallotlab.github.io/ChromSCape/

"raw_mat" not defined #6

Closed SebastienLemaireCurie closed 1 year ago

SebastienLemaireCurie commented 1 year ago

When loading fragment data, after it seemed to have put the data in memory, ChromSCape stops with an error saying that the "raw_mat" variable is unknown. I should add that I am on a freshly installed R v4.2.

I had a look at the code and found this chunk, which contains the definition of "raw_mat" (line ~539):

    if(original_bin_size < 300 & input$rebin_matrices == TRUE){
      print("Saving raw matrix as average bin size is lesser than 300bp, for later use (coverage)...")
      raw_mat = datamatrix
    }

The "raw_mat" variable seems to lack an initialization outside this branch. Also, I am surprised by the condition "original_bin_size < 300", since I asked for 50 kbp binning in the software.

For now, I worked around the problem by adding an "else" branch with "raw_mat = datamatrix".
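
The pattern at issue can be reproduced outside ChromSCape. The following is a minimal sketch in plain R, not the package's actual code: `load_matrix()` and its `initialize_first` switch are hypothetical stand-ins, the `NULL` default is an assumption, and only the names `raw_mat`, `datamatrix`, and `original_bin_size` come from the chunk quoted above. It shows why a variable assigned only inside an `if` is missing when the condition is false, and how assigning a default first avoids that.

```r
# Hypothetical stand-in for the loading step; not ChromSCape's real code.
load_matrix <- function(datamatrix, original_bin_size, rebin_matrices,
                        initialize_first = TRUE) {
  if (initialize_first) {
    # Give raw_mat a value up front so it exists whichever branch runs.
    # (NULL is an assumed default, chosen only for illustration.)
    raw_mat <- NULL
  }
  if (original_bin_size < 300 && isTRUE(rebin_matrices)) {
    raw_mat <- datamatrix
  }
  # Without the default above, this line fails with
  # "object 'raw_mat' not found" whenever the condition is FALSE.
  list(datamatrix = datamatrix, raw_mat = raw_mat)
}

m <- matrix(1:6, nrow = 2)

# 50 kbp bins: the condition is FALSE, but raw_mat is still defined (as NULL).
ok <- load_matrix(m, original_bin_size = 50000, rebin_matrices = TRUE)

# Same call without the default: R raises an error, captured here as text.
err <- tryCatch(
  load_matrix(m, 50000, TRUE, initialize_first = FALSE),
  error = function(e) conditionMessage(e)
)
```

Adding an "else" branch, as in the workaround above, has the same effect of making sure the variable exists on every path; which default value is appropriate depends on how the variable is used later.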

Kind regards,

Pacomito commented 1 year ago

Hello Sebastien,

Thank you for posting this issue,

I am not able to reproduce your error on R version 4.2.1.

'raw_mat' is supposed to be initialized at line 424: https://github.com/vallotlab/ChromSCape/blob/01f7572e18c39a7e7ce46b6ed95fb62c64c1d931/inst/server.R#L422-L424

If you update ChromSCape with the latest changes (devtools::install_github("vallotlab/ChromSCape")), do you still get this behavior?

If so, could you post the entire log, just to be sure the problem comes from there?

Thanks, Pacome

SebastienLemaireCurie commented 1 year ago

Hi Pacomito,

Indeed, I did not have these lines in my previous installation (from Bioconductor). I tried a new installation directly from the GitHub version, as suggested, and it worked. Thank you for the fix.

It may be worth checking the "server.R" script in the Bioconductor version.

Sorry for the bother,

Kind regards.

Pacomito commented 1 year ago

Thank you very much for your feedback. Indeed, the Bioconductor version might have the bug; I will correct it.

Thanks, Pacome

DanSchnell commented 1 year ago

I think I may be running into the same problem with count data input: the files seem to load and the output directories are created, but after that everything in the browser is grayed out and unresponsive. I also think I saw a message along the lines of 'raw data not found' in the Linux background. Was this issue fixed in the version currently on Bioconductor?
Is it definitely fixed for count matrix input in the current GitHub version?

Thanks, Dan

Pacomito commented 1 year ago

Hello Dan,

Thank you for posting this issue. If you install the latest GitHub version (devtools::install_github("vallotlab/ChromSCape")), do you still get this error?

If you do, is your input matrix a "Dense" or a "Sparse" matrix (10X-like format)? What features was it counted on (e.g. bins of what size, peaks of what average size, or something else)?

Best, Pacome

DanSchnell commented 1 year ago

Thanks very much for your prompt reply Pacome. The version I'm working with was installed from Bioconductor. The input matrices I loaded were two from the example datasets: HBC_22.tsv & HBC_22_TamR.tsv.

I also unzipped some of the Buenrostro bed files and input them as SC bed. They also seemed to read in successfully. As with the datasets above, when I hit the Create Analysis button, there was a short pause, then the interface grayed out completely, and in the Linux background there was the error message about raw_mat.

The package is installed on a computing cluster here. Before I ask that support team to re-install the package, could you verify that the GitHub version has some code difference(s) that should get around this problem?

Thanks, Dan

DanSchnell commented 1 year ago

Screen shots of behavior with 100 unzipped bed files divided into 2 samples of 50.

[Two screenshots, taken 2022-12-02 at 12:03 PM and 12:05 PM]

Pacomito commented 1 year ago

Hello Dan,

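I think the error was fixed in commit 23d2f04 (https://github.com/vallotlab/ChromSCape/commit/23d2f0485c683bdba3925b644c62d1f3d16a38bc); raw_mat is now initialized properly.

I re-tested with the latest version (GitHub) on the HBCx22 (scChIP_mouse_PDX) and the scBED from Ku et al. downloaded from the Dropbox (https://www.dropbox.com/sh/vk7umx3ksgoez3x/AACEq9zn-rRbtwf_Al9uEUaQa?dl=0) and it works fine.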

So my suggestion would be to install the newest version from GitHub. ( devtools::install_github("vallotlab/ChromSCape") )

I also tested on the Bioconductor version 3.14 (R 4.1.3) and it is also working fine.

Sorry for the trouble, Cheers, Pacôme

DanSchnell commented 1 year ago

Thanks very much Pacôme, installing from github did get me past that error.

Also, could you let me know what the required file setup is for sparse input? E.g., the format of the data matrix itself (mtx, other?) and whether the barcode and feature information should be embedded in the data matrix file or kept in 2 separate files (and if so, the naming requirements).

When I did try sparse input, I did see a pop-up but it referred to bed files, not sparse file(s).

Best, Dan

Pacomito commented 1 year ago

Hi Dan,

There should be 3 files per sample (.mtx, barcodes.tsv and features.tsv). An example of the files is available on the Dropbox for the Buenrostro et al., 2018 scATAC-seq data (https://www.dropbox.com/sh/vk7umx3ksgoez3x/AACEq9zn-rRbtwf_Al9uEUaQa?dl=0&preview=GSE96769_scATAC_Buenorostro_2018.zip).

When uploading to ChromSCape, you should select the root of the directory, as for single-cell BED files (which is why the pop-up mentions scBED only).
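
As a rough illustration of that layout, here is a sketch that writes a toy sample directory with the three files, using the Matrix package. The directory name, the sample name, the toy counts, and the exact file names are made up for the example, and the tab-separated chr/start/end layout of the features file is an assumption, so please compare against the Dropbox example before relying on it.

```r
library(Matrix)  # recommended package shipped with standard R installations

# Toy count matrix: 3 features (rows) x 2 cells (columns).
counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(4, 1, 2),
                       dims = c(3, 2))

# Root directory to select in ChromSCape, with one sub-directory per sample.
root <- file.path(tempdir(), "sparse_input")   # hypothetical location
sample_dir <- file.path(root, "sample_A")      # hypothetical sample name
dir.create(sample_dir, recursive = TRUE, showWarnings = FALSE)

# 1. The matrix in Matrix Market (.mtx) format.
invisible(writeMM(counts, file.path(sample_dir, "matrix.mtx")))

# 2. One barcode (cell name) per line.
writeLines(c("cell_1", "cell_2"), file.path(sample_dir, "barcodes.tsv"))

# 3. One feature per line; tab-separated chr/start/end is an assumption.
features <- data.frame(chr = "chr1",
                       start = c(0L, 5000L, 10000L),
                       end = c(5000L, 10000L, 15000L))
write.table(features, file.path(sample_dir, "features.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

list.files(sample_dir)
```

In the application, the directory to select would then be `root`, not `sample_dir`.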

Cheers, Pacôme

DanSchnell commented 1 year ago

Thanks Pacôme.

One of the files I was trying was named regions instead of features, so that was probably causing a problem.

Best, Dan

DanSchnell commented 1 year ago

Hi Pacôme,

The features associated with my input files are genomic regions, as opposed to gene symbols, Ensembl IDs, etc.

Example: “1:10000000-10005000”. It is Hs data and I had hg38 clicked.

I ran into an error regarding duplicate rownames… Independent checking in R before and after input file creation found no duplicate rownames (or columns names).

Is genomic region in this format supported?

I believe there are multiple regions per gene, could that be causing a problem?

I could try replacing colons and dashes with underscores or something else, might that help?

Thanks very much for your continued assistance.

Best,

Dan

Running on main experiment...
ChromSCape::create_scExp - the matrix has 500 cells and 560235 features.
ChromSCape::create_scExp - 138051 features with 0 signals were removed.
   user  system elapsed
 10.769   0.055  10.842
[1] "Filter..."
ChromSCape::filter_scExp - 495 cells pass the threshold of 200 minimum reads and are lower than the 99th centile of library size ~= 25342 reads.
ChromSCape::filter_scExp - 47815 features pass the threshold of 10 count per feature.
   user  system elapsed
  0.219   0.017   0.237
[1] "Finding top covered features..."
ChromScape::find_top_features - 47815 features kept as most covered features...
   user  system elapsed
  0.157   0.002   0.160
[1] "Normalizing with method TFIDF..."
   user  system elapsed
  0.066   0.003   0.069
[1] "Feature annotation..."
ChromSCape::feature_annotation_scExp - Selecting hg38 genes from Gencode.
Warning in .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Warning in max(.data$distanceToTSS) :
  no non-missing arguments to max; returning -Inf
Warning: non-unique values when setting 'row.names':
Timing stopped at: 0.253 0.009 0.283
Warning: Error in .rowNamesDF<-: duplicate 'row.names' are not allowed

Pacomito commented 1 year ago

Hello Dan,

Indeed, the currently supported format for rownames is either "chr1:10000000-10005000" or "chr1_10000000_10005000". In either case, the "chr" characters have to be present. This is why you get the error.

Best, Pacôme
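
For region names that lack the prefix, such as "1:10000000-10005000", the renaming can be done in R before writing the input file. The sketch below is only an illustration: `add_chr_prefix()` is a hypothetical helper, not a ChromSCape function, and the regular expression is my own reading of the two formats named above rather than the check ChromSCape itself performs.

```r
# Hypothetical helper: prepend "chr" to region names that lack it.
add_chr_prefix <- function(region_names) {
  ifelse(startsWith(region_names, "chr"),
         region_names,
         paste0("chr", region_names))
}

regions <- c("1:10000000-10005000", "X:500-1000", "chr2:0-5000")
fixed <- add_chr_prefix(regions)

# Check the result against the two layouts named above:
# "chr1:10000000-10005000" or "chr1_10000000_10005000".
supported <- grepl("^chr[^:_]+[:_][0-9]+[-_][0-9]+$", fixed)
```

On a count matrix `mat` (a placeholder name), this would be applied as `rownames(mat) <- add_chr_prefix(rownames(mat))`.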

DanSchnell commented 1 year ago

Thanks very much Pacôme. Yes, that is working for me now.

Is there a way to provide the Shiny tool with cell or sample level meta data?

Best, Dan

Pacomito commented 1 year ago

Hi Dan, There is currently no way to provide metadata along with the samples, but this would be a very interesting feature to implement. If you want to add metadata now, the best way is to add it to the SingleCellExperiment's colData, using ChromSCape functions directly in R.
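
A rough sketch of what adding a colData column could look like follows. It uses a toy SingleCellExperiment built on the spot rather than an object produced by ChromSCape, and it does not call any ChromSCape function; the cell names, the `treatment` column, and the counts are invented for the example. The Bioconductor part runs only if SingleCellExperiment is installed.

```r
# Hypothetical per-cell metadata, keyed by cell barcode.
cell_names <- c("cell_1", "cell_2", "cell_3")
metadata <- data.frame(barcode = c("cell_3", "cell_1", "cell_2"),
                       treatment = c("TamR", "control", "control"),
                       stringsAsFactors = FALSE)

# Reorder the metadata rows so they line up with the cell (column) order.
aligned <- metadata[match(cell_names, metadata$barcode), ]
stopifnot(identical(aligned$barcode, cell_names))

# Attach the column only if the Bioconductor package is available.
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  counts <- matrix(0, nrow = 2, ncol = 3,
                   dimnames = list(c("chr1:0-5000", "chr1:5000-10000"),
                                   cell_names))
  scExp <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = counts))
  # For SummarizedExperiment-derived objects, `$<-` writes to colData.
  scExp$treatment <- aligned$treatment
  print(scExp$treatment)
}
```

The `match()` step is the part that matters most: colData rows must follow the same order as the columns of the count matrix, otherwise labels end up on the wrong cells.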

Best, Pacôme

DanSchnell commented 1 year ago

Understood--thanks very much for your quick reply! Have a nice day.

DanSchnell commented 1 year ago

Hi Pacôme,

Regarding input in Sparse format, could you confirm that the file naming shown below is appropriate and works?

Specifically, that _barcodes extension should be .txt and the _features extension should be .bed. Both files seem to be tab-delimited.

I’ve tried w/out success several combinations of ‘as is’, unzipping & renaming.

Best, Dan

[Screenshot of the file listing]

Pacomito commented 1 year ago

Hi Dan, Regarding the naming of the files, for the Sparse Matrix (10X format), each sample directory should contain three files: a matrix file, a features file and a barcodes file.

The exact regexps for the files, however, are:

.matrix .mtx
.features .tsv
.barcodes .tsv
.features .txt
.barcodes .txt
.features .bed
.features ..gz
.barcodes ..gz
.matrix ..gz

Also, I was wrong earlier: for a sparse matrix, the features file has to be tab-separated, rather than in the 'chr1:100-200' or 'chr1_100_200' format used for a dense matrix. The sparse matrix example on the Dropbox was not readable by ChromSCape, so I fixed it; sorry about that.
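
To check a directory's file names before uploading, something like the sketch below may help. Note that the regexps in it are guesses reconstructed from the list above (a keyword somewhere in the name, followed by one of the listed extensions, optionally gzipped); they are not copied from ChromSCape's source, and `check_sparse_files()` is a hypothetical helper.

```r
# Assumed patterns: keyword anywhere in the name, then a listed extension,
# optionally gzipped. Illustrative only, not ChromSCape's real regexps.
patterns <- c(
  matrix   = "matrix.*\\.mtx(\\.gz)?$",
  features = "features.*\\.(tsv|txt|bed)(\\.gz)?$",
  barcodes = "barcodes.*\\.(tsv|txt)(\\.gz)?$"
)

# Hypothetical helper: which of the three expected files are present?
check_sparse_files <- function(file_names) {
  vapply(patterns, function(p) any(grepl(p, file_names)), logical(1))
}

# All three files are recognised under the assumed patterns.
check_sparse_files(c("s1_matrix.mtx", "s1_features.bed", "s1_barcodes.txt"))

# A file named "regions" instead of "features" is not picked up.
check_sparse_files(c("s1_matrix.mtx", "s1_regions.tsv", "s1_barcodes.tsv"))
```

For a real sample directory, the input would be `list.files(sample_dir)`, where `sample_dir` stands for the path to one sample's folder.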

Best, Pacome