robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0

Error in "load_contacts" for juicer .hic file. #225

Closed hiroyukikato911 closed 2 years ago

hiroyukikato911 commented 4 years ago

Hi,

I successfully installed GENOVA with remotes::install_github("robinweide/GENOVA").

I then loaded the package with library("GENOVA"), which also printed the following message that I don't usually see:

data.table 1.13.0 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com

Finally, when I try to run

s1 <- load_contacts("myjuicerfile.hic", resolution = 50000, balancing = T, sample_name = "s1", colour = "black")

R always encounters a fatal error and aborts. My .hic files are around 5-7 GB in size and work fine everywhere else, including loading into Juicebox and extracting data.

Could this be caused by the limited memory of my laptop?

I'm using R version 3.5.1 (64-bit) on macOS High Sierra 10.13.6 with 16 GB of RAM.

Any help is appreciated. Thanks in advance.

Best,

teunbrand commented 4 years ago

Hi,

I am not sure at the moment what would be causing the error. One diagnostic test that you could perform, is trying to load a lower resolution from the .hic file (for example 1Mb or more). If this also doesn't load, there might be other problems and if this loads just fine, there indeed might be a memory issue. GENOVA reshapes the data from the .hic file to the format that the rest of the functions expect, so there is some extra memory required beyond just the raw memory required to store the data.

Best, Teun
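
As a rough illustration of why a coarser resolution is a useful diagnostic: the number of bins, and with it the worst-case number of bin pairs, shrinks quickly as the bin size grows. The sketch below uses base R only and does not call GENOVA. The genome size of 3.1e9 bp and the helper name estimate_bins are assumptions for illustration, and real Hi-C matrices are sparse, so actual memory use should be well below this upper bound.

```r
# Hypothetical helper (not part of GENOVA): count genome-wide bins at a
# given resolution and the worst-case number of distinct bin pairs.
estimate_bins <- function(genome_size, resolution) {
  n_bins <- ceiling(genome_size / resolution)
  # Upper triangle, including the diagonal, of an n x n contact matrix.
  max_pairs <- n_bins * (n_bins + 1) / 2
  list(n_bins = n_bins, max_pairs = max_pairs)
}

genome_size <- 3.1e9  # assumed: a roughly human-sized genome

at_50kb <- estimate_bins(genome_size, 50000)
at_1mb  <- estimate_bins(genome_size, 1e6)

# Ratio of worst-case pair counts between the two resolutions.
pair_ratio <- at_50kb$max_pairs / at_1mb$max_pairs

print(c(bins_50kb = at_50kb$n_bins, bins_1mb = at_1mb$n_bins))
print(pair_ratio)
```

Under these assumptions, going from 50kb to 1Mb bins cuts the worst-case number of bin pairs by a factor of roughly 400, so a file that loads at 1Mb but not at 50kb points towards memory rather than a parsing problem.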

hiroyukikato911 commented 4 years ago

Hi,

Thanks for the reply. To rule out a memory issue, I tried a very small test.hic file (4.8 MB) from the following website:

https://bcm.app.box.com/v/juicer-tools-testing

but I still get the same error.

I will keep trying to figure out what's wrong, since GENOVA seems to have some pretty cool functions, including pescan. By the way, when I load the GENOVA package, I get the following message:

data.table 1.13.0 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com

Is this something normal?

Best,

teunbrand commented 4 years ago

Hi,

Could you tell us if there is a specific error message you get and if so, what the error message says?

While trying to debug the problem with the test file you've linked to, I've stumbled on a case we didn't anticipate (see #226), so that might also be causing an error.

> Is this something normal?

Yes, the current master branch of GENOVA depends on data.table being loaded. In the development version this switches to an import instead, which should prevent that message from being printed.

Best

hiroyukikato911 commented 4 years ago

Hi,

Thanks for the immediate response. Attached, please find the error message I get. There is no detailed explanation; R shuts down immediately.

I appreciate your help!

Best, [image: Error message.png]

hiroyukikato911 commented 4 years ago

Hi,

The dev branch you created works absolutely fine! Thanks a lot!

By the way, how many ChIP-seq targets is the pescan analysis intended for? It works great with <1000 peaks, but R runs out of memory with a typical ChIP-seq peak set of around 10000-50000 peaks.

Best,

teunbrand commented 4 years ago

Hi there,

Good to hear that the issue seems to be solved. For the PESCAn, it is useful to keep in mind that it looks for combinatorial effects of the peaks, so the computation scales quadratically with the number of peaks. If the peak set is quite large, I would recommend either picking the top 5000 peaks (determined by signal strength or p-value) or clustering sets of peaks. An example of peak clustering is to search for sets of subsequent peaks with <10kb distance between them, consisting of at least 3-5 peaks.

Best wishes
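
The two strategies suggested above could be sketched in base R as follows. This is only a minimal sketch, not GENOVA functionality: the helper names top_peaks and cluster_peaks, the column names (chrom, start, end, score) and the example peaks are all assumptions for illustration. To get a feel for the quadratic scaling: 50000 peaks give on the order of 1.25e9 possible peak pairs, whereas 5000 peaks give about 1.25e7.

```r
# Hypothetical helpers (not part of GENOVA); `peaks` is assumed to be a
# data.frame with columns chrom, start, end and score.

# Keep the n strongest peaks by score.
top_peaks <- function(peaks, n = 5000) {
  ranked <- peaks[order(peaks$score, decreasing = TRUE), ]
  head(ranked, n)
}

# Merge runs of subsequent peaks separated by less than max_gap bp and
# keep only runs that contain at least min_peaks peaks.
cluster_peaks <- function(peaks, max_gap = 10000, min_peaks = 3) {
  peaks <- peaks[order(peaks$chrom, peaks$start), ]
  n <- nrow(peaks)
  if (n == 0) return(peaks[, c("chrom", "start", "end")])
  # Distance from the end of the previous peak to the start of this one.
  gap <- c(Inf, peaks$start[-1] - peaks$end[-n])
  same_chrom <- c(FALSE, peaks$chrom[-1] == peaks$chrom[-n])
  # A new cluster starts when the chromosome changes or the gap is too wide.
  cluster_id <- cumsum(!(same_chrom & gap < max_gap))
  first <- !duplicated(cluster_id)
  size <- tabulate(cluster_id)
  keep <- which(size >= min_peaks)
  data.frame(
    chrom   = peaks$chrom[first][keep],
    start   = peaks$start[first][keep],
    end     = as.numeric(tapply(peaks$end, cluster_id, max))[keep],
    n_peaks = size[keep]
  )
}

# Made-up example peaks, for illustration only.
peaks <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  start = c(1000, 6000, 12000, 500000, 1000, 4000),
  end   = c(1500, 6500, 12500, 500500, 1500, 4500),
  score = c(10, 50, 20, 90, 5, 70)
)

strongest <- top_peaks(peaks, n = 3)
clusters  <- cluster_peaks(peaks, max_gap = 10000, min_peaks = 3)
print(strongest)
print(clusters)
```

In this toy example only the three neighbouring chr1 peaks form a cluster; the isolated chr1 peak and the pair on chr2 are dropped because they fall below min_peaks. The reduced peak set or the cluster intervals would then be written to a BED file and passed to the PESCAn instead of the full peak list.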

francisfa commented 4 years ago

Hi, I get the following error when I load a .hic file:

Reading data... Error in eval(bysub, x, parent.frame()) : object 'chrom_x' not found

I also tried the test.hic file and got the same error.

test_juicer <- load_contacts(signal_path = './test.hic',
                             sample_name = "test",
                             resolution = 2500000,
                             balancing = 'KR', # this is the default
                             colour = "black")

What could be causing this error? Thank you!

asedenocacciatore commented 4 years ago

Regarding @francisfa's problem: I have now encountered this same issue twice. In my case, the same error

Reading data... Error in eval(bysub, x, parent.frame()) : object 'chrom_x' not found

appears when trying to load data from a .hic file using R 4.0. The data can be loaded without problems on a different machine using R 3.6. Of course, there might be other differences in packages, but none are obvious from the results of devtools::session_info(). Trying to use straw directly on the file on the machines with R 4.0 also raises the error

One of the chromosomes wasn't found in the file. Check that the chromosome name matches the genome. Error in straw::straw("KR", "path/to/file", ...)

@francisfa could you share your session info (the results of running devtools::session_info() )?

@teunbrand since I can load the file on my own machine, I suspect this is an issue of straw with either a dependency or some other change in R 4.0. It might be more useful to open an issue on aidenlab/straw for now.

teunbrand commented 4 years ago

I can confirm that the "One of the chromosomes wasn't found in the file. Check that the chromosome name matches the genome." error triggers on my Windows machine with R 4.0.2 when calling strawr::straw directly. The only dependency straw seems to have is Rcpp, and I think the C++ toolchain got switched/updated between R 3.6 and R 4.0. Due to 'working from home' conditions, I can't test R version discrepancies on my Linux machine.

I can't seem to find a way to trigger the "Error in eval(bysub, x, parent.frame()) : object 'chrom_x' not found" error.

Dexterdandi commented 4 years ago

I get the same error, "Error in eval(bysub, x, parent.frame()) : object 'chrom_x' not found", in both R 3.6 and 4.0, on a local machine and on a Linux cluster system. I can't get it to work with multiple datasets from different species.

Does anybody have an idea?

It works fine with HiC-Pro output .matrix files, but I don't have all datasets in this format. Also, .mcool or .cool files can't be imported at all with the third option (cooler files), because for some reason GENOVA is looking for .cooler extensions.

I'm stuck unless there is a way to produce the .matrix and BED files from .hic files that I don't know of.

Thanks for any potential insight/help

robinweide commented 3 years ago

Two possible checks that could be implemented:

robinweide commented 3 years ago

@francisfa, @asedenocacciatore , @Dexterdandi, @hiroyukikato911 : could you please update to the dev branch and try again? All my test-cases work now, so I hope yours too.

In the end, it seems that the strawr update broke the loader, and the constant checking for the correct packages also caused issues.

teunbrand commented 3 years ago

@robinweide do you think this is fixed with 48bee2fb5ef2e6323fbef0625706ca2e55886189?

robinweide commented 3 years ago

It should be. If tagged people are still having issues, we'll reopen.

RomeroMatt commented 3 years ago

Hi All, I am new to GENOVA and started playing with it today. I am running into the same problem as the others in that I am getting this error: Reading data... Error in eval(bysub, x, parent.frame()) : object 'chrom_x' not found

I am using an absolute path so I don't think that is what's causing the problem. The second suggestion "fix: add argument and set to observed" - I am not sure exactly what that means in regards to input. Would you be able to explain that to a scripting newbie? Thanks! -Matt

teunbrand commented 3 years ago

Hello Matt,

Does updating to the development version of GENOVA solve the problem?

remotes::install_github("robinweide/GENOVA@dev")

Best, Teun

RomeroMatt commented 3 years ago

Hi Teun, Sorry for the delay in responding. I have used the command that you suggested and I am still having the problem. The original command I used to install GENOVA was: devtools::install_github("robinweide/GENOVA", ref = 'dev')

When using the command you suggested, R asked if I wanted to 'force' the install so I updated the command to: remotes::install_github("robinweide/GENOVA@dev", force = TRUE)

But still no luck.

baishengjun commented 3 years ago

Hi all, I get the same error. Does anybody have an idea?

teunbrand commented 3 years ago

I'll reopen this, but I cannot diagnose the issue when I have no way of reproducing the bug. If anybody has a link to a publicly accessible .hic file that gives the error, I can start having a look.

baishengjun commented 3 years ago

Hi, I changed the parameter balancing='KR' to balancing=T and it works, but it takes a long time for my 2.1 GB .hic data with resolution=10000. I hope this helps people who get the same error.

RomeroMatt commented 3 years ago

Hi all, I tried the above suggestions and still couldn't load data with this command:

H9Std_10kb_juicer <- load_contacts(signal_path = '/Users/matthewa/scripts/AWS/H9StdRun5/aligned_694/H9Std_inter_30.hic', sample_name = "H9Std_inter_30", resolution = 10000, balancing = 'KR', colour = "black")

It does work when using balancing = T, but only at 100kb, not for my higher-resolution 10kb data:

H9Std_10kb_juicer <- load_contacts(signal_path = '/Users/matthewa/scripts/AWS/H9StdRun5/aligned_694/H9Std_inter_30.hic', sample_name = "H9Std_inter_30", resolution = 100000, balancing = T, colour = "black")

When trying to load 10kb data I receive this message: "Error: vector memory exhausted (limit reached?)"

I have also been using HiCExplorer and have converted my .hic files to .cool files, so maybe I can try that and see if it works any better.

Thanks for the help! -Matt

baishengjun commented 3 years ago

Hi Matt, it seems that you don't have enough RAM.

Landau1994 commented 3 years ago

Another possible cause is not supplying the centromeres parameter.

teunbrand commented 2 years ago

I finally managed to reproduce this bug with a dataset of a colleague and I've submitted a fix to the dev version. Unfortunately, I can't do anything about the RAM issue: it just is quite costly to hold a HiC dataset in memory.

RomeroMatt commented 2 years ago

Thanks so much for the help! I'll use our lab computer with more RAM and download the new dev version. Huge help!