Open rajib100bd opened 1 year ago
Hi, thank you for your message.
I tried to reproduce the error with a Cooler file of my own and it worked fine. Were there any errors during the first compartments command?
Can you plot the compartment matrix using fancplot, or are you getting the same error?
fancplot chr18:40.5mb-41mb -p square /home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab -vmin -1 -vmax 1 --colormap bwr
Usually the "File type not recognised" error indicates that the file got corrupted somehow.
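A quick way to test for gross corruption is to check whether the file still starts with the HDF5 signature. This is only a sketch: it assumes that the .ab file written by fanc compartments is an HDF5 container, which is not stated in this thread, and `looks_like_hdf5` is a hypothetical helper name, not part of FAN-C. A file that passes can still be truncated or incomplete, so a positive result does not prove the file is usable.

```python
from pathlib import Path

# The 8-byte signature that an HDF5 file normally starts with.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is a file that begins with the HDF5 signature.

    Assumption: the FAN-C output being checked is HDF5-based. This only
    inspects the first 8 bytes; it does not validate the rest of the file.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

For example, `looks_like_hdf5("/home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab")` returning False would be consistent with an output file that was never written completely.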
I thought I'd also mention that 5kb is very small for a compartment analysis. Generally, in 5kb matrices signal will get very noisy away from the diagonal (unless you have enormous sequencing depth), which results in low correlations overall. I would recommend much larger bin sizes - depending on your sequencing depth anywhere between 100kb and 1mb.
Hi there, thank you for your prompt reply. I just ran the fancplot command and it returned the same message:
ValueError: File type not recognised (/home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab).
Do you suggest that I regenerate the correlation matrix?
I found the dataset at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE122958. I'll try to run the compartments calculation myself and keep you updated here.
To be honest, you might simply be running out of memory, as the complete matrix for each chromosome will be loaded for correlation analysis.
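To get a feel for the numbers, here is a back-of-the-envelope estimate of the size of one dense per-chromosome matrix at different bin sizes. It is a sketch under stated assumptions: values are stored as 8-byte floats in a dense square matrix, and only a single copy is held in memory. How FAN-C actually stores and copies the matrix during the correlation step is not described in this thread, so the real peak usage may well be several times higher. The chromosome lengths are the hg19 values for chr1 and chr18; `dense_matrix_gib` is a hypothetical helper.

```python
import math


def dense_matrix_gib(chrom_length_bp, bin_size_bp, bytes_per_value=8):
    """Return (number of bins, size in GiB) of one dense square matrix.

    Assumption: one dense copy with `bytes_per_value` bytes per entry.
    """
    n_bins = math.ceil(chrom_length_bp / bin_size_bp)
    return n_bins, n_bins * n_bins * bytes_per_value / 2**30


# hg19 chromosome lengths in base pairs.
HG19_LENGTHS = {"chr1": 249_250_621, "chr18": 78_077_248}

for chrom, length in HG19_LENGTHS.items():
    for bin_size in (5_000, 100_000, 1_000_000):
        n_bins, gib = dense_matrix_gib(length, bin_size)
        print(f"{chrom} @ {bin_size:>9,} bp: {n_bins:>6,} bins, ~{gib:.3f} GiB")
```

Under these assumptions a single dense chr1 matrix at 5kb has 49,851 bins per side and needs roughly 18.5 GiB, while at 1mb it has 250 bins per side and fits in well under a megabyte.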
Thank you for your reply. Just an update:
I recalculated the correlation matrix overnight. This time I ran the following fancplot command:
fancplot -o /home/main/sa/HiC_data_analysis/Jurkat_wt/jurkat_chr18_5kb.ab.png chr18 -p square /home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab -vmin -0.75 -vmax 0.75 -c RdBu_r
I didn't get any error message this time, but it returned an empty plot.
Another quick question: can I make a chromosome-specific correlation matrix file instead of calculating the matrix for the whole genome? I only need a few chromosomes for my analysis, so that would save me some time.
Sorry for the naïve questions. Thank you for your kind help!
I calculated the AB correlation matrix on GSE122958_jurkat_wt_hg19_5k_q10 and everything worked as expected:
I'm not sure what went wrong in your run - are you sure the command finished completely? On my machine, the calculation took much longer than one night.
Regarding the chromosome-specific correlation matrix: you can use the -x parameter to exclude all chromosomes you don't need:
-x EXCLUDE [EXCLUDE ...], --enrichment-exclude EXCLUDE [EXCLUDE ...]
Chromosome names to exclude from AB compartment and enrichment profile calculation
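Since -x takes the chromosomes to drop rather than the ones to keep, the exclusion list can be built from the chromosomes you want. The sketch below only assembles and prints a command line; it does not run anything. Assumptions: the Cooler file uses UCSC-style hg19 names (chr1 to chr22, chrX, chrY, chrM), which should be checked against the actual file, and input.cool / output.ab are placeholder paths. Because -x accepts one or more values, it is placed after the two file arguments so that it cannot swallow them; that reflects how multi-value options are usually parsed, not documented FAN-C behaviour.

```python
import shlex

# Assumption: UCSC-style hg19 chromosome names; verify against your file.
HG19_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def exclusion_list(all_chromosomes, keep):
    """Return every chromosome in `all_chromosomes` that is not in `keep`."""
    keep = set(keep)
    unknown = keep - set(all_chromosomes)
    if unknown:
        raise ValueError(f"Not in chromosome list: {sorted(unknown)}")
    return [c for c in all_chromosomes if c not in keep]


def compartments_command(input_file, output_file, keep):
    """Build a 'fanc compartments' command that excludes all but `keep`."""
    excluded = exclusion_list(HG19_CHROMOSOMES, keep)
    # -x goes last so its value list cannot absorb the file arguments.
    return shlex.join(
        ["fanc", "compartments", input_file, output_file, "-x", *excluded]
    )


# Placeholder paths; substitute your own .cool input and .ab output.
print(compartments_command("input.cool", "output.ab", keep=["chr18"]))
```

If the chromosome names in the file differ from the list above, the names reported by the file itself should be used instead.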
Finally, I'd like to repeat that a correlation matrix at 5kb resolution is probably not what you want. As you can see in the plot above, the values are all close to 0, which is due to the sparsity and randomness of signal away from the diagonal. My strong recommendation is to use a bin size of 100kb or ideally even larger.
Hi there, Thank you for your reply. I rechecked the run. It seems it's a memory issue. Let me try with a chromosome-specific correlation matrix. I hope the run will be completed before the memory runs out! Thank you for the tips regarding the 5kb bin size. I will try with 100kb. I'll update the result. Thanks again!
Hi there, I just finished the analysis after converting the cooler file to 1mb resolution. Finally, I could calculate the compartments. It seems I had used up all available memory at 5kb resolution. Thank you for your kind cooperation! Cheers!
Dear team, I am new to HiC data analysis and am trying out the fanc command-line tools to analyse a HiC data set downloaded from the SRA database as a .cool file.
The first command I used worked well, generating a correlation matrix. I used the following code:
fanc compartments /home/main/sa/HiC_data_analysis/Jurkat_wt/GSE122958_jurkat_wt_hg19_5k_q10.cool /home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab
Then I ran the command below to generate the AB eigenvector:
fanc compartments -g /mnt/nas/reference_genome/BWA/mammals/hg19/genome.fa -v /home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb_ev-gc.txt /home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab
But, in return, I got the following message:
Traceback (most recent call last):
  File "/home/main/sa/.local/bin/fanc", line 127, in <module>
    Fanc()
  File "/home/main/sa/.local/bin/fanc", line 93, in __init__
    command([sys.argv[0]] + sys.argv[option_ix:], log_level=log_level, verbosity=verbosity)
  File "/home/main/sa/.local/lib/python3.9/site-packages/fanc/commands/fanc_commands.py", line 4122, in compartments
    matrix = fanc.load(input_file, tmpdir=tmp)
  File "/home/main/sa/.local/lib/python3.9/site-packages/fanc/tools/load.py", line 90, in load
    return gr_load(file_name, *args, **kwargs)
  File "/home/main/sa/.local/lib/python3.9/site-packages/genomic_regions/regions.py", line 195, in load
    raise ValueError("File type not recognised ({}).".format(file_name))
ValueError: File type not recognised (/home/main/sa/HiC_data_analysis/Jurkat_wt/Jurkar_wt_5kb.ab).
Could you please tell me what might be wrong here? FYI, I am using FAN-C version: 0.9.25. Thank you for your kind help!