kingsfordgroup / armatus

BSD 2-Clause "Simplified" License

empty output #12

Open samuelCollombet opened 6 years ago

samuelCollombet commented 6 years ago

Hi, I am trying to run Armatus on a sparse matrix, but I get no domains at all, even with different matrices. I believe there is a problem with my run. Could I send you my matrix so you can test it? Could I also get an example of a sparse matrix on which you have tested Armatus, so I can check whether the problem comes from my installation?

Thanks, Samuel

nsauerwald commented 6 years ago

Hi Samuel, Please send me your matrix (nsauerwald@cmu.edu), and I will try to figure out the issue. In the meantime, any of the sparse Hi-C matrices from the 2012 Dixon publication (available here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35156), or from the 2014 Rao publication (available here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525) should run correctly with Armatus (make sure to use the -R flag for the Rao dataset, as the file format is different). Best, Natalie

hgu0717 commented 5 years ago

Hey,

I've got the same problem. Just to be clear, is the input for Rao's dataset the sparse matrix or the domain list? The sparse matrix (3 columns) would make sense to me; however, the Armatus documentation seems to suggest using the domain list with -R.

nsauerwald commented 5 years ago

The input should be the sparse matrix, as you said. If you use the -R flag, the software will automatically look for both the ".RAWobserved" and the ".KRnorm" files, and normalize the Hi-C data before finding TADs. If you just want to run Armatus on the raw Rao data, use the -N flag to skip the normalization step. For example, if you wanted to find TADs on the normalized 5kb Hi-C matrix, on chromosome 1 of the GM12878 data, use "-R -i GM12878_combined/5kb_resolution_intrachromosomal/chr1/MAPQGE30/chr1_5kb".
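As a rough illustration of what the normalization step involves: the README distributed with the Rao et al. (2014) data describes normalizing each RAWobserved entry (i, j, count), where i and j are base-pair bin start positions, by dividing the count by the product of the KRnorm values for bins i/resolution and j/resolution. The Python sketch below applies that rule under those assumptions. `kr_normalize` is a hypothetical helper, not Armatus's internal code, and Armatus may handle details such as NaN entries differently.

```python
import math


def kr_normalize(raw_lines, krnorm_lines, resolution):
    """Normalize Rao-style RAWobserved triples with a KRnorm vector.

    Assumption: each raw line is "i j count" with i and j given as base-pair
    start positions (multiples of `resolution`), and entry k (0-based) of the
    KRnorm vector is the factor for the bin starting at k * resolution.
    """
    kr = [float(v) for v in krnorm_lines if v.strip()]  # "NaN" parses to nan
    normalized = []
    for line in raw_lines:
        if not line.strip():
            continue  # skip blank lines
        i_str, j_str, count_str = line.split()
        i, j = int(i_str), int(j_str)
        factor = kr[i // resolution] * kr[j // resolution]
        # Bins with a NaN or zero KR factor cannot be normalized; drop them.
        if math.isnan(factor) or factor == 0:
            continue
        normalized.append((i, j, float(count_str) / factor))
    return normalized


raw = ["0 0 10", "0 5000 4", "5000 5000 9"]
krnorm = ["2.0", "1.0"]
print(kr_normalize(raw, krnorm, 5000))
# [(0, 0, 2.5), (0, 5000, 2.0), (5000, 5000, 9.0)]
```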

hgu0717 commented 5 years ago

Thanks, Natalie.

Here is another basic question: if my data is a sparse matrix, is the matrix supposed to be normalized or not?

Thanks.

nsauerwald commented 5 years ago

Armatus will run on either normalized or unnormalized data, so the choice of whether to normalize (and which normalization method to use) is up to you based on your application.

agolicz commented 5 years ago

Hi, I seem to be having a similar problem: all my output files have size zero. I am using an ICE-normalized matrix produced by HiC-Pro and split by chromosome using its utility script split_sparse.py. My first guess is that there is a problem with the formatting of the input.

Any chance you could help?

I've put some sample files on OSF: https://osf.io/xh4rt/files/

All the best, Agnieszka

This is the command: armatus -m -r 5000 -N -S -c ${i} -i $INPUT -g 1.0 -s 0.05 -o armatus.domains/chr${i}

And this is the head of the input file:
head NP1_5000_iced_Chr1.matrix
1 1 88.000000
1 2 84.000000
1 3 55.000000
1 4 28.000000
1 5 30.000000
1 6 22.000000
1 7 19.000000
1 8 7.000000
1 9 12.000000
1 10 4.000000

Some lines from std.out:

Reading input from NP1_5000_iced_Chr10.matrix.
Building matrix for chromosome 10 at resolution 5000bp with 1 rows.
Initializing matrix to zero elements
10.9114% 21.8229% 32.7343% 43.6457% 54.5572% 65.4686% 76.38% 87.2915% 98.2029%
MatrixParser read matrix of size: 1 x 1
gamma=0
OPTIMAL SCORE: 0
begin computeTopK()
In topK()
The 0th-best solution had score 0
gamma=0.05
OPTIMAL SCORE: 0
begin computeTopK()
In topK()
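One quick way to test the input-formatting suspicion: at a fixed resolution, base-pair coordinates are all multiples of that resolution, whereas bin indices such as 1, 2, 3 are not. The Python sketch below applies that heuristic to the first two columns. `looks_like_bin_indices` is a hypothetical helper, not part of Armatus or HiC-Pro, and it is only a heuristic: a file whose coordinates all happen to be multiples of the resolution would not be flagged.

```python
def looks_like_bin_indices(lines, resolution):
    """Guess whether a 3-column sparse matrix uses bin indices or bp positions.

    Heuristic (an assumption, not an Armatus check): base-pair coordinates at
    a fixed resolution are all multiples of it, so any coordinate that is not
    suggests the first two columns hold bin indices instead.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        i, j = int(fields[0]), int(fields[1])
        if i % resolution or j % resolution:
            return True
    return False


head = ["1 1 88.000000", "1 2 84.000000", "1 3 55.000000"]
print(looks_like_bin_indices(head, 5000))  # True
```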

nsauerwald commented 5 years ago

I think the issue is that your coordinates are already divided by the resolution: notice that your bins start at values like 1 and 2 instead of 5000 and 10000. When you also specify a resolution of 5000, the algorithm divides these coordinates by the resolution again and gets values like 1/5000, which is consistent with the 1 x 1 matrix reported in your output. Try the same command without the "-r 5000" flag, as shown below.

armatus -m -N -S -c ${i} -i $INPUT -g 1.0 -s 0.05 -o armatus.domains/chr${i}
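Dropping -r is the fix suggested here. If you would rather keep base-pair coordinates in the input, another option (untested here, so treat it as an assumption) is to convert the bin indices back to bin start positions before running Armatus with -r. The Python sketch below assumes 1-based bin indices counted from the start of the chromosome, as in HiC-Pro's bin BED files where bin 1 starts at position 0. `bins_to_bp` is a hypothetical helper.

```python
def bins_to_bp(lines, resolution):
    """Rewrite "bin_i bin_j count" lines as "start_i start_j count" in bp.

    Assumption: bin indices are 1-based and counted from the start of the
    chromosome, so bin b starts at (b - 1) * resolution. If a per-chromosome
    file kept genome-wide bin IDs, subtract the chromosome's first ID first.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start_i = (int(fields[0]) - 1) * resolution
        start_j = (int(fields[1]) - 1) * resolution
        out.append(f"{start_i} {start_j} {fields[2]}")  # keep the count as-is
    return out


head = ["1 1 88.000000", "1 2 84.000000", "1 3 55.000000"]
for row in bins_to_bp(head, 5000):
    print(row)
# 0 0 88.000000
# 0 5000 84.000000
# 0 10000 55.000000
```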