kingsfordgroup / armatus


Question about armatus input matrix #1

Closed oursu closed 9 years ago

oursu commented 9 years ago

Hi,

I have two quick questions about your armatus software, which I am excited to use.

  1. Is the input matrix supposed to have sorted entries? (i.e. are the fragments supposed to be in sorted order according to their genomic location?)
  2. Must the input matrix span the whole genome, or is it ok to only specify a submatrix, leaving out fragments that do not have any read covering them?

Thanks, Oana

lynxoid commented 9 years ago

Hi Oana,

  1. We adopted the format originally used for the 3C matrices produced by Ren's lab (http://chromosome.sdsc.edu/mouse/hi-c/download.html, although their format changed recently). Each row of the matrix represents a single fragment and its interaction frequencies with all other fragments, tab-separated. For example, at 40-kb resolution:
chr19  0  40000  7  0  2  0 ...
chr19  40000  80000  10  0  1  0 ...
chr19  80000  120000  8  0  1  0 ...

where each row's format is: chromosome, fragment start, fragment end, then the list of frequencies, ordered by genomic location.

  2. We do not directly support submatrices, but you could extract the submatrix and run Armatus on it by itself (it must have >100 fragments), then convert the results back to the original fragments.

Thanks, Darya
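
The row format and the submatrix workaround described above can be sketched in Python. This is a minimal illustration, not part of Armatus: the helper names (`read_matrix`, `extract_submatrix`, `write_matrix`) are hypothetical, and the sketch assumes a square matrix for a single chromosome with whitespace- or tab-separated fields.

```python
import io

def read_matrix(handle):
    """Parse rows of: chromosome, start, end, then one frequency per fragment."""
    fragments, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        fragments.append((fields[0], int(fields[1]), int(fields[2])))
        rows.append([float(x) for x in fields[3:]])
    return fragments, rows

def extract_submatrix(fragments, rows, first, last):
    """Keep fragments first..last (inclusive) and the matching columns."""
    sub_frags = fragments[first:last + 1]
    sub_rows = [row[first:last + 1] for row in rows[first:last + 1]]
    return sub_frags, sub_rows

def write_matrix(fragments, rows, handle):
    """Write the matrix back in the same tab-separated row format."""
    for (chrom, start, end), row in zip(fragments, rows):
        counts = "\t".join(f"{v:g}" for v in row)
        handle.write(f"{chrom}\t{start}\t{end}\t{counts}\n")

# Tiny made-up 4x4 example (far too small for a real run, see note below).
example = (
    "chr19\t0\t40000\t7\t3\t2\t0\n"
    "chr19\t40000\t80000\t3\t10\t4\t1\n"
    "chr19\t80000\t120000\t2\t4\t8\t5\n"
    "chr19\t120000\t160000\t0\t1\t5\t9\n"
)
frags, rows = read_matrix(io.StringIO(example))
sub_frags, sub_rows = extract_submatrix(frags, rows, 1, 2)
out = io.StringIO()
write_matrix(sub_frags, sub_rows, out)
```

Because `sub_frags` keeps the original start and end coordinates, row `i` of the submatrix corresponds to fragment `first + i` of the full matrix. Per the reply above, a real submatrix should contain more than 100 fragments.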

oursu commented 9 years ago

Great, thank you Darya!

oursu commented 9 years ago

Hi,

Does the input matrix have to be specified in terms of a fixed window size? I am asking because I tried running Armatus at restriction-fragment resolution, rather than with a fixed window size, and in the results most domains have a size equal to the resolution that Armatus reports at the start of the run.

Thanks, Oana

lynxoid commented 9 years ago

Oana,

Yes, the elements in the matrix have to be evenly spaced (i.e. a fixed window size apart). This is an implicit assumption in the dynamic program that finds domains.

Thanks!
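
One way to act on the fixed-window requirement is to check the spacing of fragment starts before running, and to re-bin a restriction-fragment matrix into fixed windows if the check fails. The sketch below is an assumption-laden illustration and not something the thread or Armatus prescribes: `fixed_step` and `rebin` are hypothetical helpers, fragments are `(chromosome, start, end)` tuples for a single chromosome, and each fragment is assigned to a window by its midpoint, with counts simply summed.

```python
def fixed_step(fragments):
    """Return the common distance between consecutive starts, or None if uneven."""
    starts = [start for _, start, _ in fragments]
    steps = {b - a for a, b in zip(starts, starts[1:])}
    return steps.pop() if len(steps) == 1 else None

def rebin(fragments, rows, window):
    """Sum counts into fixed windows; each fragment goes to the window holding its midpoint."""
    bins = [((start + end) // 2) // window for _, start, end in fragments]
    n = max(bins) + 1
    binned = [[0.0] * n for _ in range(n)]
    for i, bi in enumerate(bins):
        for j, bj in enumerate(bins):
            binned[bi][bj] += rows[i][j]
    chrom = fragments[0][0]  # assumes a single chromosome
    new_frags = [(chrom, k * window, (k + 1) * window) for k in range(n)]
    return new_frags, binned

# Made-up restriction fragments of uneven length.
frags = [("chr19", 0, 1500), ("chr19", 1500, 4200),
         ("chr19", 4200, 9000), ("chr19", 9000, 10000)]
counts = [[5, 2, 1, 0],
          [2, 6, 3, 1],
          [1, 3, 7, 2],
          [0, 1, 2, 4]]

uneven = fixed_step(frags)                    # None: starts are not evenly spaced
new_frags, binned = rebin(frags, counts, 5000)
even = fixed_step(new_frags)                  # 5000 after re-binning
```

Midpoint assignment is only one possible choice; splitting a fragment's counts across the windows it overlaps is another, and which is appropriate depends on the data.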