hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0

emission probability heatmap #167

Open YichaoOU opened 1 year ago

YichaoOU commented 1 year ago

Hello,

How can I create the emission probability heatmap as shown in the ENCODE presentation?

https://wiki.uiowa.edu/download/attachments/101234675/Segway-UIowa-45-public.pdf?version=1&modificationDate=1375477976147&api=v2

image

EricR86 commented 1 year ago

This figure was created with Segtools, specifically the segtools-gmtk-parameters tool. It is generated from the learned model parameters that training writes to the train directory (params/params.params).

Currently it's recommended to install segtools through Bioconda. Let me know if you need any further assistance.

YichaoOU commented 1 year ago

Thanks! Is each row scaled so that its min = 0 and max = 1, and are the values emission probabilities?

image

EricR86 commented 1 year ago

@YichaoOU the values across each row are normalized, but they are not themselves probabilities. For each row (track), the diagram shows which labels have a higher Gaussian mean for that track. The black bars indicate the variance of each underlying Gaussian.
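To illustrate what "normalized across the row" can mean, here is a minimal sketch that min-max scales each track's per-label means to the range [0, 1]. It assumes min-max scaling; the thread does not confirm the exact normalization Segtools applies. The function name `normalize_per_track` and the raw mean values are hypothetical.

```python
def normalize_per_track(means_by_track):
    """Min-max scale each track's per-label means to [0, 1].

    Assumption: min-max scaling; this is a sketch, not Segtools' own code.
    """
    normalized = {}
    for track, means in means_by_track.items():
        lo, hi = min(means), max(means)
        span = hi - lo
        # A track whose means are all equal has no spread; map it to 0.0.
        normalized[track] = [(m - lo) / span if span else 0.0 for m in means]
    return normalized


# Hypothetical raw Gaussian means for four labels (0..3) on two tracks.
raw_means = {
    "DNA_rep1": [2.0, 0.5, 1.25, 3.0],
    "RNA_rep1": [1.0, 0.0, 0.5, 2.0],
}

for track, values in normalize_per_track(raw_means).items():
    print(track, values)
```

After this scaling, the label with the lowest mean on a track maps to 0 and the one with the highest maps to 1, so the values rank labels within a track but do not sum to 1 and are not probabilities.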

YichaoOU commented 1 year ago

Thanks! Do you know how I can directly plot emission probabilities? There are other plots from the segtools-gmtk-parameters command, but only the above figure looks like probabilities.

Is the table gmtk_parameters.stats.csv below the emission probabilities?

|   | DNA_rep1 | DNA_rep2 | RNA_rep1 | RNA_rep2 |
| -- | -- | -- | -- | -- |
| 0 | 0.591853 | 0.605576 | 0.630044 | 0.645321 |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0.273725 | 0.298312 | 0.313079 | 0.348844 |
| 3 | 1 | 1 | 1 | 1 |

Thanks, Yichao

EricR86 commented 1 year ago

The parameters above are specifically just the learned Gaussian means from EM training. If you want all the parameters to plot a Gaussian, the covariance needs to be obtained from the final params.params in your train directory. The covariance is tied across each track.

It is important to note that by default, all data input to Segway is normalized with an arcsinh transformation (there are options to change this). So the Gaussian means and covariance reflect these transformed values.
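As a rough illustration of the transformation mentioned above, the sketch below applies arcsinh to a raw signal value and maps a value in transformed space back with sinh. The function names and the example values are hypothetical, and this is not Segway's own code.

```python
import math


def to_model_space(raw_signal):
    """Apply the arcsinh (inverse hyperbolic sine) transform to a raw value."""
    return math.asinh(raw_signal)


def to_raw_space(transformed_value):
    """Invert the transform: sinh undoes arcsinh."""
    return math.sinh(transformed_value)


# Hypothetical raw signal values, for illustration only.
for raw in (0.0, 1.0, 10.0, 100.0):
    transformed = to_model_space(raw)
    print(f"raw={raw:7.1f}  arcsinh={transformed:.4f}  back={to_raw_space(transformed):.4f}")
```

A learned mean read from params.params is therefore in arcsinh space (under the default setting); passing it through `to_raw_space` gives the corresponding value on the original signal scale.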

In params.params, the means (the same values behind the CSV you posted above) are under the heading:

% means

and the covariances are under:

% diagonal covariance matrices
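Once a mean and its variance have been read from those two sections, the emission density for one label on one track can be evaluated directly. The sketch below is a minimal example under stated assumptions: the mean and variance are hypothetical placeholders (not values from any real params.params), the input is assumed to be arcsinh-transformed as in the default setting, and the function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def emission_density(raw_signal, mean, variance):
    """Density of a raw signal value under a Gaussian defined in arcsinh space."""
    return gaussian_pdf(math.asinh(raw_signal), mean, variance)


# Hypothetical parameters for one (label, track) pair.
mean, variance = 1.5, 0.25

for raw in (0.0, 1.0, 2.0, 5.0):
    print(f"raw={raw:4.1f}  density={emission_density(raw, mean, variance):.4f}")
```

Note that these values are probability densities, so they can exceed 1 and are not probabilities in themselves. Evaluating `gaussian_pdf` over a grid of x values gives the curve to plot for each label.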