hoffmangroup / segway

Application for semi-automated genomic annotation.
http://segway.hoffmanlab.org/
GNU General Public License v2.0
13 stars 7 forks source link

Add cnvway GMTK API to Segway #178

Closed isaiahah closed 11 months ago

EricR86 commented 11 months ago

The input_master.py should just be renamed to gmtk.py and live in the segway folder. As it stands the current proposed input_master.py would not be importable since the gmtk directory is missing a __init__.py to mark it as a module. Is there a reason for the subdirectory and/or gmtk submodule? I could be convinced that there's a good reason for it that I can't think of.

isaiahah commented 11 months ago

You previously recommended changing the import statement in Figure 1 to from segway.gmtk.input_master import ... so I created the gmtk directory as an attempt to replicate that; I forgot the __init__.py to mark it as a module. However, I didn't have any other reason for placing the file there, so unless you think of anything I will move it to segway/gmtk.py.

michaelmhoffman commented 11 months ago

Let's discuss what the file will be called

Michael M. Hoffman, PhD Senior Scientist and Chair, Computational Biology Medicine Program Princess Margaret Cancer Centre, University Health Network Associate Professor, Departments of Medical Biophysics and Computer Science University of Toronto Faculty Affiliate, Vector Institute for Artificial Intelligence

Princess Margaret Cancer Research Tower 11-311 101 College St Toronto, ON M5G 1L7

https://hoffmanlab.org/

On Tue, Sept 26 2023 at 12:53 p.m., Eric Roberts @.***> wrote:

@.**** requested changes on this pull request.

@isaiahah https://github.com/isaiahah let me know when you've addressed these comments and I'll do another quick round before passing it on.

In doc/api.rst https://github.com/hoffmangroup/segway/pull/178#discussion_r1337497173:

+

    1. A Python file describing the initial parameter settings and Segway
  • commands for training and inference. The classes provided by the GMTK API
  • for writing this file are described below. It should contain 3 sections:
    1. Code defining an :py:class:InputMaster object and setting its
  • appropriate attributes to describe the model.
    1. A line saving the :py:class:InputMaster object to an input file,
  • which describes the parameters in GMTK structure format.
    1. Code calling Segway to train the defined model and annotate using the
  • trained model. More information on running Segway is available in
  • :ref:python-interface.
  • +The CNVway code provides a worked example applying the GMTK API to defining,

Where would this be? Maybe just link to github repo?

In doc/api.rst https://github.com/hoffmangroup/segway/pull/178#discussion_r1337498699:

+

  • :param filename: Path to input master file, where results are saved
  • :type filename: str
  • :returns: None
  • :rtype: None
  • +Usage example:

  • +.. code-block:: python

  • Create InputMaster object

  • input_master = InputMaster()
  • Set parameters

  • ...
  • Save to output file

  • input_master.save(outfile)

probably rename outfile to something more filename-ish instead of a potential file handle.

In doc/api.rst https://github.com/hoffmangroup/segway/pull/178#discussion_r1337499670:

  • :param args: The covariance value which is interpreted by the Numpy array constructor.
  • :type args: array_like
  • +Usage example:

  • +.. code-block:: python

  • Create a Covar object in the InputMaster covar InlineSection

  • input_master.covar["dist1"] = Covar(1.0)
  • Alternately, a numeric value will be converted to a Covar

  • input_master.covar["dist2"] = 1.0
  • +.. py:class:: DPMF

  • A Numpy ndarray representing a DPMF with a specialized string method

Should define the acronym DPMF

In doc/api.rst https://github.com/hoffmangroup/segway/pull/178#discussion_r1337509012:

  • :returns: DPMF with given shape and uniform probabilities.
  • :rtype: DPMF
  • +Usage example:

  • +.. code-block:: python

  • Create a custom DPMF object in the InputMaster mean InlineSection

  • input_master.dpmf["biased"] = DPMF([0.7, 0.3])
  • Create a uniform DPMF with a specified shape

  • input_master.dpmf["uniform"] = DPMF.uniform_from_shape(3)
  • +.. py:class:: DiagGaussianMC

  • A Gaussian distribution with diagonal covariance. Currently the only

Is this correct? I thought it was referring to the diagonal components in a matrix. Also please define and expand any acronyms.

In doc/api.rst https://github.com/hoffmangroup/segway/pull/178#discussion_r1337509894:

  • :param components: Name or list of names of mixture components
  • :type components: str or list[str]
  • +Usage example:

  • +.. code-block:: python

  • Create a MX objects in the InputMaster mx InlineMXSection.

  • Arguments are labels for DPMF and MX objects.

  • input_master.mx["emission1"] = MX("biased", ["dist1", "dist2"])
  • input_master.mx["emission2"] = MX("biased", ["dist1", "dist2"])
  • +.. py:class:: DenseCPT

  • A Numpy ndarray representing a dense CPT with a specialized string method

Again expand the CPT acronym at least once and anywhere else in the docs for other acronyms that show up.

In doc/api.rst https://github.com/hoffmangroup/segway/pull/178#discussion_r1337523798:

+

  • Creates a :py:class:DeterministicCPT with the provided attributes.
  • :param cardinality_parents: The cardinality of parent variables
  • :type cardinality_parents: tuple[int] or tuple
  • :param cardinality: The cardinality of this variable
  • :type cardinality: int
  • :param dt: Name of an existing decision tree
  • :type dt: str
  • +Section Classes +===============

  • +Classes to store multiple objects that form one section of the parameter file. +These are used within the InputMaster object and need not be defined by

This is strange. Why are these documented then if no one who reads it is supposed to use it? I thought these would be useful in some weird unimplemented GMTK object corner cases?

— Reply to this email directly, view it on GitHub https://github.com/hoffmangroup/segway/pull/178#pullrequestreview-1644728866, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADJLS27WEINIM2DFMKWDEVTX4MCAFANCNFSM6AAAAAA5GUFFXM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

isaiahah commented 11 months ago

I've renamed input_master.py as discussed in the meeting. As we discussed I've also added an Internal Classes section to the GMTK API documentation. Please review that section and let me know if I'm including too much detail, too little detail, or if there's other internal parts of the API that I missed documenting.

EricR86 commented 11 months ago

@michaelmhoffman this is ready for your review

isaiahah commented 11 months ago

CI tests are failing in the install step due to an updated Cython (3.03, from 3.02) requiring a newer version of blocs2 (2.2.8, from 2.0.0), which requires Python>=3.9 when only Python 3.8.18 is installed as part of the test. Eric is changing CI to use Python 3.9; that change will be merged in, then this branch can be merged to main.