hyanwong / giglib

MIT License
4 stars 2 forks source link

Add the graph object #46

Closed hyanwong closed 10 months ago

hyanwong commented 10 months ago

To represent a immutable, valid GIG. This also includes a method to "sample resolve" a GIG, although it's not extensively tested yet.

@duncanMR - after writing the code, I think that it's easier to have child_left < child_right for inversions (rather than the same for parents), because in my sample_resolve code, I check the left and right in the child against the existing intervals as we go up the graph. See what you think?

There's a lot of extra functionality in this PR, sorry, but I think it's useful. I've tried to stick to the tskit style as much as possible, apart from the following:

gig.edges  # can iterate over, like ts.edges() but without the braces
gig.edges[0]  # same as ts.edge(0)
gig.nodes  # like ts.nodes()
gig.node[0]  # like ts.node(0)
gig.sequence_length(u)  # has to be per node: find a max sequence length for node u, using the edges

Summary by CodeRabbit

coderabbitai[bot] commented 10 months ago

[!WARNING]

Rate Limit Exceeded

@hyanwong has exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 11 minutes and 10 seconds before requesting another review.

How to resolve this issue? After the wait time has elapsed, a review can be triggered using the @coderabbitai review command. Alternatively, push new commits to this PR. We recommend that you space out your commits to avoid hitting the rate limit.
How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our [FAQ](https://coderabbit.ai/docs/faq) for further information.
Commits Files that changed from the base of the PR and between baece7f2be2962a540033806ad21c531691555c2 and 1f82ee77e2bd02e82757fea5673bc9010aefd8f6.

Walkthrough

The GeneticInheritanceGraph library has undergone a significant update, introducing new imports, classes, and methods across various modules. These changes enhance the library's ability to represent and manipulate genetic inheritance graphs. Additionally, the update includes a naming convention shift from gig to gigl, reflected across import statements and test cases, indicating a move towards a more efficient and feature-rich library.

Changes

File Path Summary of Changes
.../GeneticInheritanceGraph/__init__.py Added imports for from_tree_sequence and Graph from the graph module.
.../GeneticInheritanceGraph/graph.py Introduced Graph class, from_tree_sequence function, and data classes IEdge and Node.
.../GeneticInheritanceGraph/tables.py Enhanced table row classes with @dataclass, added new methods to BaseTable and Tables.
tests/conftest.py
tests/gigutil.py
tests/test_graph.py
tests/test_tables.py
tests/test_util.py
Renamed module import to gigl, updated fixtures, and revised test cases to reflect the new naming convention.

🐇✨ To code anew, the rabbit hops,
Through graphs and nodes, it never stops.
With changes wrought, the tests align,
Inheritance graphs, now more divine.
🌟🌿

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on X ?


Tips ### Chat with CodeRabbit Bot (`@coderabbitai`) - You can reply to a review comment made by CodeRabbit. - You can tag CodeRabbit on specific lines of code or files in the PR by tagging `@coderabbitai` in a comment. - You can tag `@coderabbitai` in a PR comment and ask one-off questions about the PR and the codebase. Use quoted replies to pass the context for follow-up questions. ### CodeRabbit Commands (invoked as PR comments) - `@coderabbitai pause` to pause the reviews on a PR. - `@coderabbitai resume` to resume the paused reviews. - `@coderabbitai review` to trigger a review. This is useful when automatic reviews are disabled for the repository. - `@coderabbitai resolve` resolve all the CodeRabbit review comments. - `@coderabbitai help` to get help. Additionally, you can add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed. ### CodeRabbit Configration File (`.coderabbit.yaml`) - You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository. - The JSON schema for the configuration file is available [here](https://coderabbit.ai/integrations/coderabbit-overrides.v2.json). - If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json`
hyanwong commented 10 months ago

I'm please that the sample_resolve algorithm is so simple (this is the core of the simplify method), although it needs a lot more testing. I did have to use an interval library though. I does correctly sample resolve the sv gig (https://github.com/hyanwong/GeneticInheritanceGraph/issues/2#issuecomment-1858383674)

codecov-commenter commented 10 months ago

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (cceee86) 89.30% compared to head (331bff1) 92.24%. Report is 1 commits behind head on main.

Files Patch % Lines
GeneticInheritanceGraph/graph.py 95.03% 3 Missing and 5 partials :warning:
GeneticInheritanceGraph/tables.py 97.72% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #46 +/- ## ========================================== + Coverage 89.30% 92.24% +2.94% ========================================== Files 4 5 +1 Lines 187 387 +200 Branches 35 81 +46 ========================================== + Hits 167 357 +190 - Misses 5 11 +6 - Partials 15 19 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

duncanMR commented 10 months ago

LGTM! I haven't analysed sample_resolve in-depth but I'll get to that in future

hyanwong commented 10 months ago

LGTM! I haven't analysed sample_resolve in-depth but I'll get to that in future

Thanks. That routine almost certainly needs more testing, but once you've had a look over it, I guess I should merge, and we can add tests later?

hyanwong commented 10 months ago

Note that the sample_resolve function (including the comment) is largely from https://github.com/tskit-dev/tskit/discussions/2869#discussioncomment-7553044

duncanMR commented 10 months ago

I cooked up a quick example with a deletion and a subtree without samples, and it behaved exactly as expected:

image

import GeneticInheritanceGraph as gigl
import tests.gigutil as gigutil

    # p | c | c_left | c_right | p_left | p_right
interval_data = [
    (10, 8, 0, 3, 0, 3),
    (10, 9, 0, 5, 5, 0),
    (8, 4, 0, 3, 0, 3),
    (8, 4, 3, 6, 0, 3),
    (9, 6, 0, 5, 0, 5),
    (9, 7, 0, 5, 0, 5),
    (4, 0, 0, 6, 0, 6),
    (4, 1, 0, 6, 0, 6),
    (5, 2, 0, 3, 0, 3),
    (5, 3, 0, 3, 0, 3)
]
# time | flags
node_data = [
    (0, 1),
    (0, 1),
    (0, 1),
    (0, 1),
    (1, 0),
    (1, 0),
    (1, 0),
    (1, 0),
    (2, 0),
    (2, 0),
    (4, 0)
]

tables = gigl.Tables()
tables.intervals = gigutil.make_iedges_table(interval_data, tables)
tables.nodes = gigutil.make_nodes_table(node_data, tables)
tables.sort()
gig = gigl.Graph(tables)
resolved_gig = gig.sample_resolve()
print(gig.tables)
print(resolved_gig.tables)

I agree that we should merge it. It seems to work fine for simple examples; once we have a forward simulator, we'll be able to write more rigorous tests.

hyanwong commented 10 months ago

Great, thanks a lot for testing. I'll merge this now.