The GFA interface should support our decorated GFA format.

6br commented 5 years ago

Because our decorated GFA format should no longer include sequences, we need to allow the sequence to be omitted in GFA. I will fix the GFA import/export in our browser.

ekg commented 5 years ago

We can set the sequence to be *.
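A minimal sketch, assuming Python and tab-separated GFA 1.0 lines: a `*` in the segment's sequence field means the sequence is not stored, and the standard `LN:i` tag can keep the segment length. The helper name `segment_line` is hypothetical.

```python
from typing import Optional


def segment_line(name: str, length: int, sequence: Optional[str] = None) -> str:
    """Format a GFA 1.0 segment (S) line.

    When `sequence` is None the sequence field is written as "*", which GFA
    reads as "sequence not stored"; the LN:i tag keeps the segment length.
    """
    seq = sequence if sequence is not None else "*"
    return "\t".join(["S", name, seq, f"LN:i:{length}"])
```

For example, `segment_line("1", 1024)` yields a segment with no stored sequence but a recorded length of 1024.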

6br commented 5 years ago

Outputs a stack of summarizations of the base graph, with nodes in each level above the bottom (input graph) annotated with the node IDs in the previous layer that they are summarizing.

Also, we need to extend the node tags to include the previous layer's node IDs.
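To make the "stack of summarizations" concrete, here is a minimal sketch under an assumed in-memory layout (not an existing API): layer 0 is the input graph, and every node in a higher layer keeps the list of previous-layer node IDs it summarizes. The names `Layers` and `base_nodes` are hypothetical; the function walks those annotations down to recover the input-graph nodes behind a summary node.

```python
from typing import Dict, List

# Hypothetical layout: layers[0] maps each input-graph node id to an empty list;
# for k >= 1, layers[k] maps each node id in layer k to the ids of the
# layer k-1 nodes it summarizes (the "previous layer's node ids").
Layers = List[Dict[int, List[int]]]


def base_nodes(layers: Layers, level: int, node: int) -> List[int]:
    """Expand a node at `level` down to the input-graph node ids it covers."""
    frontier = [node]
    for k in range(level, 0, -1):
        # Replace every node in layer k by the previous-layer ids it summarizes.
        frontier = [child for n in frontier for child in layers[k][n]]
    return frontier


layers: Layers = [
    {1: [], 2: [], 3: [], 4: []},  # input graph
    {10: [1, 2], 11: [3, 4]},      # first summarization layer
    {20: [10, 11]},                # second summarization layer
]
```

Storing only one level of references per node keeps each node's tags small; full expansion is recovered by walking the layers.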

josiahseaman commented 5 years ago

I have noticed you can do this by preserving the node name, as long as the node name is not restricted to being a dense ID. If node IDs and names are different, you can get creative with naming.

Josiah Seaman Bioinformatics and Genome Visualization Royal Botanic Gardens Kew +44 775805 6670

In reply to Toshiyuki Yokoyama's comment of Tue, Sep 3, 2019, 23:49.

ekg commented 5 years ago

Node names in vg-GFA are restricted to integers, ideally dense ones.

We might need to write a converter though.
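A rough sketch of such a converter, assuming Python and tab-separated GFA 1.0 input: `densify` is a hypothetical helper that renames segments to dense integers `1..n` in the order their `S` lines appear, rewrites the names in `L` and `P` lines, and returns the old-name-to-ID map so the original names are not lost. Containment (`C`) lines and any other record types are copied unchanged, so they would still need handling.

```python
from typing import Dict, List, Tuple


def densify(lines: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Rename GFA 1.0 segments to dense integer ids (1..n, in S-line order)."""
    ids: Dict[str, int] = {}
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "S":
            ids.setdefault(fields[1], len(ids) + 1)

    out: List[str] = []
    for line in lines:
        f = line.split("\t")
        if f[0] == "S":
            f[1] = str(ids[f[1]])
        elif f[0] == "L":
            # L  From  FromOrient  To  ToOrient  Overlap
            f[1], f[3] = str(ids[f[1]]), str(ids[f[3]])
        elif f[0] == "P":
            # P  PathName  SegmentNames (comma-separated, each ending in + or -)
            f[2] = ",".join(str(ids[s[:-1]]) + s[-1] for s in f[2].split(","))
        # All other line types (H, C, ...) are copied unchanged.
        out.append("\t".join(f))
    return out, ids
```

The returned map could be written back as a tag on each segment if the original names are needed later.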

In reply to Josiah Seaman's comment of Wed, Sep 4, 2019, 3:52 PM.

6br commented 5 years ago

Decorated GFA as a Data Exchange Format

Header field:

The header should include the layer number as follows.

| Tag | Type | Example | Description |
|-----|------|---------|-------------|
| LA | i | LA:i:1 | The layer number. |
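A minimal sketch, assuming Python and tab-separated GFA 1.0 lines, of how a reader could pick up the proposed `LA` tag from the header. `layer_number` is a hypothetical helper; a header without `LA` is treated as carrying no layer number.

```python
from typing import Optional


def layer_number(header: str) -> Optional[int]:
    """Return the value of the proposed LA:i tag on a GFA H line, or None."""
    fields = header.rstrip("\n").split("\t")
    if fields[0] != "H":
        raise ValueError("not a header line")
    for tag in fields[1:]:
        # Optional fields have the form TAG:TYPE:VALUE.
        name, typ, value = tag.split(":", 2)
        if name == "LA" and typ == "i":
            return int(value)
    return None
```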

Segment fields:

Segment lines already allow optional fields. I assume that segment lines should carry references to the higher-layer nodes.

| Tag | Type | Example | Description |
|-----|------|---------|-------------|
| LN | i | LN:i:1024 | Segment length (used on summarized layers). |
| HI | i | HI:i:12 | A reference to a single higher-layer node. |
| HI | B | HI:B:i,4,5,6 | References to the higher-layer nodes. |
| LO | B | LO:B:i,1,2,3 | References to the lower-layer nodes. |
| SB | B | SB:B:i,8 | References to the sibling nodes (i.e. nodes replicated in the summarized layer). |
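A minimal sketch, assuming Python and GFA 1.0 tag syntax, of writing a summarized segment with these tags. In GFA 1.0 (as in SAM) a `B` array puts a comma after the subtype letter, e.g. `LO:B:i,1,2,3`. The helpers `int_array_tag` and `summary_segment` are hypothetical; a single higher-layer reference is written with the `HI:i` form and several with `HI:B`, following the table.

```python
from typing import Sequence


def int_array_tag(name: str, values: Sequence[int]) -> str:
    """Format an integer array tag as in GFA 1.0/SAM, e.g. LO:B:i,1,2,3."""
    return f"{name}:B:i," + ",".join(str(v) for v in values)


def summary_segment(
    name: str,
    length: int,
    higher: Sequence[int] = (),
    lower: Sequence[int] = (),
    siblings: Sequence[int] = (),
) -> str:
    """Build an S line with the proposed HI/LO/SB tags; the sequence is "*"."""
    fields = ["S", name, "*", f"LN:i:{length}"]
    if len(higher) == 1:
        fields.append(f"HI:i:{higher[0]}")  # single higher-layer node
    elif higher:
        fields.append(int_array_tag("HI", higher))
    if lower:
        fields.append(int_array_tag("LO", lower))
    if siblings:
        fields.append(int_array_tag("SB", siblings))
    return "\t".join(fields)
```

Empty reference lists are simply omitted rather than written as empty arrays.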

Link fields:

Link lines are the same as in the original GFA.

Path fields:

Path lines need to record the sort order and a reference to the original path (because path names might need to be unique across all layers). The GFA specification does not define optional tags for path lines, but this information should still be stored in the path lines themselves.

| Tag | Type | Example | Description |
|-----|------|---------|-------------|
| ID | i | ID:i:2 | The sort order. |
| RF | Z | RF:Z:chr1 | Reference to the original path name. |
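A minimal sketch, assuming Python and tab-separated GFA 1.0 lines, of writing and reading these path tags. `path_line` and `parse_tags` are hypothetical helpers, and the layer-specific path name `chr1_L2` in the usage is an invented naming scheme for illustration only. A `P` line has four required fields (`P`, name, segment names, overlaps), so the tags start at index 4.

```python
from typing import Dict, List, Union


def path_line(name: str, segments: List[str], sort_order: int, original: str) -> str:
    """Build a P line carrying the proposed ID:i (sort order) and RF:Z tags."""
    return "\t".join(
        ["P", name, ",".join(segments), "*", f"ID:i:{sort_order}", f"RF:Z:{original}"]
    )


def parse_tags(fields: List[str]) -> Dict[str, Union[int, str]]:
    """Parse TAG:TYPE:VALUE fields; type i becomes int, all others stay strings."""
    tags: Dict[str, Union[int, str]] = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


line = path_line("chr1_L2", ["10+", "11+"], 2, "chr1")
tags = parse_tags(line.split("\t")[4:])
```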

josiahseaman commented 5 years ago

Looking at the file format, the GFA containment line might allow us to encode summarization links: summary nodes contain multiple nodes inside them. It's at least worth thinking about:

C Containment line

A containment line represents an overlap between two segments where one (the Contained segment) is contained in the other (the Container segment). The Pos field stores the leftmost position of the contained segment in the container segment in its forward orientation (i.e. before this is oriented according to the ContainerOrient sign).

Example

The following line describes the containment of segment 2 in the reverse complement of segment 1, starting at position 110 of segment 1 (in its forward orientation):

C 1 - 2 + 110 100M
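To judge whether containment lines could carry summarization links, a minimal sketch in Python: `parse_containment` reads the seven required fields of a `C` line, and `contained_by` groups contained segments under their container, which is how a summary node's members could be recovered. Both helpers are hypothetical; splitting on any whitespace is a convenience for the space-separated example above, while real GFA files are tab-separated.

```python
from typing import Dict, List, NamedTuple


class Containment(NamedTuple):
    container: str
    container_orient: str
    contained: str
    contained_orient: str
    pos: int      # leftmost position in the container's forward orientation
    overlap: str  # CIGAR string


def parse_containment(line: str) -> Containment:
    f = line.split()  # any whitespace; GFA itself uses tabs
    if f[0] != "C":
        raise ValueError("not a containment line")
    return Containment(f[1], f[2], f[3], f[4], int(f[5]), f[6])


def contained_by(lines: List[str]) -> Dict[str, List[str]]:
    """Group contained segment names under their container segment."""
    groups: Dict[str, List[str]] = {}
    for line in lines:
        if line.startswith("C"):
            c = parse_containment(line)
            groups.setdefault(c.container, []).append(c.contained)
    return groups
```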

6br commented 5 years ago

Okay, I will rethink this, and I need to check whether vg currently supports containment lines.

josiahseaman commented 5 years ago

I just wanted to mention the option. But if we go with Matrix Visualization, it will be irrelevant anyway.😥

6br commented 5 years ago

Yes, but I wonder whether the concept of summarization layers could be useful for other purposes.

josiahseaman commented 5 years ago

I think we should close this issue until we decide we're actually going to use something that could be called Graph Summarization. So far, ODGI bin seems to solve the same problem more simply.
