gosling-lang / gosling.js

Grammar of Scalable Linked Interactive Nucleotide Graphics
https://gosling.js.org
MIT License

The sequence abstraction is wanted #492

Open zhangzhen opened 3 years ago

zhangzhen commented 3 years ago

Sehi L'Yi:

The sequence abstraction is not supported at the moment. For now, a workaround would be to manually specify intron regions and juxtapose multiple views for those regions, but this will work only if you are interested in visualizing a small number of intron regions.

Could you please show me an example for the workaround?

Zhen

sehilyi commented 3 years ago

Hi @zhangzhen! Welcome to a GitHub issue.

I was referring to something like the following example:

Screen Shot 2021-08-31 at 11 01 24 AM

where the view on the top (named "Original") is a regular view while on the bottom there are two views horizontally juxtaposed with specific regions specified.

{  // two views on the bottom
   "arrangement": "horizontal",
   "spacing": 0,
   "views": [
      {
         "xDomain": {"chromosome": "3", "interval": [52047400, 52056500]}, // first region
         ...,
      },
      {
         "xDomain": {"chromosome": "3", "interval": [52070500, 52155000]}, // second region
         ...,
      }
   ]
}

But, again, this is useful only if you are interested in a small number of fixed regions, since specifying all those xDomains is laborious. Also, zooming and panning will not be useful in this case since each view has its own navigation.
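
For a handful of regions, the juxtaposed views could be generated from a list of intervals instead of being written by hand. Below is a minimal TypeScript sketch, not an official Gosling utility: `buildRegionViews`, `Region`, and `baseTrack` are hypothetical names, and splitting the width in proportion to region length is an assumption (the example in this comment uses fixed widths of 200 and 600).

```typescript
// Hypothetical helper types. Only `xDomain`, `arrangement`, `spacing`,
// `views`, `tracks`, and `width` come from the spec shown in this thread.
interface Region {
  start: number;
  end: number;
}

type Track = Record<string, unknown>;

interface RegionView {
  xDomain: { chromosome: string; interval: [number, number] };
  tracks: Track[];
}

interface JuxtaposedViews {
  arrangement: "horizontal";
  spacing: number;
  views: RegionView[];
}

// Build one view per region, splitting `totalWidth` in proportion to region length.
function buildRegionViews(
  chromosome: string,
  regions: Region[],
  baseTrack: Track,
  totalWidth: number
): JuxtaposedViews {
  const totalLength = regions.reduce((sum, r) => sum + (r.end - r.start), 0);
  return {
    arrangement: "horizontal",
    spacing: 0,
    views: regions.map((r): RegionView => ({
      xDomain: { chromosome, interval: [r.start, r.end] },
      tracks: [
        {
          ...baseTrack,
          // Rounded, so the widths may not add up to exactly `totalWidth`.
          width: Math.round((totalWidth * (r.end - r.start)) / totalLength),
        },
      ],
    })),
  };
}

// The two regions from the example above; `baseTrack` is abbreviated here.
const bottomViews = buildRegionViews(
  "3",
  [
    { start: 52047400, end: 52056500 },
    { start: 52070500, end: 52155000 },
  ],
  { template: "gene", height: 100 },
  800
);
```

Each generated view still fetches its own data and has its own navigation, so this only removes the typing effort, not the limitations described above.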

Raw JSON Spec

```js
{
  "spacing": 30,
  "style": {"enableSmoothPath": true, "outline": "white"},
  "views": [
    {
      "xDomain": {"chromosome": "3", "interval": [52045000, 52155000]},
      "tracks": [
        {
          "title": "Original",
          "template": "gene",
          "data": {
            "url": "https://server.gosling-lang.org/api/v1/tileset_info/?d=gene-annotation",
            "type": "beddb",
            "genomicFields": [
              {"index": 1, "name": "start"},
              {"index": 2, "name": "end"}
            ],
            "valueFields": [
              {"index": 5, "name": "strand", "type": "nominal"},
              {"index": 3, "name": "name", "type": "nominal"}
            ],
            "exonIntervalFields": [
              {"index": 12, "name": "start"},
              {"index": 13, "name": "end"}
            ]
          },
          "encoding": {
            "startPosition": {"field": "start"},
            "endPosition": {"field": "end"},
            "strandColor": {"field": "strand", "range": ["gray"]},
            "strandRow": {"field": "strand"},
            "opacity": {"value": 0.4},
            "geneHeight": {"value": 30},
            "geneLabel": {"field": "name"},
            "geneLabelFontSize": {"value": 30},
            "geneLabelColor": {"field": "strand", "range": ["gray"]},
            "geneLabelStroke": {"value": "white"},
            "geneLabelStrokeThickness": {"value": 4},
            "geneLabelOpacity": {"value": 1},
            "type": {"field": "type"}
          },
          "width": 800,
          "height": 100
        }
      ]
    },
    {
      "arrangement": "horizontal",
      "spacing": 0,
      "views": [
        {
          "xDomain": {"chromosome": "3", "interval": [52047400, 52056500]},
          "tracks": [
            {
              "title": "Two views concatenated",
              "template": "gene",
              "data": {
                "url": "https://server.gosling-lang.org/api/v1/tileset_info/?d=gene-annotation",
                "type": "beddb",
                "genomicFields": [
                  {"index": 1, "name": "start"},
                  {"index": 2, "name": "end"}
                ],
                "valueFields": [
                  {"index": 5, "name": "strand", "type": "nominal"},
                  {"index": 3, "name": "name", "type": "nominal"}
                ],
                "exonIntervalFields": [
                  {"index": 12, "name": "start"},
                  {"index": 13, "name": "end"}
                ]
              },
              "encoding": {
                "startPosition": {"field": "start"},
                "endPosition": {"field": "end"},
                "strandColor": {"field": "strand", "range": ["gray"]},
                "strandRow": {"field": "strand"},
                "opacity": {"value": 0.4},
                "geneHeight": {"value": 30},
                "geneLabel": {"field": "name"},
                "geneLabelFontSize": {"value": 30},
                "geneLabelColor": {"field": "strand", "range": ["gray"]},
                "geneLabelStroke": {"value": "white"},
                "geneLabelStrokeThickness": {"value": 4},
                "geneLabelOpacity": {"value": 1},
                "type": {"field": "type"}
              },
              "width": 200,
              "height": 100
            }
          ]
        },
        {
          "xDomain": {"chromosome": "3", "interval": [52070500, 52155000]},
          "tracks": [
            {
              "template": "gene",
              "data": {
                "url": "https://server.gosling-lang.org/api/v1/tileset_info/?d=gene-annotation",
                "type": "beddb",
                "genomicFields": [
                  {"index": 1, "name": "start"},
                  {"index": 2, "name": "end"}
                ],
                "valueFields": [
                  {"index": 5, "name": "strand", "type": "nominal"},
                  {"index": 3, "name": "name", "type": "nominal"}
                ],
                "exonIntervalFields": [
                  {"index": 12, "name": "start"},
                  {"index": 13, "name": "end"}
                ]
              },
              "encoding": {
                "startPosition": {"field": "start"},
                "endPosition": {"field": "end"},
                "strandColor": {"field": "strand", "range": ["gray"]},
                "strandRow": {"field": "strand"},
                "opacity": {"value": 0.4},
                "geneHeight": {"value": 30},
                "geneLabel": {"field": "name"},
                "geneLabelFontSize": {"value": 30},
                "geneLabelColor": {"field": "strand", "range": ["gray"]},
                "geneLabelStroke": {"value": "white"},
                "geneLabelStrokeThickness": {"value": 4},
                "geneLabelOpacity": {"value": 1},
                "type": {"field": "type"}
              },
              "width": 600,
              "height": 100
            }
          ]
        }
      ]
    }
  ]
}
```

sehilyi commented 3 years ago

We will support the sequence abstraction eventually but would need some time since supporting it will require us to restructure data models.

Could you share (or describe) an actual visualization example that you are thinking of and/or the sample data files? If possible, this will be really helpful for us to make sure to cover the real-world use cases when supporting the sequence abstraction.

zhangzhen commented 3 years ago

@sehilyi Sorry for my late reply. The following figure shows the copy number of exons in the BRIP1 gene for a sample to be tested. The intron regions have been hidden. Panning and zooming are not necessary in this case, so it is fine for the figure to be drawn as a static image.

image

BTW, I don't want to add the following two CSS links because I want to use Tailwind CSS instead. Is that OK?

<head>
  ...
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
  <link rel="stylesheet" href="https://unpkg.com/higlass@1.11.3/dist/hglib.css">
</head>
sehilyi commented 3 years ago

Yes, if you are interested in showing only one gene, I think you can try my suggestion above. But you would need to fine-tune the width and xDomain of the individual exons (views) to make it look like the figure you shared.

Regarding CSS, Gosling currently has a dependency on HiGlass styles. So, you would need to keep the higlass CSS file:

<link rel="stylesheet" href="https://unpkg.com/higlass@1.11.3/dist/hglib.css">

Since this CSS file generally uses unique class names (e.g., HiGlass-module_higlass-scroll-container-overflow-2FS0w), loading it should not conflict with your own CSS definitions.

We also plan to remove the CSS dependency (#433) in the future.

zhangzhen commented 3 years ago

We will support the sequence abstraction eventually but would need some time since supporting it will require us to restructure data models.

Does HiGlass support the sequence abstraction? After trying HiGlass a bit, I find that HiGlass is more configurable and flexible, which fits the needs of my leader better. I'm more likely to switch from Gosling to HiGlass.

sehilyi commented 3 years ago

HiGlass does not support the abstraction either.

Would you be able to share in which respects you found HiGlass more flexible and configurable than Gosling? Gosling is designed to be more expressive in terms of visual encoding by using the grammar of graphics, so we would like to learn from end users like you what the issues are when using Gosling.

zhangzhen commented 3 years ago

Would you be able to share in which part you specifically considered that HiGlass is more flexible and configurable and Gosling is not?

If I understand Gosling correctly, it aims to make genomic data easier to visualize in the Vega-Lite way, which I find brilliant. In addition, the track template is a wonderful thing, as it provides a means to build configurable and reusable tracks.

Let me now turn to HiGlass. If editable is set to true in a HiGlass view config, the view and its track configs can be changed by end users at will. I'm not sure if Gosling can also do that. New track plugins can be developed to meet individual requirements by people like me who have more than 15 years of programming experience.

The sequence abstraction is a "must-have" in my case. When applying the workaround you suggested, we run into two issues as follows:

  1. The same data has to be fetched in each view, because data declared on the same level as "views" can't be passed down to the views underneath. This will definitely cause a performance issue. For example, since BRCA1 has 23 exons, the data needs to be fetched 23 times in series.
    {
      "data": { ... },
      "views": [
        // view 1
        { ... },
        // view 2
        { ... },
        ...
      ]
    }
  2. In some views, the width is smaller than minWidth, which causes an error reported in devtools. The width of each view is proportional to the length of its exon.
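
One possible way around the second issue is to give very short exons a floor width and share the remaining pixels among the other views in proportion to their length. The following is a minimal sketch assuming positive exon lengths; `allocateWidths` and its `minWidth` parameter are hypothetical, and the actual minimum enforced by Gosling is not stated in this thread.

```typescript
// Allocate widths proportional to `lengths`, but never below `minWidth`.
// Views that would fall below the floor are pinned to `minWidth`; the rest
// share what is left. Results may be fractional, so round them if needed.
function allocateWidths(lengths: number[], totalWidth: number, minWidth: number): number[] {
  if (lengths.length * minWidth > totalWidth) {
    throw new Error("totalWidth is too small to give every view minWidth");
  }
  const widths = lengths.map(() => 0);
  let free = lengths.map((_, i) => i); // indices not yet pinned to minWidth
  let remaining = totalWidth;
  for (;;) {
    const freeLength = free.reduce((sum, i) => sum + lengths[i], 0);
    const tooSmall = free.filter((i) => (remaining * lengths[i]) / freeLength < minWidth);
    if (tooSmall.length === 0) {
      // Everything left fits proportionally.
      for (const i of free) widths[i] = (remaining * lengths[i]) / freeLength;
      return widths;
    }
    // Pin the undersized views and redistribute the rest on the next pass.
    for (const i of tooSmall) widths[i] = minWidth;
    remaining -= tooSmall.length * minWidth;
    free = free.filter((i) => tooSmall.indexOf(i) === -1);
  }
}

// Three exons of length 1, 30, and 60 sharing 100 px with a 10 px floor.
const exonWidths = allocateWidths([1, 30, 60], 100, 10); // [10, 30, 60]
```

The trade-off is that view widths are then no longer strictly proportional to exon length, so positions are not comparable across views at a glance.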

Would you give me some hints about another way to implement the sequence abstraction? Restructuring the data models to implement it will take you a long time, and unfortunately I cannot wait that long for a release.

sehilyi commented 3 years ago

If editable is set to true in a HiGlass view config, the view and its track configs can be changed by end users at will.

Ah, I see your point, and it makes a lot of sense. View configs themselves are more expressive in Gosling, but we do not currently support GUI-based configuration as HiGlass does, which can be a hurdle for using Gosling.

The same data has to be fetched in each view, because data declared on the same level as "views" can't be passed down to the views underneath. This will definitely cause a performance issue. For example, since BRCA1 has 23 exons, the data needs to be fetched 23 times in series.

You are right, and this will be another major disadvantage of using the workaround that I suggested.

Would you give me some hints about another way to implement the sequence abstraction?

Not sure if there is a better workaround for the sequence abstraction either in Gosling or HiGlass at the moment, but if you can share more specifics about your use case by answering the questions below, I might be able to come up with an approach.

zhangzhen commented 3 years ago

  • What data formats are you using (e.g., bigwig)?

I'm using csv format.

  • Would you be able to share a sample data file for this? (so that I could test as well)

Sure, but I don't want to share here. I will send the file to you per email. Is it ok for you?

  • Are the sequences already abstracted in the data (i.e., are the genomic positions absolute or relative)?

Genomic positions are absolute on a chromosome, e.g. chr17:41,223,074.

  • How many genes are you interested in visualizing?

There are 108 genes in total that I'm interested in.

  • Do you need to use zoom and panning?

Yes, I do.

I've come up with my own workaround. The main idea is that genomic positions in the raw data are transformed to genomic positions on a synthetic chromosome 1 before Gosling loads them. Each region is represented as a key-value pair, e.g. (start, end) => total_length_before, where start and end denote the start and end genomic positions of the region, and total_length_before is the total length of all regions before it. All regions are stored in an interval tree. Given a genomic position x, find the region t (s1, e1) that it belongs to; the transformed position is total_length_before + x - s1.
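
The transformation described above can be sketched roughly as follows. This is an illustration, not the actual implementation: it swaps the interval tree for a sorted array with binary search, which works only if the regions do not overlap, and it treats regions as half-open intervals [start, end), which is an assumption. The names `buildRegionIndex` and `toConcatenated` are hypothetical.

```typescript
interface Region {
  start: number;
  end: number;
}

interface MappedRegion extends Region {
  offset: number; // total_length_before: summed length of all earlier regions
}

// Sort the regions and precompute total_length_before for each of them.
function buildRegionIndex(regions: Region[]): MappedRegion[] {
  const sorted = regions.slice().sort((a, b) => a.start - b.start);
  let offset = 0;
  return sorted.map((r) => {
    const mapped: MappedRegion = { start: r.start, end: r.end, offset };
    offset += r.end - r.start;
    return mapped;
  });
}

// Map an absolute position x to total_length_before + x - s1, or return
// undefined when x falls into a hidden (e.g., intron) region.
function toConcatenated(index: MappedRegion[], x: number): number | undefined {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const r = index[mid];
    if (x < r.start) hi = mid - 1;
    else if (x >= r.end) lo = mid + 1;
    else return r.offset + (x - r.start);
  }
  return undefined;
}

// The two regions used earlier in this thread, given out of order on purpose.
const regionIndex = buildRegionIndex([
  { start: 52070500, end: 52155000 },
  { start: 52047400, end: 52056500 },
]);
const firstBaseOfSecondRegion = toConcatenated(regionIndex, 52070500); // 9100
```

Because the transformed coordinates are contiguous, a single view with ordinary zooming and panning can show all regions, at the cost of losing the original genomic axis labels.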

BTW, what I really want is an overview + detail view, as illustrated in the following two figures. Figure 1 shows the overview and Figure 2 the detail view of a gene, covering exons only. The overview is static, while the detail view is not. When the mouse hovers over points in the overview, all points of a gene are highlighted. A brush is used to link them. I'm not sure if Gosling supports this kind of interaction? Besides, can the light-grey shadow around the horizontal red line in Figure 1 be drawn using the mark "area"?

Figure 1: image Figure 2: image

sehilyi commented 3 years ago

Sure, but I don't want to share here. I will send the file to you per email. Is it ok for you?

Yes, email works for me :)

I've come up with my own workaround. The main idea is that genomic positions in the raw data are transformed to genomic positions on a synthetic chromosome 1 before Gosling loads them. Each region is represented as a key-value pair, e.g. (start, end) => total_length_before, where start and end denote the start and end genomic positions of the region, and total_length_before is the total length of all regions before it. All regions are stored in an interval tree. Given a genomic position x, find the region t (s1, e1) that it belongs to; the transformed position is total_length_before + x - s1.

That sounds like a good alternative! I wonder how you managed to show the x-axis labels in this approach. Perhaps you could put numbers like those in the figure (1, 2, 3, ..., n) using a text mark?

When the mouse hovers over points in the overview, all points of a gene are highlighted. A brush is used to link them. I'm not sure if Gosling supports this kind of interaction?

Not officially supported yet, but you could use a recently added experimental property (i.e., experimental: { reactive: true}) and mouseover API.

<GoslingComponent 
   ... 
   experimental={{ reactive: true }} // updates only the encodings that changed in the spec (e.g., color of points) instead of re-rendering the entire view
/>

This works only if you set an id to a track (i.e., views: [{ tracks: [{ id: 'track-to-cache', ... }]}]).

https://github.com/gosling-lang/gosling.js/blob/6501cd81e1321444113409c2bf8b517b4b959967/src/core/gosling.schema.ts#L88

I can come up with a complete example for this later.

Then, use a mouseover listener in the Gosling API to capture the mouse event and change spec accordingly to update colors:

 gosRef.current.api.subscribe('mouseover', (type: string, e: CommonEventData) => {
      const gene = e.data['gene']; // get information from the visual element
      ... // logic to change specs
 });
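
As a rough idea of what the "logic to change specs" step could look like, the sketch below builds an updated copy of the spec in which one track's color encoding highlights the hovered gene. Everything here is an assumption for illustration: `geneColor` and `withTrackColor` are hypothetical helpers, the data is assumed to have a nominal field named `gene`, and the `color` channel is assumed to accept a nominal `domain` and `range`.

```typescript
// Hypothetical, minimal spec types; a real Gosling spec has many more properties.
interface Track {
  id?: string;
  [key: string]: unknown;
}

interface View {
  tracks?: Track[];
  views?: View[];
  [key: string]: unknown;
}

// Color channel that paints the hovered gene red and every other gene light gray.
function geneColor(field: string, genes: string[], hovered: string): Record<string, unknown> {
  return {
    field,
    type: "nominal",
    domain: genes,
    range: genes.map((g) => (g === hovered ? "red" : "lightgray")),
  };
}

// Return a copy of the spec in which the track with `trackId` uses `color`.
// Nested views are handled recursively; the input object is not modified.
function withTrackColor(view: View, trackId: string, color: Record<string, unknown>): View {
  const next: View = { ...view };
  if (view.views) {
    next.views = view.views.map((v) => withTrackColor(v, trackId, color));
  }
  if (view.tracks) {
    next.tracks = view.tracks.map((t) => (t.id === trackId ? { ...t, color } : t));
  }
  return next;
}

// A toy spec with two tracks; only the track with id "overview" is recolored.
const baseSpec: View = {
  views: [{ tracks: [{ id: "overview", mark: "point" }, { id: "other" }] }],
};
const highlighted = withTrackColor(
  baseSpec,
  "overview",
  geneColor("gene", ["BRCA1", "BRIP1"], "BRIP1")
);
```

Inside the mouseover callback, a call such as `setSpec(withTrackColor(spec, 'track-to-cache', geneColor('gene', genes, gene)))` would then hand the updated spec back to `GoslingComponent`, assuming the spec is kept in React state; `setSpec` and `genes` are hypothetical names.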

Besides, can the light-grey shadow around the horizontal red line in Figure 1 be drawn using the mark "area"?

This is not supported at the moment but could be implemented relatively quickly. In your case, do you have two data columns (i.e., start and end positions along the y-axis) or just one column (i.e., the vertical length of the area)?

zhangzhen commented 3 years ago

That sounds like a good alternative! I wonder how you managed to show the x-axis labels in this approach.

I disabled the x-axis labels by just setting the x-axis to none.

Probably, you can put numbers like in the figure (1, 2, 3, ..., n) using text mark?

I used a text mark to achieve this. Please see the following figure:

image

BTW, the y-axis domain was set to 0 to 5.5. The 5.5 label at the top-left corner appears cut off. How can this be solved?

In your case, do you have two data columns (i.e., start and end positions along the y-axis) or just one column (i.e., the vertical length of the area)?

I have two columns, namely mean and sd, and calculate the upper and lower values using the formula mean +/- 3*sd.

Not officially supported yet, but you could use a recently added experimental property (i.e., experimental: { reactive: true}) and mouseover API.

Undoubtedly, the mouseover API is useful for my case.

sehilyi commented 3 years ago

BTW, the y-axis domain was set to 0 to 5.5. The 5.5 label at the top-left corner appears cut off. How can this be solved?

I think I will need to make sure that the top-most label of the y-axis is not occluded by offsetting its position. As a workaround, you could set a higher number than 5.5 (e.g., domain: [0, 6]).

sehilyi commented 2 years ago

I unintentionally closed this from #518.

@zhangzhen, you can encode both y and ye with bar marks to create the graph on the background of your Figure 1. This will be available in the next release. Let me know if this works in your case. Please refer to #518 for the example.
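
Since the data has mean and sd columns rather than explicit bounds, the two values for y and ye would need to be derived before loading. A minimal sketch, assuming each row carries a `position`, `mean`, and `sd` (the field names and the `toBands` helper are hypothetical), using the mean +/- 3*sd formula mentioned earlier:

```typescript
interface StatRow {
  position: number;
  mean: number;
  sd: number;
}

interface BandRow {
  position: number;
  lower: number; // candidate field for y
  upper: number; // candidate field for ye
}

// Derive the lower and upper bounds of the band as mean -/+ k * sd.
function toBands(rows: StatRow[], k: number = 3): BandRow[] {
  return rows.map((r) => ({
    position: r.position,
    lower: r.mean - k * r.sd,
    upper: r.mean + k * r.sd,
  }));
}

// One row with mean 2 and sd 0.25 gives a band from 1.25 to 2.75.
const bands = toBands([{ position: 100, mean: 2, sd: 0.25 }]);
```

The resulting `lower` and `upper` columns could then be written back to the CSV and referenced by the y and ye channels.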

zhangzhen commented 2 years ago

Not officially supported yet, but you could use a recently added experimental property (i.e., experimental: { reactive: true}) and mouseover API. I can come up with a complete example for this later.

A complete example would be very helpful; otherwise I don't know how to change specs in the mouseover handler.