gosling-lang / gosling.js

Grammar of Scalable Linked Interactive Nucleotide Graphics
https://gosling.js.org
MIT License

the aggregate property of x and xe channels worked unexpectedly #561

Open zhangzhen opened 3 years ago

zhangzhen commented 3 years ago

I used the following spec to draw a chromosome band plot for the whole genome. Every chromosome rendered as a single band except chr19, which was drawn as two separate parts.

{
  "arrangement": "vertical",
  "title": "Large Rearrangment Auditing",
  "assembly": "unknown",
  "spacing": 50,
  "xDomain": { "interval": [0, 20000] },
  "views": [
    {
      "alignment": "overlay",
      "data": {
        "url": "https://dataviz.brbiotech.com/RS20210907013FFP.panelData_.tsv",
        "type": "csv",
        "separator": "\t",
        "sampleLength": 20000,
        "genomicFields": ["index", "index2"],
        "quantitativeFields": ["CN"]
      },
      "tracks": [
        {
          "mark": "rect",
          "x": { "field": "index", "aggregate": "min", "type": "genomic" },
          "xe": { "field": "index2", "aggregate": "max", "type": "genomic" },
          "stroke": { "value": "white" },
          "strokeWidth": { "value": 2 },
          "color": {
            "field": "chr",
            "type": "nominal",
            "domain": [
              "1",
              "2",
              "3",
              "4",
              "5",
              "6",
              "7",
              "8",
              "9",
              "10",
              "11",
              "12",
              "13",
              "14",
              "15",
              "16",
              "17",
              "18",
              "19",
              "20",
              "21",
              "22"
            ],
            "range": ["#0072B2"]
          } 
        },
        {"mark": "brush", "x": { "linkingId": "detail" }}
      ],
      "width": 1000,
      "height": 30
    }
  ],
  ....
}

The red box in the screenshot below marks the incorrectly drawn chr19.

(image)

sehilyi commented 2 years ago

Hi @zhangzhen,

Would you be able to share the data you used (https://dataviz.brbiotech.com/RS20210907013FFP.panelData_.tsv) so that I can take a closer look at this issue?

In my example with ideograms, chr19 is displayed correctly with the aggregate property, so I wonder if this issue is related to the data you used.

Please refer to my example:

{
  "tracks": [
    {
      "data": {
        "url": "https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv",
        "type": "csv",
        "chromosomeField": "Chromosome",
        "genomicFields": ["chromStart", "chromEnd"]
      },
      "mark": "rect",
      "color": {
        "field": "Chromosome",
        "type": "nominal",
        "domain": [
          "chr1",
          "chr2",
          "chr3",
          "chr4",
          "chr5",
          "chr6",
          "chr7",
          "chr8",
          "chr9",
          "chr10",
          "chr11",
          "chr12",
          "chr13",
          "chr14",
          "chr15",
          "chr16",
          "chr17",
          "chr18",
          "chr19",
          "chr20",
          "chr21",
          "chr22",
          "chrX",
          "chrY"
        ],
        "range": ["#F6F6F6", "gray"]
      },
      "x": {"field": "chromStart", "type": "genomic", "aggregate": "min"},
      "xe": {"field": "chromEnd", "aggregate": "max", "type": "genomic"},
      "strokeWidth": {"value": 2},
      "stroke": {"value": "gray"},
      "style": {"outline": "white"},
      "width": 800,
      "height": 25
    }
  ]
}

Screenshot: Screen Shot 2021-10-25 at 10 29 39 AM

zhangzhen commented 2 years ago

> Would you be able to share the data you used (https://dataviz.brbiotech.com/RS20210907013FFP.panelData_.tsv) so that I can take a closer look at this issue?

I will send a data file containing only chr19 to your Harvard email address.

sehilyi commented 2 years ago

@zhangzhen, thanks for sharing your file. Using your file and my example above, I found that chromosomes are sometimes split into separate parts, depending on the zoom level, when a stroke is used on the rect mark.

Screen Shot 2021-11-15 at 8 06 18 PM

I think this is due to the tiling approach, i.e., chromosome 19 spans two tiles, so the min and max values are calculated separately for each tile, producing two rectangles instead of one.
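To illustrate the suspected cause, here is a toy model in Python. It is not Gosling's actual implementation: the function name `aggregate_per_tile`, the rule that a row belongs to the tile containing its start position, and the sample coordinates are all assumptions made for illustration. It only shows how computing min/max once per tile can yield two rectangles for a chromosome that straddles a tile boundary.

```python
def aggregate_per_tile(rows, tile_size):
    """Group rows by (tile index, chromosome) and keep min start / max end per group.

    Hypothetical model: a row is assigned to the tile that contains its start.
    """
    groups = {}
    for chrom, start, end in rows:
        key = (start // tile_size, chrom)
        lo, hi = groups.get(key, (start, end))
        groups[key] = (min(lo, start), max(hi, end))
    # One rectangle per (tile, chromosome) group, ordered by tile index.
    return [(chrom, lo, hi) for (_tile, chrom), (lo, hi) in sorted(groups.items())]


# Made-up rows for chromosome "19" that straddle a tile boundary at position 1000.
rows = [("19", 900, 950), ("19", 960, 990), ("19", 1010, 1050), ("19", 1060, 1100)]

per_tile = aggregate_per_tile(rows, tile_size=1000)   # aggregated once per tile
whole = aggregate_per_tile(rows, tile_size=10**9)     # effectively a single tile

print(per_tile)  # two rectangles for the same chromosome
print(whole)     # one rectangle
```

In this model the two per-tile rectangles each get their own stroke, which would make the gap between them visible, consistent with the observation that the split depends on the zoom level.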

Removing the strokes makes the chromosomes no longer appear separated, but this is only a workaround. Given that we use tiles, perhaps we should not recommend the min and max aggregation functions for genomic fields.

sehilyi commented 2 years ago

@zhangzhen, in your case, would it be better (in terms of rendering performance) to create a tiny file for this kind of track, i.e., a file that contains only the start and end positions of each chromosome?
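As a rough sketch of how such a file could be precomputed outside Gosling, the Python snippet below reduces a tab-separated table to one row per chromosome. The column names `chr`, `index`, `index2`, and `CN` are taken from the spec above and may not match the real file layout; the sample rows, the function names, and the output column names `start` and `end` are made up for illustration.

```python
import csv
import io


def chromosome_extents(rows):
    """Return {chromosome: (min start, max end)} for an iterable of dict rows."""
    extents = {}
    for row in rows:
        chrom = row["chr"]
        start, end = int(row["index"]), int(row["index2"])
        lo, hi = extents.get(chrom, (start, end))
        extents[chrom] = (min(lo, start), max(hi, end))
    return extents


def write_extents(extents, out):
    """Write one CSV row per chromosome to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["chr", "start", "end"])
    for chrom, (lo, hi) in extents.items():
        writer.writerow([chrom, lo, hi])


# Made-up sample standing in for the real TSV file.
sample = "chr\tindex\tindex2\tCN\n18\t10\t20\t2\n19\t30\t40\t2\n19\t45\t60\t3\n"
rows = csv.DictReader(io.StringIO(sample), delimiter="\t")

extents = chromosome_extents(rows)
buffer = io.StringIO()
write_extents(extents, buffer)
print(buffer.getvalue())
```

A track reading the resulting file would then need no `aggregate` on `x` and `xe`, since each chromosome is already a single row.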