jamiewaese / ePlant

ePlant is a data visualization tool for integrating and exploring multiple levels of biological data.
MIT License

Heat map gene density on chromosomes #12

Closed jamiewaese closed 11 years ago

jamiewaese commented 11 years ago

As discussed.

yuzhenmi commented 11 years ago

I've come up with a new algorithm for calculating the gene densities at pixel resolution on each frame. It has lower time complexity (O(n), down from O(n^2)), and my computer is able to run it smoothly, though I can hear my CPU fans working hard.

If you're interested, I am using the following procedure to calculate the densities:

  1. Determine the number of pixels the chromosome occupies vertically on the screen, and create an array of bins of that length, initialized to zeros. For example, if the chromosome occupies 1000 pixels, create an array of size 1000 filled with zeros.
  2. Calculate the number of base pairs represented by a single pixel (as a floating-point number, for precision) and use this as the width of each bin (by bin, I mean the definition from statistics).
  3. For each gene, determine the range of bin indices that the gene covers. For example, a gene has a start position of 9000 and an end position of 11000. The number of base pairs per pixel is 10,000. The range of indices is from Int(9000 / 10000) = 0 to Int(11000 / 10000) = 1.
  4. Still for each gene, add one to every bin such that start index <= bin index <= end index. In the example above, one would be added to bins[0] and bins[1].
  5. Each bin now contains the number of genes that overlap its range. Normalize the bin values against some threshold (I am using a simple relative threshold for now) and draw lines along the vertical axis of the chromosome, with the grey level of each line determined by its bin value.
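
For reference, the five steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual ePlant code: the function names `gene_density_bins` and `to_grey_levels`, the `(start, end)` tuple layout for genes, and the choice to normalize against the busiest bin (with darker meaning denser) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def gene_density_bins(
    genes: Sequence[Tuple[int, int]],
    chromosome_length_bp: int,
    pixel_height: int,
) -> List[int]:
    """Count how many genes overlap each one-pixel-tall bin (steps 1 to 4)."""
    # Step 1: one bin per vertical pixel, all starting at zero.
    bins = [0] * pixel_height
    # Step 2: bin width in base pairs, kept as a float for precision.
    bp_per_pixel = chromosome_length_bp / pixel_height
    for start, end in genes:
        # Step 3: range of bin indices covered by this gene.
        first = int(start / bp_per_pixel)
        last = int(end / bp_per_pixel)
        # Clamp so a gene ending exactly at the chromosome end stays in range.
        first = max(first, 0)
        last = min(last, pixel_height - 1)
        # Step 4: add one to every bin the gene covers.
        for i in range(first, last + 1):
            bins[i] += 1
    return bins


def to_grey_levels(bins: Sequence[int]) -> List[int]:
    """Map bin counts to 0-255 grey levels (step 5).

    Assumption: normalize relative to the largest bin, and draw denser
    bins darker (0 = black, 255 = white).
    """
    peak = max(bins, default=0)
    if peak == 0:
        return [255] * len(bins)
    return [round(255 * (1 - count / peak)) for count in bins]


# The example from step 3: 10,000 bp per pixel, one gene from 9000 to 11000.
bins = gene_density_bins([(9000, 11000)], 10_000_000, 1000)
print(bins[:3])  # [1, 1, 0]
```

Each gene only touches the handful of bins it spans, so the work per frame grows roughly with the number of genes rather than with genes times pixels, which is where the speed-up comes from.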

I am drawing the heat map as an overlay on top of the chromosome. Let me know how it performs!