merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Improving variability display on the charts page #298

Closed (meren closed 8 years ago)

meren commented 8 years ago

Dear @paczian,

Do you remember the charts page, where you masterfully implemented a D3 trick to give us Bootstrap pop-up windows that show gene annotations? I need your help with one more thing in that part of the code :)

The file of interest is this one:

anvio/data/interactive/js/charts.js

Previously, each sample had a single array of variability scores, so there were as many variability arrays as there were layers. We created chart objects with that information like this:

    charts.push(new Chart({
        name: layers[layer_index],
        coverage: coverage[layer_index],
        variability: variability[layer_index],
        competing_nucleotides: competing_nucleotides[layer_index],
        id: j++,
        width: width,
        height: chartHeight,
        maxVariability: maxVariability,
        svg: svg,
        margin: margin,
        showBottomAxis: (j == visible_layers - 1),
        color: state['layers'][layers[layer_index]]['color']
    }));

And we drew the variability information like this (with everything being black):

    this.lineContainer.append("path")
        .data([this.variability])
        .attr("class", "line")
        .style("stroke", '#000000')
        .style("stroke-width", "1")
        .attr("d", this.line);

Now each layer has four variability arrays instead of one, and we would like to display the values in each array with a different color. So for every layer_index, we now need to pass four items at a time from the variability array into the Chart object. Something like this, if you can excuse the lack of elegance in this example:

    charts.push(new Chart({
        name: layers[layer_index],
        coverage: coverage[layer_index],
        variability_0: variability[layer_index * 4],
        variability_1: variability[layer_index * 4 + 1],
        variability_2: variability[layer_index * 4 + 2],
        variability_3: variability[layer_index * 4 + 3],
        competing_nucleotides: competing_nucleotides[layer_index],
        id: j++,
        width: width,
        height: chartHeight,
        maxVariability: maxVariability,
        svg: svg,
        margin: margin,
        showBottomAxis: (j == visible_layers - 1),
        color: state['layers'][layers[layer_index]]['color']
    }));
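
One way to avoid the four variability_* keys might be to slice the group out of the flat list before building each chart. This is only a minimal sketch: it assumes variability is a flat array holding four consecutive arrays per layer (as the example above implies), and groupVariability and ARRAYS_PER_LAYER are hypothetical names that do not exist in charts.js.

```javascript
// Hypothetical helper, not part of charts.js.
// Assumption: `variability` is flat and holds four consecutive arrays per layer.
const ARRAYS_PER_LAYER = 4;

function groupVariability(variability, layerIndex) {
    const start = layerIndex * ARRAYS_PER_LAYER;
    // slice() copies the four arrays that belong to this layer
    return variability.slice(start, start + ARRAYS_PER_LAYER);
}

// Made-up data: two layers, three positions each.
const flat = [
    [0, 1, 0], [0, 0, 0], [0, 0, 2], [0, 0, 0],  // layer 0
    [3, 0, 0], [0, 0, 0], [0, 0, 0], [0, 4, 0],  // layer 1
];
const layer1 = groupVariability(flat, 1);
```

With something like this, the Chart could receive a single key such as variability: groupVariability(variability, layer_index) and loop over the four arrays internally.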

Each of those variability_* arrays should then be drawn in a different color (for now the colors can be black, red, blue, and green for the four arrays).
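
The color assignment could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: variabilitySeries and VARIABILITY_COLORS are hypothetical names, and the hex values are just one way to spell black, red, blue, and green.

```javascript
// Hypothetical helper: pairs each variability array with a stroke color.
// black, red, blue, green
const VARIABILITY_COLORS = ["#000000", "#FF0000", "#0000FF", "#008000"];

function variabilitySeries(arrays, colors = VARIABILITY_COLORS) {
    return arrays.map((values, i) => ({
        values: values,
        color: colors[i % colors.length],  // wrap around if there are extra arrays
    }));
}

const series = variabilitySeries([[0, 1], [0, 0], [2, 0], [0, 0]]);

// Each entry could then be drawn with the same pattern as before, e.g.:
//   this.lineContainer.append("path")
//       .data([entry.values])
//       .attr("class", "line")
//       .style("stroke", entry.color)
//       .style("stroke-width", "1")
//       .attr("d", this.line);
```

Keeping the colors in one array means a later change (or a user-configurable palette) would only touch one place.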

To have the best test environment you can follow these steps:

anvi-interactive -p sandbox/test-output/204-MERGED/PROFILE.db -c sandbox/test-output/CONTIGS.db

image

You will see some bars, but they are all wrong at this point :)


So this is the most important thing for now, but there is one more thing that may be even trickier (at least it was to me; maybe you will say "what were you thinking, Meren!!!" and immediately want to fix it).

We are doing something very embarrassing in the code to create those variability arrays. Say we have 100 positions, and only the third has a "variability" of 1. To draw those bars (which are not actually bars, but come from area objects), we create an array of 100 zeros and change the third item to 1 :( So for one bit of information, we send a tremendous amount of data. I know! In fact, here is a FIXME in anvio/bottleroutes.py about this:

        # FIXME: we get a nice dict back, but here we convert it into a shitty list... this is one of the
        #        most inefficient piece of code in the entire platform, and it is very embarrassing to
        #        have it this way here, but fortunately no one really reads the code :/
        vd = split_variability_info_dict[layer]['variability']
        for pos_in_codon in range(0, 4):
            l = copy.deepcopy(zeros_for_all_positions)
            progress.append(' %d ..' % pos_in_codon)
            for pos in vd[pos_in_codon]:
                l[pos] = vd[pos_in_codon][pos]
            data['variability'].append(l)

I am wondering whether we can simply send that dictionary instead of building embarrassing arrays, and find a way to display that information using vertical lines at the appropriate positions in the charts page :(
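
On the JavaScript side, the idea might look something like the sketch below. It assumes the server sends each variability array as a sparse object mapping position to value, and that xScale and yScale are functions mapping data to pixels (D3 scales can be called this way). sparseToSegments is a hypothetical name, and the toy scales are made up for the example.

```javascript
// Hypothetical helper: turns a sparse {position: value} object into
// vertical line segments, so no zero-filled array is needed.
function sparseToSegments(sparse, xScale, yScale) {
    return Object.keys(sparse).map((key) => {
        const pos = Number(key);          // JSON object keys arrive as strings
        return {
            x: xScale(pos),
            y1: yScale(0),                // baseline
            y2: yScale(sparse[key]),      // top of the vertical line
        };
    });
}

// Example: only position 2 (the third position) has a variability of 1.
const xScale = (pos) => pos * 5;               // toy scale: 5 px per position
const yScale = (value) => 100 - value * 100;   // toy scale: SVG y grows downward
const segments = sparseToSegments({ "2": 1 }, xScale, yScale);
```

Each segment could then be drawn as an SVG line element with x1 and x2 both set to x, which keeps the payload proportional to the number of variable positions rather than the length of the split.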

Thank you very much in advance for looking into these things.

paczian commented 8 years ago

Resolved by #312

meren commented 8 years ago

You are the best, @paczian, thank you for this. I can't believe that mess is not there anymore :)

There is a little bug, though. It shows things nicely in the main view:

image

The zoom doesn't seem to work properly:

image

(the black line is where the green line should have been).

Thank you very much again!

meren commented 8 years ago

I added a new, more complex test set, which makes it much easier to see the results:

image

You can run ./run_sf_test.sh for this.

Best wishes,

paczian commented 8 years ago

fixed by #316