graph-genome / Schematize

Visualization component of Pangenome Schematics for 1,000s of individuals and gigabase genomes.
http://graphgenome.org
Apache License 2.0
10 stars 8 forks source link

Profiling Performance #92

Open josiahseaman opened 4 years ago

josiahseaman commented 4 years ago

Attach file of profiling.

dimatr commented 4 years ago

Done with Google Chrome DevTools

  1. Profile-changeWidth-20200527T165259.json.zip Here on a binWidth combobox change one has to wait about 50 sec. The profile shows App.js:updateSchematicMetadata() takes about 40 sec, particularly the setState() part. Josiah has mentioned this method can be called only once on the very start. This will definitely help for further experience, but still requires the first 40 sec to sit and wait .. The call boils down to unmountHostComponents and Node._remove - multiple unnecessary add/remove components?

  2. Profile-mouseOverLinks-20200527T164801.json.zip One has to wait 2-3 sec when doing mouseover a link arrow. The profiler shows these "waves", in each about 2 sec are spent in Garbage Collector == lots of tiny temp objects are created and discarded.

dimatr commented 4 years ago

To reproduce the first problem please start the app, wait until the first screen is rendered, then change the binWidth combobox to e.g. 256. I am on the deploy branch == the json data is taken from remote.

dimatr commented 4 years ago

a short update on Schematize profiling.

The 40 sec waiting time is caused by the huge number of React items, particularly component.matrix. They are rendered row by row, and this is exactly the place where Virtualized should be used. If a row is not visible, no need to render it.

A small improvement in the links mouseover performance: now 1.5 sec instead of 3 sec. This method will also be greatly improved once Virtualized is implemented - it uses the same renderMatrixRow() calls

App.js: 330
  // Now it is wrapped in the updateHighlightedNode() function
  _updateHighlightedNode = (linkRect) => {
    this.setState({ highlightedLink: linkRect });
    // this.recalcXLayout(); // dima: this one is not needed, for it is called in the setState itself
  };

With and without this line the behavior is the same.

josiahseaman commented 4 years ago

Thinking about it, I also realized that we have a choice surrounding chunksProcessed versus chunkURLs. An easy fix for updateSchematicMetadata is ensuring that arrayEquals(chunksProcessed , chunkURLs) and simply don't run if false. However, this means the rendering might take longer based on file lag. Ideally, one chunk at a time could be rendered and cached so that the calculations are not revisited when they'd be the same outcome. Unfortunately, reserveAirSpace() is an intrinsically "full render" algorithm. I already tried to precompute it and the solution is not obvious. Thankfully that doesn't increase in complexity with increasing individuals, it's the MatrixCell that really starts to take over.

Also, I'd take a look at the array reference App.state.components = App.schematic.components. I don't think it's copying the array of elements, but if it did that would be horrible for performance! I think removing components from App.state entirely is a good idea. It's not necessary anymore since we have mobx state tracking. React is designed for partial render updates based on state and props. We may be accidentally causing a full re-render. I can imagine further complexities, but I'll stop rambling for now.

josiahseaman commented 4 years ago

It's become clear that rendering is a big performance problem for larger datasets due to the number of elements. The first optimization I'd like to work on is reducing the number of MatrixCells that get generated. For SAR-CoV-2, much of them have 100% occupancy, so a good solution is to use a single SpanCell to span across multiple MatrixCells in the same row. This has the nice benefit of not really interacting with anything else, like compressedRowMapping #94, or Y position or anything like that.

I'm targetting:

    return row.map((cell, x) => {
      if (cell.length) {
        return (
          <>
            <MatrixCell
                key={"occupant" + uncompressed_y + x}
                item={cell}
                store={this.props.store}
                pathName={this.props.pathNames[uncompressed_y]}
                x={xBase + pixelsX[x]}
                y={
                this_y * this.props.store.pixelsPerRow +
                this.props.store.topOffset
              }
                row_number={uncompressed_y}
                width={width}
                height={this.props.store.pixelsPerRow}
            />
          </>
        );
josiahseaman commented 4 years ago

SpanCells

The basic position holds true, but we need to aggregate data for coloring and mouseover. Thomas also pointed out this step is better done in CS, but it would mean yet another data format with sparse data and "Spans" of cells.

Timer results loading SARS-CoV-b2

bin 1: 14 sec. vs 1 sec. bin 4: 36 sec. vs 1 sec. bin 16: 28 sec. vs 1 sec.

This is an acceptable level of performance if I can just sort out one last X position bug when scrolling.