Here is the timeline of what happened while we implemented query intersections:
Our first implementation used plain JS objects. The intersection appeared to be correct, but it was slow. We suspected this was because JavaScript Sets compare objects by reference, so a Set of plain objects offers no O(1) way to check whether it contains an object with the same contents.
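As a minimal illustration of the Set behavior described above: JavaScript Sets use SameValueZero equality, so two plain objects with identical contents count as different members. The sketch also shows one common workaround, keying the Set on a primitive value. That workaround is not necessarily what was done here. It assumes UniqueGeneName uniquely identifies an annotation, and the helper intersectByGeneName is hypothetical.

```javascript
// Hypothetical annotations; only the field names UniqueGeneName and
// AlternativeGeneNames come from the text.
const a = { UniqueGeneName: "geneA", AlternativeGeneNames: ["aliasA"] };
const aCopy = { UniqueGeneName: "geneA", AlternativeGeneNames: ["aliasA"] };

// Set membership for objects is by reference identity, not by contents.
const byReference = new Set([a]);
console.log(byReference.has(a));     // → true  (same object)
console.log(byReference.has(aCopy)); // → false (equal contents, different object)

// Workaround sketch: key the Set on a primitive so Set.has is an
// average-case O(1) lookup by value.
// Assumption: UniqueGeneName is unique per annotation.
function intersectByGeneName(left, right) {
  const rightKeys = new Set(right.map((ann) => ann.UniqueGeneName));
  return left.filter((ann) => rightKeys.has(ann.UniqueGeneName));
}

const left = [a, { UniqueGeneName: "geneB", AlternativeGeneNames: [] }];
const right = [aCopy, { UniqueGeneName: "geneC", AlternativeGeneNames: [] }];
console.log(intersectByGeneName(left, right).map((ann) => ann.UniqueGeneName)); // → [ 'geneA' ]
```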
To get around the Set problem, we tried Immutable.js, whose Sets compare Immutable collections (such as Maps and Records) by value while keeping lookups effectively constant-time. After switching to Immutable.js, the intersection query improved algorithmically, as we had hoped, but the rest of the application suffered a large constant-factor slowdown because Immutable.js is so heavyweight.
At this point, we did proper performance profiling. We found that reading fields from Immutable.js data structures took much more time than we had anticipated, which explained the overall slowdown. We rolled back to the version that performed intersection on regular JS objects and found a few hotspots slowing down the intersection query. Notably, we were repeatedly concatenating UniqueGeneName and AlternativeGeneNames inside a loop (e.g. const names = [annotation.UniqueGeneName, ...annotation.AlternativeGeneNames]), which allocated a new array on every iteration of a tight loop. We moved this work out of the query step and into the ingestion step, and saw a dramatic improvement in performance. This, along with some other improvements, brought performance into an acceptable range.
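A minimal sketch of moving the name concatenation from the query step to the ingestion step. It assumes the query checks whether any of an annotation's names appears in a Set of wanted gene names. The functions matchesSlow, ingest and matchesFast and the allNames field are hypothetical. The "other improvements" mentioned above are not shown.

```javascript
// Hypothetical raw annotations; only UniqueGeneName and AlternativeGeneNames
// come from the text.
const rawAnnotations = [
  { UniqueGeneName: "geneA", AlternativeGeneNames: ["aliasA1", "aliasA2"] },
  { UniqueGeneName: "geneB", AlternativeGeneNames: [] },
  { UniqueGeneName: "geneC", AlternativeGeneNames: ["aliasC"] },
];

// Slow pattern: builds a fresh names array for every annotation on every query.
function matchesSlow(annotations, wanted) {
  return annotations.filter((annotation) => {
    const names = [annotation.UniqueGeneName, ...annotation.AlternativeGeneNames];
    return names.some((name) => wanted.has(name));
  });
}

// Ingestion step: build the combined names array once per annotation.
// "allNames" is a hypothetical field name.
function ingest(annotations) {
  return annotations.map((annotation) => ({
    ...annotation,
    allNames: [annotation.UniqueGeneName, ...annotation.AlternativeGeneNames],
  }));
}

// Query step: reuses the precomputed arrays, so it no longer allocates
// a names array per annotation.
function matchesFast(ingested, wanted) {
  return ingested.filter((annotation) =>
    annotation.allNames.some((name) => wanted.has(name))
  );
}

const ingested = ingest(rawAnnotations);
const wanted = new Set(["aliasA2", "geneC"]);
console.log(matchesFast(ingested, wanted).map((ann) => ann.UniqueGeneName)); // → [ 'geneA', 'geneC' ]
```

Both versions return the same matches; the difference is that the allocation cost is paid once at ingestion rather than on every query.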