Response-level identity map (currently only in Responses::HTTP)
Given: (:Person)-[:WROTE]->(:Article) (with 1:N relationships)
When I run the following query:
Person
.as(:person)
.where(id: author_id)
.articles(:article)
.pluck(:person, :article)
I get a different instance of Person for every Article in the result set. It's reasonable to assume we can return the same instance for all references to the same node in the same result. This PR is a proof of concept for that in the hopes that I didn't overlook any context behind yielding new instances each time. :-) Additional benefits include saving a bit of RAM and probably some CPU time since we can skip the conversion from hash to ActiveNode/ActiveRel on subsequent occurrences of the same node in the result.
I didn't test whether nodes/rels in paths would use the cached instances, but if not, maybe that's a relatively simple change? I mainly wanted to get thoughts on this before I waded in too deep. :-)
Identity maps can be gross because cache invalidation is hard, but the scope of caching here is pretty small. It only sits at the response level, so it'll be thrown away with the response and subsequent queries will return fresh instances. Per-query identity seems like a reasonable compromise between performance and flipping tables over frustration caused by cache-invalidation bugs.
Coverage increased (+0.04%) to 73.718% when pulling 0123743c89bbe36d783555a07c0b0da4272ae3c5 on jgaskins:response-identity-map into 44f0754a500fa0b86ed5fa27a18e00fb4d88ba0a on neo4jrb:master.
This pull introduces/changes:
Responses::HTTP
)Given:
(:Person)-[:WROTE]->(:Article)
(with 1:N relationships)When I run the following query:
I get a different instance of
Person
for everyArticle
in the result set. It's reasonable to assume we can return the same instance for all references to the same node in the same result. This PR is a proof of concept for that in the hopes that I didn't overlook any context behind yielding new instances each time. :-) Additional benefits include saving a bit of RAM and probably some CPU time since we can skip the conversion from hash toActiveNode
/ActiveRel
on subsequent occurrences of the same node in the result.I didn't test whether nodes/rels in paths would use the cached instances, but if not, maybe that's a relatively simple change? I mainly wanted to get thoughts on this before I waded in too deep. :-)
Identity maps can be gross because cache invalidation is hard, but the scope of caching here is pretty small. It only sits at the response level, so it'll be thrown away with the response and subsequent queries will return fresh instances. Per-query identity seems like a reasonable compromise between performance and flipping tables over frustration caused by cache-invalidation bugs.
Pings: @cheerfulstoic @subvertallchris