opsmill / infrahub

Infrahub - A new approach to Infrastructure Management
https://opsmill.com/
GNU Affero General Public License v3.0
192 stars 10 forks source link

feat(backend, db): use metadata when executing query #3764

Closed fatih-acar closed 2 months ago

fatih-acar commented 3 months ago

This will allow us to better track query executions.

Adding metadata to queries makes neo4j print the following query logs (metadata is printed at the end of the log):

2024-07-04 13:51:32.341+0000 INFO  id:2634 - transaction id:2591 - 0 ms: (planning: 0, waiting: 0) - 7584 B - 0 page hits, 0 page faults - bolt-session bolt  neo4j-python/5.20.0 Python-Rust/3.11.9-final-0 (linux)           client/172.18.0.5:45318 server/172.18.0.3:7687> neo4j - neo4j - MATCH (n:Node) WHERE n.uuid IN $ids
MATCH (n)-[:HAS_ATTRIBUTE]-(a:Attribute)
WHERE a.name IN $field_names
CALL {
WITH n, a
MATCH (n)-[r:HAS_ATTRIBUTE]-(a:Attribute)
WHERE ((r.branch IN $branch0 AND r.from <= $time0 AND r.to IS NULL)
OR (r.branch IN $branch0 AND r.from <= $time0 AND r.to >= $time0))
RETURN n as n1, r as r1, a as a1
ORDER BY r.branch_level DESC, r.from DESC
LIMIT 1
}
WITH n1 as n, r1, a1 as a
WHERE r1.status = "active"
WITH n, r1, a
MATCH (a)-[:HAS_VALUE]-(av:AttributeValue)
CALL {
WITH a, av
MATCH (a)-[r:HAS_VALUE]-(av:AttributeValue)
WHERE ((r.branch IN $branch0 AND r.from <= $time0 AND r.to IS NULL)
OR (r.branch IN $branch0 AND r.from <= $time0 AND r.to >= $time0))
RETURN a as a1, r as r2, av as av1
ORDER BY r.branch_level DESC, r.from DESC
LIMIT 1
}
WITH n, r1, a1 as a, r2, av1 as av
WHERE r2.status = "active"
WITH n, a, av, r1, r2
MATCH (a)-[rel_isv:IS_VISIBLE]-(isv:Boolean)
MATCH (a)-[rel_isp:IS_PROTECTED]-(isp:Boolean)
WHERE all(r IN [rel_isv, rel_isp] WHERE ( ((r.branch IN $branch0 AND r.from <= $time0 AND r.to IS NULL)
OR (r.branch IN $branch0 AND r.from <= $time0 AND r.to >= $time0)) ))
OPTIONAL MATCH (a)-[rel_source:HAS_SOURCE]-(source)
WHERE all(r IN [rel_source] WHERE ( ((r.branch IN $branch0 AND r.from <= $time0 AND r.to IS NULL)
OR (r.branch IN $branch0 AND r.from <= $time0 AND r.to >= $time0)) ))
OPTIONAL MATCH (a)-[rel_owner:HAS_OWNER]-(owner)
WHERE all(r IN [rel_owner] WHERE ( ((r.branch IN $branch0 AND r.from <= $time0 AND r.to IS NULL)
OR (r.branch IN $branch0 AND r.from <= $time0 AND r.to >= $time0)) ))
RETURN n,a,av,r1,r2,isv,isp,rel_isv,rel_isp,source,rel_source,owner,rel_owner
ORDER BY n.uuid,a.name
LIMIT 5000 - {ids: [], branch0: ['-global-', 'main'], time0: '2024-07-04T13:51:32.322586Z', field_names: ['id', 'hfid', 'display_label', '__typename', 'default_branch', 'commit', 'name', 'description', 'location', 'username', 'password', 'tags']} - runtime=pipelined - {name: 'node_list_get_attribute', infrahub_id: 4}
fatih-acar commented 3 months ago

I like the idea of having our own counter but not sure if this implementation is gonna work. As far as I understand, on a Gunicorn server, all workers will have their own counter with overlapping value Could we use a UUID instead ?

You're right, it's actually broken and actually the counter resets most of the time due to transactions creating a new InfrahubDatabase class instance... I initially wanted to retrieve the internal neo4j query id but couldn't figure out how. I thought about generating uuids but it seemed heavy to do so at each query.

I will rather use the tracing span id instead (since it's only useful when tracing is enabled), I think it's the best of both worlds.

dgarros commented 3 months ago

I will rather use the tracing span id instead (since it's only useful when tracing is enabled), I think it's the best of both worlds.

Agreed, I was thinking the same