Closed stepanof closed 9 months ago
Hi is the now added to a release? Or when will this feature added?
Hi @harshach, how is such a feature implementation decided? Conceptually: Is a procedure a pipeline or should it be implemented simarily to views? I see a problem with the latter, since a procedure must not have data as a result. And the former is a problem since MSSQL (or any other database with procedures) is of databaseservice and not of pipelineservice. Do you have any suggestions of how to proceed with this feature?
I would certainly advocate the possibility for MSSQL (or any other database with procedures) to be (also) as of type pipelineservice. As stated by @phalix , stored procedures do not generate a (meaningful) resultset in most cases. Also for us this is a feature we're very much looking forward to.
@phalix @proverbs53 the feature is already underway for 1.2. Please look at this ticket for details https://github.com/open-metadata/OpenMetadata/issues/12998
Let's close this, as Harsha said it's already part of 1.2 release
hi @stepanof I am looking into this. I have figured out how to list the stored procedures, get their definition etc. I am looking now on the best way to figure out when was each Stored Procedure executed (similar to how we get Snowflake Stored Procedure lineage, for example).
Looks like we should be able to use the sys.dm_exec_procedure_stats
table, but it only stores the last execution for each procedure. This means we might be missing some info.
LMK if you have any pointers here. Will keep looking meanwhile.
UPDATE: To give some more details, we usually do:
EXEC
calls 🔴 -> That's what I am missing right nowEXEC
calls are triggeringThanks
ok so the final query to extract Stored Procedure lineage based on the executed queries will be this so far:
WITH SP_HISTORY (start_time, end_time, procedure_name, query_text) AS (
select
s.last_execution_time start_time,
DATEADD(s, s.total_elapsed_time/1000, s.last_execution_time) end_time,
OBJECT_NAME(object_id, database_id) as procedure_name,
text as query_text
from sys.dm_exec_procedure_stats s
CROSS APPLY sys.dm_exec_sql_text(s.plan_handle)
WHERE OBJECT_NAME(object_id, database_id) IS NOT NULL
),
Q_HISTORY (database_name, query_text, start_time, end_time, duration,query_type, schema_name, user_name) AS (
select
db.NAME database_name,
t.text query_text,
s.last_execution_time start_time,
DATEADD(s, s.total_elapsed_time/1000, s.last_execution_time) end_time,
s.total_elapsed_time/1000 duration,
case
when t.text LIKE '%%MERGE%%' then 'MERGE'
when t.text LIKE '%%UPDATE%%' then 'UPDATE'
when t.text LIKE '%%SELECT%%INTO%%' then 'CREATE_TABLE_AS_SELECT'
when t.text LIKE '%%INSERT%%' then 'INSERT'
else 'UNKNOWN' end query_type,
NULL schema_name,
NULL user_name
FROM sys.dm_exec_cached_plans AS p
INNER JOIN sys.dm_exec_query_stats AS s
ON p.plan_handle = s.plan_handle
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
INNER JOIN sys.databases db
ON db.database_id = t.dbid
WHERE s.last_execution_time between '2024-01-13' and '2024-01-20'
AND t.text NOT LIKE '/* {{"app": "OpenMetadata", %%}} */%%'
AND t.text NOT LIKE '/* {{"app": "dbt", %%}} */%%'
AND p.objtype NOT IN ('Prepared', 'Proc')
)
select
Q.query_type AS QUERY_TYPE,
Q.database_name AS QUERY_DATABASE_NAME,
Q.schema_name AS QUERY_SCHEMA_NAME,
Q.query_text AS QUERY_TEXT,
Q.user_name AS QUERY_USER_NAME,
Q.start_time AS QUERY_START_TIME,
Q.duration AS QUERY_DURATION,
SP.procedure_name AS PROCEDURE_NAME,
SP.query_text AS PROCEDURE_TEXT,
SP.start_time AS PROCEDURE_START_TIME,
SP.end_time AS PROCEDURE_END_TIME
from SP_HISTORY SP
JOIN Q_HISTORY Q
ON (
Q.start_time BETWEEN SP.start_time and SP.end_time
OR Q.end_time BETWEEN SP.start_time and SP.end_time
)
order by PROCEDURE_START_TIME desc
;
We're trying to join the time an SP was executed vs. the queries executed during that time as well. Other dbs helps us narrow this down further by joining via SESSION_ID, which would help us reduce false positives.
Did not find a way to get that in MSSQL, so will do this first iteration.
If anyone has a better idea to link SP <> Query so that we can parse the lineage, please let us know.
Thanks
Is your feature request related to a problem? Please describe. We use procedures on MsSQL servers in our ETL processes. If the procedure was changed so our ETL can fail and we won't know the reason. So it will be really helpfull if OM could send alert if procedures code was changed.
Describe the solution you'd like OM can ingest not only tables metadata but also metadata of procedures on MsSQL servers. Also procedures may be ingested as pipelines.