Open laurensknoll opened 2 months ago
Hello @laurensknoll,
Thank you for bringing this issue to our attention.
I've reviewed the current code and noted that when querying the googledirectory_group_member
table, the group_id
parameter is essential for calling the List Members API.
I've implemented some updates in the issue-88
branch to enhance how members are listed concurrently:
issue-88
:googledirectory_group
table now acts as the parent for googledirectory_group_member
.group_id
column has been made optional.group_id
when querying googledirectory_group_member
, allowing for simpler queries like select * from googledirectory_group_member
.I have modified the query plan to make it faster. The behavior of the query(mentioned by you in the issue description) will remain the same as we are making the API call by passing the Group ID.
WITH GroupDetails AS (
SELECT
id AS group_id,
customer_id,
name AS group_name
FROM
googledirectory.googledirectory_group
WHERE
customer_id = 'my_customer_id'
),
MemberDetails AS (
SELECT
id AS member_id,
group_id,
type AS member_type,
email AS member_email,
status AS member_status
FROM
googledirectory.googledirectory_group_member
)
SELECT
g.customer_id,
g.group_id,
g.group_name,
m.member_id,
m.member_type,
m.member_email,
m.member_status
FROM
GroupDetails g
JOIN MemberDetails m ON m.group_id = g.group_id;
Please try pulling the changes from issue-88
, build the plugin locally, and run the above query. Your feedback would be greatly appreciated.
issue-88
:git clone https://github.com/turbot/steampipe-plugin-googledirectory.git
cd steampipe-plugin-googledirectory
git checkout issue-88
make
Looking forward to your feedback. Thanks again!
Hello, @laurensknoll, Did you get an opportunity to give it a try?
Hi @ParthaI , Thanks for preparing the improvement. I've run both version to check performance. However, more interesting, the improvement branch returns different results. This is not expected, because the WHERE direct_member_count > 0
should filter out all the groups without members.
query "google_group_members" {
sql = <<-EOQ
select
googledirectory_group.customer_id as customer_id,
googledirectory_group.id as group_id,
googledirectory_group.name as group_name,
googledirectory_group_member.id as member_id,
googledirectory_group_member.type as member_type,
googledirectory_group_member.email as member_email,
googledirectory_group_member.status as member_status
from googledirectory.googledirectory_group
left join googledirectory.googledirectory_group_member
on googledirectory_group_member.group_id = googledirectory_group.id
where googledirectory_group.customer_id = $1
and googledirectory_group.direct_members_count > 0
limit 200
EOQ
param "customer_id" {
description = "$1, Google Workspace customer ID."
}
}
Old:
time steampipe query 'monitoring.query.google_group_members(customer_id => “…”)’
steampipe query 7.70s user 1.98s system 12% cpu 1:16.63 total
New:
time steampipe query 'monitoring.query.google_group_members(customer_id => “…”)’
steampipe query 18.63s user 3.55s system 14% cpu 2:29.76 total
Old:
+-------------+-----------------+-------------------------+-----------------------+-------------+
| customer_id | group_id | group_name | member_id | member_type |
+-------------+-----------------+-------------------------+-----------------------+-------------+
| ... | 00lnxb... | group1 | 10365.. | USER |
| ... | 00lnxb... | group1 | 11481.. | USER |
| ... | 00lnxb... | group1 | 11315.. | USER |
| ... | 00ihv6... | group2 | 041mg.. | GROUP |
| ... | 02szc7... | group3 | 01gf8.. | GROUP |
| ... | 02p2cs... | group4 | 040ew.. | GROUP |
New:
+-------------+-----------------+-------------------------+-----------+-------------+--------------+---------------+
| customer_id | group_id | group_name | member_id | member_type | member_email | member_status |
+-------------+-----------------+-------------------------+-----------+-------------+--------------+---------------+
| ... | 00ln.. | group1 | <null> | <null> | <null> | <null> |
| ... | 02p2.. | group2 | <null> | <null> | <null> | <null> |
| ... | 04an.. | group3 | <null> | <null> | <null> | <null> |
| ... | 01x0.. | group4 | <null> | <null> | <null> | <null> |
| ... | 02sz.. | group5 | <null> | <null> | <null> | <null> |
| ... | 049x.. | group6 | <null> | <null> | <null> | <null> |
| ... | 01mr.. | group7 | <null> | <null> | <null> | <null> |
| ... | 00ih.. | group8 | <null> | <null> | <null> | <null> |
| ... | 025b.. | group9 | <null> | <null> | <null> | <null> |
| ... | 049x.. | group10 | <null> | <null> | <null> | <null> |
| ... | 02lw.. | group11 | <null> | <null> | <null> | <null> |
@laurensknoll Thank you for testing the PR branch. I've committed another update to resolve the output result discrepancies for the specified query. In the enhancement branch, we have structured the table to enable parallel API calls by configuring the ParentHydrate within the table definition.
The table will retrieve all groups in your workspace, then it will execute parallel API calls for each group to fetch the group members.
Have you tried the following query:
WITH GroupDetails AS (
SELECT
id AS group_id,
customer_id,
name AS group_name
FROM
googledirectory.googledirectory_group
WHERE
customer_id = 'my_customer_id'
),
MemberDetails AS (
SELECT
id AS member_id,
group_id,
type AS member_type,
email AS member_email,
status AS member_status
FROM
googledirectory.googledirectory_group_member
)
SELECT
g.customer_id,
g.group_id,
g.group_name,
m.member_id,
m.member_type,
m.member_email,
m.member_status
FROM
GroupDetails g
JOIN MemberDetails m ON m.group_id = g.group_id;
which is equivalent to
select
g.customer_id as customer_id,
g.id as group_id,
g.name as group_name,
m.id as member_id,
m.type as member_type,
m.email as member_email,
m.status as member_status
from googledirectory.googledirectory_group as g
join googledirectory.googledirectory_group_member as m on m.group_id = g.id
where g.customer_id = 'my_customer_id'
Could you please pull the latest changes from the issue-88
branch, build the plugin locally, and then execute the query? Let us know if you encounter any discrepancies in data between the old and new changes.
Thanks!
Hi @ParthaI , thanks for the update.
When running the query, many 404's are returned by the API. These seem to be caused by the ParentHydrate query for the group (this API requires either customer_id or domain to be configured). Can you take a look at that using the additional evidence below.
BTW In terms of concurrency, the number of requests seems to stick at ~4 req/sec. We should strive for 30 req/sec or so.
2024-04-30 12:24:46.206 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: calling fetchItems, table: googledirectory_group_member, matrixItem: [], limit: -1 (googledirectory-1714479886454)
2024-04-30 12:24:46.206 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: QueryData streamRows (googledirectory-1714479886454)
2024-04-30 12:24:46.206 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: setRequestSubscriber readRowsAsync publisher googledirectory-1714479886454 (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: initialiseRateLimiters for query data 0x14000216780 (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: getHydrateCallRateLimiter (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: resolveRateLimiterScopeValues (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: HydrateCall tags map[function_name:listDirectoryGroups]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: Table tags map[table:googledirectory_group_member]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: QueryData rateLimiterScopeValues map[connection:googledirectory]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: merged scope values map[connection:googledirectory function_name:listDirectoryGroups table:googledirectory_group_member]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: resolvedRateLimiterConfig: no rate limiters (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: getHydrateCallRateLimiter (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: resolveRateLimiterScopeValues (googledirectory-1714479886454)
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: HydrateCall tags map[function_name:listDirectoryGroupMembers]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: Table tags map[table:googledirectory_group_member]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: QueryData rateLimiterScopeValues map[connection:googledirectory]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: merged scope values map[connection:googledirectory function_name:listDirectoryGroupMembers table:googledirectory_group_member]
2024-04-30 12:24:46.209 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: resolvedRateLimiterConfig: no rate limiters (googledirectory-1714479886454)
2024-04-30 12:24:46.383 UTC [WARN] steampipe-plugin-googledirectory.plugin: [WARN] 1714479886454: doList callHydrateWithRetries (googledirectory-1714479886454) returned err googleapi: Error 404: Domain not found., notFound
2024-04-30 12:24:46.383 UTC [WARN] steampipe-plugin-googledirectory.plugin: [WARN] 1714479886454: QueryData StreamError googleapi: Error 404: Domain not found., notFound (googledirectory-1714479886454)
2024-04-30 12:24:46.383 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: rowData chan returned nil - wait for rows to complete (googledirectory-1714479886454)
2024-04-30 12:24:46.383 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: buildRowsAsync goroutine returning (googledirectory-1714479886454)
2024-04-30 12:24:46.383 UTC [INFO] steampipe-plugin-googledirectory.plugin: [INFO] 1714479886454: QueryData streamRows DONE (googledirectory-1714479886454)
Sorry @laurensknoll for being radio silent on this matter for a while, I will give it a shot and let you know. Thanks!
Is your feature request related to a problem? Please describe. When querying all googledirectory_group_member using the next query, the members are not queried in parallel, making it slow:
Describe the solution you'd like The preferred solution executes the
googledirectory_group_member.list
-methods in parallel, using a configurable concurrency to reach the Admin SDK rate limit of 2400 req/min.Describe alternatives you've considered
Additional context In the plugin log, the group_member-calls are listed sequentially:
And, consequently, the Admin SDK API monitoring lists 1 to 2 req/second.