turbot / steampipe-plugin-net

Use SQL to instantly query DNS records, certificates and other network information. Open source CLI. No DB required.
https://hub.steampipe.io/plugins/turbot/net
Apache License 2.0
22 stars 5 forks source link

net_dns_reverse column selection without wildcard #61

Closed leonjza closed 1 year ago

leonjza commented 1 year ago

Describe the bug Selecting specific columns from net_dns_reverse appears to break with the error:

Error: rpc error: code = Internal desc = 'List' call for table 'net_dns_reverse' is missing 1 required qual: column:'ip_address' operator: =
 (SQLSTATE HV000)

Steampipe version (steampipe -v)

$ steampipe -v
Steampipe v0.20.5

Plugin version (steampipe plugin list)

| hub.steampipe.io/plugins/turbot/net@latest        | 0.9.0   | net         |

To reproduce

Write a query such as this:

query "dns_records_reverse" {
  sql = <<-EOQ
    select
      ip_address,
      jsonb_array_elements_text(domains) as domain
    from
      net_dns_reverse
    where
      ip_address in (
        select
          ip
        from
          net_dns_record
        where
          domain = $1
          and ip is not null
      )
  EOQ

  param "domain" {
    description = "Domain to lookup"
  }
}

Using the steampipe query tool, run the query like this:

Welcome to Steampipe v0.20.5
For more information, type .help
> query.dns_records_reverse("google.com")

Error: rpc error: code = Internal desc = 'List' call for table 'net_dns_reverse' is missing 1 required qual: column:'ip_address' operator: =
 (SQLSTATE HV000)

+------------+--------+
| ip_address | domain |
+------------+--------+
+------------+--------+
>

Expected behavior The query should return results without an error.

Additional context The error can be avoided by adding the wildcard column selector (aka: *). For exaxmple

query "dns_records_reverse" {
  sql = <<-EOQ
    select
      *,
      ip_address,
      jsonb_array_elements_text(domains) as domain
    from
      net_dns_reverse
    where
      ip_address in (
        select
          ip
        from
          net_dns_record
        where
          domain = $1
          and ip is not null
      )
  EOQ

  param "domain" {
    description = "Domain to lookup"
  }
}

In this case, the results are successful, but with more columns than intended.

> query.dns_records_reverse("google.com")
+----------------+--------------------------------+---------------------------+----------------+----------------------------+
| ip_address     | domains                        | _ctx                      | ip_address     | domain                     |
+----------------+--------------------------------+---------------------------+----------------+----------------------------+
| 172.217.170.46 | ["jnb02s03-in-f14.1e100.net."] | {"connection_name":"net"} | 172.217.170.46 | jnb02s03-in-f14.1e100.net. |
+----------------+--------------------------------+---------------------------+----------------+----------------------------+
>
cbruno10 commented 1 year ago

Hi @leonjza , sorry you're running into some errors!

Can you please share what the results of the subquery are (if it's hard to run through the query resource, you can hardcode a domain and run the query ad-hoc):

select
  ip 
from
  net_dns_record 
where
  domain = "google.com"
  and ip is not null

Also, can you please also share the results of running explain on each of the queries you've tried, e.g.,

explain 
select
  ip_address,
  jsonb_array_elements_text(domains) as domain 
from
  net_dns_reverse 
where
  ip_address in 
  (
    select
      ip 
    from
      net_dns_record 
    where
      domain = "google.com" 
      and ip is not null 
  )

and

explain 
select
  *,
  ip_address,
  jsonb_array_elements_text(domains) as domain 
from
  net_dns_reverse 
where
  ip_address in 
  (
    select
      ip 
    from
      net_dns_record 
    where
      domain = "google.com" 
      and ip is not null 
  )

Sometimes Steampipe's query planner doesn't identify key columns correctly, like ip_address, which is required for any net_dns_reverse table queries, which is why your first query may be failing incorrectly. The results from the queries I shared above should help us identify if this is the issue.

Subhajit97 commented 1 year ago

@leonjza Along with the details that @cbruno10 asked above, could you please try the same query by adding an order by clause? For example:

select
  ip_address,
  jsonb_array_elements_text(domains) as domain
from
  net_dns_reverse
where
  ip_address in (
    select
      ip
    from
      net_dns_record
    where
      domain = 'google.com'
      and ip is not null
    order by ip
  )
leonjza commented 1 year ago

@cbruno10 and @Subhajit97, answered both below. Thanks for the fast replies ❤️

Can you please share what the results of the subquery are (if it's hard to run through the query resource, you can hardcode a domain and run the query ad-hoc):

select
  ip 
from
  net_dns_record 
where
  domain = "google.com"
  and ip is not null
> select
  ip
from
  net_dns_record
where
  domain = 'google.com'
  and ip is not null
+------------------------------+
| ip                           |
+------------------------------+
| 172.217.170.46               |
| 2c0f:fb50:4002:805::200e/128 |
+------------------------------+

Also, can you please also share the results of running explain on each of the queries you've tried, e.g.,

explain 
select
  ip_address,
  jsonb_array_elements_text(domains) as domain 
from
  net_dns_reverse 
where
  ip_address in 
  (
    select
      ip 
    from
      net_dns_record 
    where
      domain = "google.com" 
      and ip is not null 
  )
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                         |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| ProjectSet  (cost=0.00..20000100278749.00 rows=50000000 width=64)                                                                                  |
|   ->  Nested Loop Semi Join  (cost=0.00..20000100024999.00 rows=500000 width=64)                                                                   |
|         ->  Foreign Scan on net_dns_reverse  (cost=0.00..20000000000000.00 rows=1000000 width=200)                                                 |
|         ->  Foreign Scan on net_dns_record  (cost=0.00..200.00 rows=1 width=200)                                                                   |
|               Filter: ((ip IS NOT NULL) AND (domain = 'google.com'::text) AND (net_dns_reverse.ip_address = ip) AND (domain = 'google.com'::text)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------+

and

explain 
select
  *,
  ip_address,
  jsonb_array_elements_text(domains) as domain 
from
  net_dns_reverse 
where
  ip_address in 
  (
    select
      ip 
    from
      net_dns_record 
    where
      domain = "google.com" 
      and ip is not null 
  )
+---------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                              |
+---------------------------------------------------------------------------------------------------------+
| ProjectSet  (cost=20000000002500.00..20000000376256.00 rows=50000000 width=160)                         |
|   ->  Nested Loop  (cost=20000000002500.00..20000000122506.00 rows=500000 width=96)                     |
|         ->  HashAggregate  (cost=20000000002500.00..20000000002502.00 rows=200 width=200)               |
|               Group Key: net_dns_record.ip                                                              |
|               ->  Foreign Scan on net_dns_record  (cost=0.00..20000000000000.00 rows=1000000 width=200) |
|                     Filter: ((ip IS NOT NULL) AND (domain = 'google.com'::text))                        |
|         ->  Foreign Scan on net_dns_reverse  (cost=0.00..600.00 rows=2 width=300)                       |
|               Filter: (ip_address = net_dns_record.ip)                                                  |
+---------------------------------------------------------------------------------------------------------+

@leonjza Along with the details that @cbruno10 asked above, could you please try the same query by adding an order by clause? For example:

select
  ip_address,
  jsonb_array_elements_text(domains) as domain
from
  net_dns_reverse
where
  ip_address in (
    select
      ip
    from
      net_dns_record
    where
      domain = 'google.com'
      and ip is not null
    order by ip
  )

Works!

> select
  ip_address,
  jsonb_array_elements_text(domains) as domain
from
  net_dns_reverse
where
  ip_address in (
    select
      ip
    from
      net_dns_record
    where
      domain = 'google.com'
      and ip is not null
    order by ip
  )
+----------------+----------------------------+
| ip_address     | domain                     |
+----------------+----------------------------+
| 172.217.170.46 | jnb02s03-in-f14.1e100.net. |
+----------------+----------------------------+

Curious, what can I use as a hint to know to use the order by?

Subhajit97 commented 1 year ago

@leonjza, per the net_dns_reverse table design, the column ip_address is required when running any query against that table.

If you check the query plan for the query without order by:

+----------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                         |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| ProjectSet  (cost=0.00..20000100278749.00 rows=50000000 width=64)                                                                                  |
|   ->  Nested Loop Semi Join  (cost=0.00..20000100024999.00 rows=500000 width=64)                                                                   |
|         ->  Foreign Scan on net_dns_reverse  (cost=0.00..20000000000000.00 rows=1000000 width=200)                                                 |
|         ->  Foreign Scan on net_dns_record  (cost=0.00..200.00 rows=1 width=200)                                                                   |
|               Filter: ((ip IS NOT NULL) AND (domain = 'google.com'::text) AND (net_dns_reverse.ip_address = ip) AND (domain = 'google.com'::text)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------+

You can see the query against the net_dns_reverse table executes first, but the ip_address depends on the output of the net_dns_record table. Hence the error comes.

Filter: ((ip IS NOT NULL) AND (domain = 'google.com'::text) AND (net_dns_reverse.ip_address = ip) AND (domain = 'google.com'::text))

If you add the order by clauses, often that will trick/change the planer. You can see that after using order by clauses, the planner uses the below filter, where it takes the ip_address from the net_dns_record table output:

Foreign Scan on net_dns_reverse  (cost=0.00..400.00 rows=2 width=200)                             |
|               Filter: (ip_address = net_dns_record.ip)
cbruno10 commented 1 year ago

@kaidaguerre Based on the results @leonjza shared in https://github.com/turbot/steampipe-plugin-net/issues/61#issuecomment-1593231257, why does adding * to the select statement cause the query planner to pick up the qual for ip_address correctly?

kaidaguerre commented 1 year ago

Unfortunately, the Postgres query planner is a black box and it's very hard to pin down exactly what causes it to come up with the plan - it applies various heuristics.

The FDW attempts to influence the query planner by reporting a cheap cost for paths which provide required key columns, which does help, but we have no direct control of how it plans its queries, so it is sometimes necessary to be creative with how the queries are written to achieve the desired plan.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

cbruno10 commented 1 year ago

Hey @leonjza , I believe the workaround of using order by @Subhajit97 provided in https://github.com/turbot/steampipe-plugin-net/issues/61#issuecomment-1593318970 should work for you. Unfortunately, the FDW sometimes requires tricks, especially when working with required key columns.

If you're still seeing issues though, please let us know!