turbot / steampipe-postgres-fdw

The Steampipe foreign data wrapper (FDW) is a zero-ETL product that provides Postgres foreign tables which translate queries into API calls to cloud services and APIs. It's bundled with Steampipe and also available as a set of standalone extensions for use in your own Postgres database.
https://steampipe.io/
Apache License 2.0
71 stars 17 forks source link

Unexpected EOF / segfault #349

Open judell opened 2 years ago

judell commented 2 years ago

These are seemingly identical.

with repos(full_name) as (
  values 
    ('turbot/steampipe-plugin-github')
) select * from repos

+--------------------------------+
| full_name                      |
+--------------------------------+
| turbot/steampipe-plugin-github |
+--------------------------------+
with repos as (
  select 
    full_name
  from
    github_search_repository
  where
    query = 'turbot/steampipe-plugin-github in:name'
) select * from repos

+--------------------------------+
| full_name                      |
+--------------------------------+
| turbot/steampipe-plugin-github |
+--------------------------------+

But one of these is not like the other. At least the crash produces a spew of entrails that hopefully will help.


with repos(full_name) as (
  values 
    ('turbot/steampipe-plugin-github')
)
select
  c.sha
from
  repos r
join
  github_search_commit c 
on 
  r.full_name = c.repository_full_name
where
  query = 'committer-date:2022-08-01..2022-09-01 repo:' || r.full_name

+------------------------------------------+
| sha                                      |
+------------------------------------------+
| 3c98a5bdaa01e1a5f8d7ea3756ec768651031115 |
| 23164d2a69225fbf86a72b02e388b0d7bd96c563 |
| 1f7731f3efdecb6c15b8004ce2e48372f9d6d836 |
+------------------------------------------+
with repos as (
  select 
    full_name
  from
    github_search_repository
  where
    query = 'turbot/steampipe-plugin-github in:name'
)
select
  c.sha
from
  repos r
join
  github_search_commit c 
on 
  r.full_name = c.repository_full_name
where
  query = 'committer-date:2022-08-01..2022-09-01 repo:' || r.full_name

Error: unexpected EOF

Logs: https://gist.github.com/judell/60a13d7f848ba08adc2dd1c6b8748bc8

judell commented 2 years ago

@kaidaguerre this might be an interesting clue?

create table repos as (
  select
    full_name
  from
    github_search_repository g
  where 
    query = 'turbot/steampipe-plugin-github in:name'
)
select 
  r.full_name,
  g.sha
from 
  repos r
join
  github_search_commit g
on 
  r.full_name = g.repository_full_name
where
  g.query = 'committer-date:2022-08-01..2022-09-01 repo:turbot/steampipe-plugin-github'

+------------------------------------------+
| sha                                      |
+------------------------------------------+
| 3c98a5bdaa01e1a5f8d7ea3756ec768651031115 |
| 23164d2a69225fbf86a72b02e388b0d7bd96c563 |
| 1f7731f3efdecb6c15b8004ce2e48372f9d6d836 |
+------------------------------------------+
select 
  r.full_name,
  g.sha
from 
  repos r
join
  github_search_commit g
on 
  r.full_name = g.repository_full_name
where
  g.query = 'committer-date:2022-08-01..2022-09-01 repo:' || r.full_name

Error: unexpected EOF
judell commented 2 years ago

Workaround: build the query string in a prior CTE.

with repos as (
  select 
    full_name,
    'committer-date:2022-08-01..2022-09-01 repo:' || full_name as query
  from
    github_search_repository
  where
    query = 'turbot/steampipe-plugin-github in:name'
  order by
    full_name
)
select
  c.sha
from
  repos r
join
  github_search_commit c 
on
  r.full_name = c.repository_full_name
where
  c.query =  r.query
judell commented 1 year ago

@kaidaguerre Here is a variation. In this example the happy path looks like:

with prompts as (
  with names as (
  )
 select
    'Q: ' || name || ' A:' as prompt
from
  names
)
...

The sad path:

with names as (
),
prompts as (
   select
       'Q: ' || name || ' A:' as prompt
     from
       names
)
...

If you look at the output of the prompts CTE it's the same in both cases, but doing them in sequence is a fail, the workaround is to produce the prompts using a CTE that embeds another CTE, vs a chain of 2 CTEs.

happy path

      with prompts as (
        with names as (
          select 
            unnest ( array [ 
              'FedRAMP', 
              'FFIEC', 
              'SOC 2'
              ] ) as name
        )
        select
          name,
          'Q: ' || name || ' A:' as prompt
        from
          names
      )
      select
        p.name,
        p.prompt,
        o.completion
      from 
        prompts p
      join
        openai_completion o
      on
        o.prompt = p.prompt

sad path

      with names as (
        select 
          unnest ( array [ 
            'FedRAMP', 
            'FFIEC', 
            'GDPR', 
            'SOC 2'
            ] ) as name
      ),
      prompts as (
        select
          name,
          'Q: ' || name || ' A:' as prompt
        from
          names
      )
      select
        p.name,
        p.prompt,
        o.completion
      from 
        prompts p
      join
        openai_completion o
      on
        o.prompt = p.prompt
judell commented 1 year ago

One more example, @kaidaguerre. A common feature of these examples: the input to the join is a value constructed by string concatenation. Seems like I always wind up doing a nested-CTE workaround in such cases in order to avoid the EOF fail.

      with data as (
        with feed_link as (  -- this extra cte level should not be necessary
          select 'https://' || ( select name from mastodon_server ) || '/tags/' || $1 || '.rss' as feed_link
        )
        select feed_link from feed_link
      ),
      feed as (
          select
            (regexp_match(link, '(.+)/\d+$'))[1] as account_url,
            jsonb_array_elements_text(categories) as tag
          from
            rss_item r
          join
            data d
          using (feed_link)
          limit $2
      )
      select distinct on (account_url, tag)
        jsonb_build_object(
          'account_url', account_url,
          'tag', tag,
          'note', case
            when account_url is not null then (select note from mastodon_search_account where query = account_url)
            else ''
            end
        ) as account_url_tag_note
      from
        feed
github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.