open-telemetry / opentelemetry-ruby-contrib

Contrib Packages for the OpenTelemetry Ruby API and SDK implementation.
https://opentelemetry.io
Apache License 2.0
80 stars 166 forks source link

PG instrumentation does not parse quoted table names in queries into the 'db.collection.name' attribute #1173

Open mostfunkyduck opened 1 week ago

mostfunkyduck commented 1 week ago

Description of the bug The PG instrumentation library is using a regex here to extract the table name from queries in order to set the db.collection.name trace attribute. This regex does not match quoted table names, for instance SELECT * from "foo" will not return a match.

Share details about your runtime

Operating system details: Linux, Ubuntu 24.04 RUBY_ENGINE: "ruby" RUBY_VERSION: "3.3.4" RUBY_DESCRIPTION: "ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]"

This also happens on various production instances, I can get that information if needed.

Share a simplified reproduction if possible

I've been reproducing this by adding this to the tests for pg on a local fork

diff --git a/instrumentation/pg/test/opentelemetry/instrumentation/pg/instrumentation_test.rb b/instrumentation/pg/test/opentelemetry/instrumentation/pg
/instrumentation_test.rb
index cc50728e..7c1c2590 100644
--- a/instrumentation/pg/test/opentelemetry/instrumentation/pg/instrumentation_test.rb
+++ b/instrumentation/pg/test/opentelemetry/instrumentation/pg/instrumentation_test.rb
@@ -275,6 +275,13 @@ describe OpenTelemetry::Instrumentation::PG::Instrumentation do
       client.query('DROP TABLE test_table') # Drop table to avoid conflicts
     end

+    it 'extracts table name with quotes' do
+      client.query('CREATE TABLE "test_quotes_table" (personid int, name VARCHAR(50))')
+
+      _(span.attributes['db.collection.name']).must_equal '"test_quotes_table"'
+      client.query('DROP TABLE test_quotes_table') # Drop table to avoid conflicts
+    end
+
     describe 'when db_statement is obfuscate' do
       let(:config) { { db_statement: :obfuscate } }

This seems to fix it, though I'm pretty green at ruby and the inner workings of this library, so I don't know if there are knock-on effects here.

diff --git a/instrumentation/pg/lib/opentelemetry/instrumentation/pg/patches/connection.rb b/instrumentation/pg/lib/opentelemetry/instrumentation/pg/pat
ches/connection.rb
index 98814950..6330abee 100644
--- a/instrumentation/pg/lib/opentelemetry/instrumentation/pg/patches/connection.rb
+++ b/instrumentation/pg/lib/opentelemetry/instrumentation/pg/patches/connection.rb
@@ -15,7 +15,7 @@ module OpenTelemetry
         # Module to prepend to PG::Connection for instrumentation
         module Connection # rubocop:disable Metrics/ModuleLength
           # Capture the first word (including letters, digits, underscores, & '.', ) that follows common table commands
-          TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)
\s+([\w\.]+)/i
+          TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)
\s+(["\w\.]+)/i

           PG::Constants::EXEC_ISH_METHODS.each do |method|
             define_method method do |*args, &block|
hannahramadan commented 1 week ago

Hi @mostfunkyduck! Thank you submitting this issue and sharing what you did to get things working properly. Would you be interested in making a contribution?

For testing, the file instrumentation/pg/test/fixtures/sql_table_name.json feeds each entry into this test. I think we could update the fixtures file to keep things consistent. Perhaps adding something like:

    {
      "name": "table_name_with_single_quotes",
      "sql": "SELECT columns FROM 'test_table'"
    },
    {
      "name": "table_name_with_double_quotes",
      "sql": "SELECT columns FROM \"test_table\""
    }

For the updated regex, is it desirable to include the quotes in the end result? Your suggestion would take care of this (although could be updated to recognize single quotes too).

If we don't want to include the quotes, and to account for both single and double quotes, we could do:

TABLE_NAME = /\b(?:FROM|INTO|UPDATE|CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE(?:\s+IF\s+EXISTS)?)\s+["']?([\w\.]+)["']?/i

Would love to hear your thoughts ◡̈

mostfunkyduck commented 6 days ago

I can take a shot at it! Just need to go through the contributor's guide.

We probably should not include the quotes for consistency's sake - two queries on the same table should produce the same attribute regardless of whether quotes are used - so I'll keep them outside the capture.

As far as single quotes, this SO post quotes the postgres manual as saying that single quotes are only for string constants, so they'd be a syntax error if they were wrapping the table name.

hannahramadan commented 6 days ago

@mostfunkyduck - excited to see your contribution come through 🎉 I agree on keeping the quotes out and its also great to learn about single quotes v double quotes in this context, thank you for sharing that.