open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.94k stars 846 forks source link

Cannot see actual SQL in Azure Applications Insight while using Data Nucleus and JDBC #9359

Closed phil-rice-HCL closed 1 year ago

phil-rice-HCL commented 1 year ago

Describe the bug

When I use the azure applications insight agent to instrument my jboss application I can see database spans in the gui. However the sql is always the same. It is 'servername | schema' (where server name is the dns name and schema is the database schema)

As you know the azure applications insight agent just wraps the open telemetry code. The version I am using was only released very recently (see below)

I have tens of thousands of these spans and they are all the same. No sql. Juse 'servername|schema'

Versions:

Steps to reproduce

The code for this is horribly complex (200 man years) and I don't have the skills to simplify it. In essence it's a java object that uses data nucleus to communicate with the database. The version numbers are above

I'm really after 'what do I do to debug this'.

Expected behavior

My expected behavious is that the Sql will appear in the azure application insights gui

Actual behavior

The actual behavior is that every span has the sql command 'servername|schema name'

Javaagent or library instrumentation version

v1.29.0

Environment

Additional context

No response

trask commented 1 year ago

hi @phil-rice-HCL!

servername | schema looks like an application insights specific mapping of the underlying opentelemetry data.

can you try running with the opentelemetry javaagent and enable the logging span exporter, e.g. -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -Dotel.logs.exporter=none?

this should log the underlying jdbc spans that are captured by the opentelemetry java agent which should help confirm whether there's anything wrong with the underlying opentelemetry jdbc span data, or if this is an issue with the application insights mapping

phil-rice-HCL commented 1 year ago

Firstly thanks for the quick response

I thought I would include the whole VM params for JBOSS incase there is something wrong with this. It is rather complex!

"-Dprogram.name=JBossTools: Red Hat JBoss EAP 7.2" -server -Xms512m -Xmx1512m -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman,org.jboss.logmanager -Djava.awt.headless=true "-Dorg.jboss.boot.log.file=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/log/boot.log" "-Dlogging.configuration=file:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/configuration/logging.properties" "-Djboss.home.dir=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2" -Dorg.jboss.logmanager.nocolor=true -Djboss.bind.address.management=localhost -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/p:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.5.Final-redhat-00001.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/slf4j/impl/main/slf4j-jboss-logmanager-1.0.3.GA-redhat-2.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/wildfly/common/main/wildfly-common-1.4.0.Final-redhat-1.jar -javaagent:../azure/applicationinsights-agent-3.4.16.jar -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -Dotel.logs.exporter=none

The majority of database accesses look like these. I get many 'very similar' spans. 09:31:27,944 INFO [stdout] (Timer-5) 2023-09-01 09:31:27.943+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : f5d8b1e10231129aabb7fdafcdcee89e 1f6143292c644901 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=311, thread.name=Timer-5, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8} alues=8}

This is obviously activemq, but is going to the same database so I give this as a sample 09:34:06,897 INFO [stdout] (Thread-5 (ActiveMQ-client-global-threads)) 2023-09-01 09:34:06.897+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : 520b742866abab83ceb050d1a6e592b1 0efb2c20760dd276 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=370, applicationinsights.internal.operation_name=constitution/constitutionMDB process, thread.name=Thread-5 (ActiveMQ-client-global-threads), db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=9}

From Non nucleus database access I get this. This is the kind of thing I was expecting to see from the above two spans, 09:35:29,740 INFO [stdout] (Periodic Recovery) 2023-09-01 09:35:29.740+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'SELECT SYS.DBA_PENDING_TRANSACTIONS' : 561ef70eeefff23e3baae86ea0ed7206 2f056cac858a4561 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=frgnbora91.example.com, net.peer.port=1521, db.statement=SELECT formatid, globalid, branchid FROM SYS.DBA_PENDING_TRANSACTIONS, db.system=oracle, db.operation=SELECT, thread.id=145, db.sql.table=SYS.DBA_PENDING_TRANSACTIONS, thread.name=Periodic Recovery, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=11}

phil-rice-HCL commented 1 year ago

From that it looks to me as though it is the underlying open telemetry.

I've not used data nucleus before (I am on this job for profiling/speed up) but if needed I can write a sample application over the weekend and try and duplicate it. It's a little hard: working with this stack is like saying ' I need a banana. But the banana is held by a gorilla, which is leaning against a tree, which is in the forest' so to give you the banana I need to make a gorilla, a tree and a forest. And make sure that they are all the correct versions, configured in the correct (or in this case probably incorrect) way

If there are any more diagnostics we can do first that would be great! That last one was really helpful I think

regards

laurit commented 1 year ago

As the db.statement attribute is missing my guess is that our sql statement sanitizer fails to parse the query, hence it is not able to figure out what database operation is executed or what table is being used. Try setting -Dotel.instrumentation.jdbc.statement-sanitizer.enabled=false with this all the db span names will revert to just database name, but the spans will have db.statement attribute that shows the query that was executed. Seeing the query could help us improve the sql parser.

phil-rice-HCL commented 1 year ago

I have included the server config: "-Dprogram.name=JBossTools: Red Hat JBoss EAP 7.2" -server -Xms512m -Xmx1512m -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman,org.jboss.logmanager -Djava.awt.headless=true "-Dorg.jboss.boot.log.file=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/log/boot.log" "-Dlogging.configuration=file:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/configuration/logging.properties" "-Djboss.home.dir=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2" -Dorg.jboss.logmanager.nocolor=true -Djboss.bind.address.management=localhost -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/p:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.5.Final-redhat-00001.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/slf4j/impl/main/slf4j-jboss-logmanager-1.0.3.GA-redhat-2.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/wildfly/common/main/wildfly-common-1.4.0.Final-redhat-1.jar -javaagent:../azure/applicationinsights-agent-3.4.16.jar -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -Dotel.logs.exporter=none -Dotel.instrumentation.jdbc.statement-sanitizer.enabled=false

Initially I was quite excited: 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) 2023-09-01 11:28:54.146+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'SELECT ngdev.all_objects' : 4aea363c2d38a3c58efe9d6c7b4a9f6f 716e217f4392243f CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.user=nglobal, db.statement=SELECT NULL AS table_dog, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) o.owner AS table_schem, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) o.object_name AS table_name, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) o.object_type AS table_type, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) NULL AS remarks 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) FROM all_objects o 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) WHERE o.owner LIKE :1 ESCAPE '/' 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) AND o.object_name LIKE :2 ESCAPE '/' 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) AND o.object_type IN ('TABLE', 'SYNONYM', 'VIEW') 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) ORDER BY table_type, table_schem, table_name

But unfortunately most of the traces look like this

11:30:55,069 INFO [stdout] (Timer-4) 2023-09-01 11:30:55.068+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : b9b3b9696b2d0741bc3e4792f44a4a8f 2367202d989b3b4b CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=326, thread.name=Timer-4, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8}

11:32:26,887 INFO [stdout] (Periodic Recovery) 2023-09-01 11:32:26.886+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : 864c872eee7bd0143d07f0316f479809 012aee873701f35f CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.statement=BEGIN sys.dbms_xa.dist_txn_sync 11:32:26,887 INFO [stdout] (Periodic Recovery) ; END;, db.system=oracle, thread.id=145, thread.name=Periodic Recovery, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=9}

11:30:55,183 INFO [stdout] (Timer-4) 2023-09-01 11:30:55.183+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : cadea27de1060056b00c4847f8fc20dc 5677d7edac5e9f54 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=326, thread.name=Timer-4, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8}

phil-rice-HCL commented 1 year ago

That sounded like a very good theory but probably not it. Shame: I really like it when the banana story can be reduced to a unit test 'did this sql parse'.

laurit commented 1 year ago

Periodic Recovery thread has BEGIN sys.dbms_xa.dist_txn_sync ; END;, that we can't parse. Sql statement is still missing for queries on Timer-4, I guess could try turning on debug logging for jdbc driver and see whether that reveals what is executed there. Here perhaps more important than what is executed is how it is executed to understand why the statement is missing from the span.

phil-rice-HCL commented 1 year ago

I'm not quite sure what you are asking me to do. Is it a setting that you can tell me or an exploratory thing for me?

One option I think is to tell data nucleus to say 'what is the sql you are sending'. I'll go and investigate that. I suspect I won't get back to you until Monday with that data. Unless there is a simple thing you can suggest

trask commented 1 year ago

Unless there is a simple thing you can suggest

not necessarily simple, but creating a github repository with a minimal repro that we can run ourselves is very helpful

phil-rice-HCL commented 1 year ago

It is proving challenging to get the sql queries. I have posted on the DataNucleus gitter for help, and raised a stackoverflow question https://stackoverflow.com/questions/77036286/how-do-i-enable-sql-logging-using-datanucleus-with-jboss-7-2.

I will look into making that 'simple repo'. It is a significant investment (I suspect two or three days).It occurred to me that really we should make it so that the SQL is still sent even if the SQL Parser throws an exception. Can you link me to places in the open telemetry source code where this is done? It might be better to do a root cause fix than generate the repo.

laurit commented 1 year ago

To fix this we need to either be able to reproduce it or you'll have to describe the issue in sufficient detail for us to fix, basically you tell us what we are doing wrong so we could fix it. Reproducing the issue isn't always too hard. If this issue affects all the data nucleus queries then there it might be reproducible with one of the data nucleus samples. It is possible that this issue doesn't have anything to do with data nucleus. Our instrumentation works at jdbc level, it might be worth it to check whether the sql statement is added if you try to execute a prepared statement. Perhaps we don't instrument correctly the jdbc driver that you are using. You could also see if you can extract something useful from the debug log of the jdbc driver https://docs.oracle.com/en/database/oracle/oracle-database/12.2/jjdbc/JDBC-diagnosability.html The source code for the jdbc driver instrumentation is in https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/jdbc/javaagent If you wish to debug it yourself then I'd start by checking whether in https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/db7eb14384bc3d91ebdeaf7c90b1bb0cdb15ba03/instrumentation/jdbc/library/src/main/java/io/opentelemetry/instrumentation/jdbc/internal/DbRequest.java#L27 JdbcData.preparedStatement.get(statement) return null and if it does then I'd check whether https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/db7eb14384bc3d91ebdeaf7c90b1bb0cdb15ba03/instrumentation/jdbc/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/jdbc/ConnectionInstrumentation.java#L51 gets called.

phil-rice-HCL commented 1 year ago

Totally agreement: you can't fix it unless we can duplicate it

What I'm doing: I'm doing both things: Trying to get the sql (stackoverflow) and we're trying to create the 'simple version' as well. I think it will take a few days for the latter

What I'm considering In addition I'll follow that link and see if I can see a way to resolve it no matter what that sql is. That was exactly the link I wanted

Thanks again for all the help!

phil-rice-HCL commented 1 year ago

Here we go. A lot of experimentation later and this.

Note that the are (I suspect) parameterised.i.e. ?s in the PreparedStatement . Apart from the <> I can't see much 'odd' about it.

2023-09-05 00:17:13,153 FINE [DataNucleus.Datastore.Native] (Timer-3) SELECT 'com.example.domaine.facturation.constitution.ConstitutionJournal' AS DN_TYPE,A0.ACTIF,A0.CALCUL_SYNTHESES,A0.CONTRACTANT_FACTURE,A0.CONTRACTANT_NDS,A0.DATE_DEBUT,A0.DATE_DEMANDE,A0.DATE_EFFET,A0.DATE_FIN,A0.DATE_RECHERCHE,A0.ENTETE_CLIENT,A0.ERROR_MESSAGE,A0.ID_CONSTITUTION_JOURNAL,A0.LIEN_VALORISATION,A0.NUM_COMMANDE,A0.PERIODICITE,A0.PREFACTURE_CLIENT,A0.PRESTA_DIV,A0.SENS,A0.STATUT,A0.TYPE_PIECE,A0.TYPE_VALORISATION,A0.LOGIN,A0.JDO_VERSION,A0.WARNING_MESSAGE FROM CONSTITUTION_JOURNAL A0 WHERE A0.CONTRACTANT_NDS = <'SIE92100-0004'> AND A0.ENTETE_CLIENT = <'DIV78700-0016'> AND A0.CONTRACTANT_FACTURE = <'SIE92100-0004'> AND A0.NUM_COMMANDE IS NULL AND A0.TYPE_PIECE = <100> AND A0.TYPE_VALORISATION = <2> AND A0.STATUT = 1 AND A0.ACTIF = 1

2023-09-05 00:17:13,177 INFO [stdout] (Timer-3) 2023-09-05 00:17:13.176+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : 91a612a0a90862cca40ed9819a0bf005 ec64e6d2dcea4f3b CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=331, thread.name=Timer-3, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8}

laurit commented 1 year ago

I tried this statement and our sql sanitizer handles it fine. Since the db.statement attribute wasn't added to the span even when you disabled the sanitizer I'd guess that the problem is that the instrumentation does not know the content of the statement. Typically we attach the sql statement to the PreparedStatement object when Connection.prepareStatement is called and read it when the statement is executed and we create the span.

phil-rice-HCL commented 1 year ago

What can I do to help? At the moment it isn't working. What do we need to do to make it work?

It did look like a prepared statement to me. Utter guesswork at the moment, but that's what the <> were I suspect.

I'm joining the slack channel to see if I can get help 'getting up to speed' on the code base to help debug it.

phil-rice-HCL commented 1 year ago

You said above

Typically we attach the sql statement to the PreparedStatement object
when Connection.prepareStatement is called and read it when the 
statement is executed and we create the span.

Could you link me the place in the code that does these two things?

laurit commented 1 year ago

@phil-rice-HCL The links are in https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/9359#issuecomment-1704946310

phil-rice-HCL commented 1 year ago

We have managed to simplify the project and duplicate the defect in a simple program https://github.com/deedeecodes/HelloDataNucleus.git

There is a readme, and you don't need an IDE, but you do need an azure application insight connection string. However I suspect it would work with normal open telemetry if that works better for you.

The readme basically says

At this point here the data in application insights shows the defect.

--

Is this enough for you to look at the defect? Or is there anything else I can do

laurit commented 1 year ago

@phil-rice-HCL thanks, this is enough. You can work around the issue with OTEL_JAVAAGENT_EXCLUDE_CLASSES=org.datanucleus.store.rdbms.ParamLoggingPreparedStatement (https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/advanced-configuration-options.md#excluding-specific-classes-from-being-instrumented)

phil-rice-HCL commented 1 year ago

Fantastic. It works in the demo project. I will check with the real project this afternoon and get back to you tomorrow.

phil-rice-HCL commented 1 year ago

Sadly this didn't work. I can't see how to reopen this issue, so I raised another. If that was the wrong process: my appologies.