Closed phil-rice-HCL closed 1 year ago
hi @phil-rice-HCL!
servername | schema
looks like an application insights specific mapping of the underlying opentelemetry data.
can you try running with the opentelemetry javaagent and enable the logging span exporter, e.g. -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -Dotel.logs.exporter=none
?
this should log the underlying jdbc spans that are captured by the opentelemetry java agent which should help confirm whether there's anything wrong with the underlying opentelemetry jdbc span data, or if this is an issue with the application insights mapping
Firstly thanks for the quick response
I thought I would include the whole VM params for JBOSS incase there is something wrong with this. It is rather complex!
"-Dprogram.name=JBossTools: Red Hat JBoss EAP 7.2" -server -Xms512m -Xmx1512m -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman,org.jboss.logmanager -Djava.awt.headless=true "-Dorg.jboss.boot.log.file=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/log/boot.log" "-Dlogging.configuration=file:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/configuration/logging.properties" "-Djboss.home.dir=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2" -Dorg.jboss.logmanager.nocolor=true -Djboss.bind.address.management=localhost -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/p:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.5.Final-redhat-00001.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/slf4j/impl/main/slf4j-jboss-logmanager-1.0.3.GA-redhat-2.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/wildfly/common/main/wildfly-common-1.4.0.Final-redhat-1.jar -javaagent:../azure/applicationinsights-agent-3.4.16.jar -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -Dotel.logs.exporter=none
The majority of database accesses look like these. I get many 'very similar' spans. 09:31:27,944 INFO [stdout] (Timer-5) 2023-09-01 09:31:27.943+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : f5d8b1e10231129aabb7fdafcdcee89e 1f6143292c644901 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=311, thread.name=Timer-5, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8} alues=8}
This is obviously activemq, but is going to the same database so I give this as a sample 09:34:06,897 INFO [stdout] (Thread-5 (ActiveMQ-client-global-threads)) 2023-09-01 09:34:06.897+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : 520b742866abab83ceb050d1a6e592b1 0efb2c20760dd276 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=370, applicationinsights.internal.operation_name=constitution/constitutionMDB process, thread.name=Thread-5 (ActiveMQ-client-global-threads), db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=9}
From Non nucleus database access I get this. This is the kind of thing I was expecting to see from the above two spans, 09:35:29,740 INFO [stdout] (Periodic Recovery) 2023-09-01 09:35:29.740+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'SELECT SYS.DBA_PENDING_TRANSACTIONS' : 561ef70eeefff23e3baae86ea0ed7206 2f056cac858a4561 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=frgnbora91.example.com, net.peer.port=1521, db.statement=SELECT formatid, globalid, branchid FROM SYS.DBA_PENDING_TRANSACTIONS, db.system=oracle, db.operation=SELECT, thread.id=145, db.sql.table=SYS.DBA_PENDING_TRANSACTIONS, thread.name=Periodic Recovery, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=11}
From that it looks to me as though it is the underlying open telemetry.
I've not used data nucleus before (I am on this job for profiling/speed up) but if needed I can write a sample application over the weekend and try and duplicate it. It's a little hard: working with this stack is like saying ' I need a banana. But the banana is held by a gorilla, which is leaning against a tree, which is in the forest' so to give you the banana I need to make a gorilla, a tree and a forest. And make sure that they are all the correct versions, configured in the correct (or in this case probably incorrect) way
If there are any more diagnostics we can do first that would be great! That last one was really helpful I think
regards
As the db.statement
attribute is missing my guess is that our sql statement sanitizer fails to parse the query, hence it is not able to figure out what database operation is executed or what table is being used. Try setting -Dotel.instrumentation.jdbc.statement-sanitizer.enabled=false
with this all the db span names will revert to just database name, but the spans will have db.statement
attribute that shows the query that was executed. Seeing the query could help us improve the sql parser.
I have included the server config: "-Dprogram.name=JBossTools: Red Hat JBoss EAP 7.2" -server -Xms512m -Xmx1512m -Dorg.jboss.resolver.warning=true -Djava.net.preferIPv4Stack=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Djboss.modules.system.pkgs=org.jboss.byteman,org.jboss.logmanager -Djava.awt.headless=true "-Dorg.jboss.boot.log.file=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/log/boot.log" "-Dlogging.configuration=file:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/standalone/configuration/logging.properties" "-Djboss.home.dir=/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2" -Dorg.jboss.logmanager.nocolor=true -Djboss.bind.address.management=localhost -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Xbootclasspath/p:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.1.5.Final-redhat-00001.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/slf4j/impl/main/slf4j-jboss-logmanager-1.0.3.GA-redhat-2.jar:/git/jboss-eap-dev-8080-7.2.2/product/jboss-developer/7.2.2/jboss-eap-dev-8080-7.2.2/modules/system/layers/base/org/wildfly/common/main/wildfly-common-1.4.0.Final-redhat-1.jar -javaagent:../azure/applicationinsights-agent-3.4.16.jar -Dotel.traces.exporter=logging -Dotel.metrics.exporter=none -Dotel.logs.exporter=none -Dotel.instrumentation.jdbc.statement-sanitizer.enabled=false
Initially I was quite excited: 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) 2023-09-01 11:28:54.146+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'SELECT ngdev.all_objects' : 4aea363c2d38a3c58efe9d6c7b4a9f6f 716e217f4392243f CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.user=nglobal, db.statement=SELECT NULL AS table_dog, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) o.owner AS table_schem, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) o.object_name AS table_name, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) o.object_type AS table_type, 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) NULL AS remarks 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) FROM all_objects o 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) WHERE o.owner LIKE :1 ESCAPE '/' 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) AND o.object_name LIKE :2 ESCAPE '/' 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) AND o.object_type IN ('TABLE', 'SYNONYM', 'VIEW') 11:28:54,147 INFO [stdout] (Thread-39 (ActiveMQ-client-global-threads)) ORDER BY table_type, table_schem, table_name
But unfortunately most of the traces look like this
11:30:55,069 INFO [stdout] (Timer-4) 2023-09-01 11:30:55.068+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : b9b3b9696b2d0741bc3e4792f44a4a8f 2367202d989b3b4b CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=326, thread.name=Timer-4, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8}
11:32:26,887 INFO [stdout] (Periodic Recovery) 2023-09-01 11:32:26.886+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : 864c872eee7bd0143d07f0316f479809 012aee873701f35f CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.statement=BEGIN sys.dbms_xa.dist_txn_sync 11:32:26,887 INFO [stdout] (Periodic Recovery) ; END;, db.system=oracle, thread.id=145, thread.name=Periodic Recovery, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=9}
11:30:55,183 INFO [stdout] (Timer-4) 2023-09-01 11:30:55.183+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : cadea27de1060056b00c4847f8fc20dc 5677d7edac5e9f54 CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=326, thread.name=Timer-4, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8}
That sounded like a very good theory but probably not it. Shame: I really like it when the banana story can be reduced to a unit test 'did this sql parse'.
Periodic Recovery
thread has BEGIN sys.dbms_xa.dist_txn_sync ; END;,
that we can't parse. Sql statement is still missing for queries on Timer-4
, I guess could try turning on debug logging for jdbc driver and see whether that reveals what is executed there. Here perhaps more important than what is executed is how it is executed to understand why the statement is missing from the span.
I'm not quite sure what you are asking me to do. Is it a setting that you can tell me or an exploratory thing for me?
One option I think is to tell data nucleus to say 'what is the sql you are sending'. I'll go and investigate that. I suspect I won't get back to you until Monday with that data. Unless there is a simple thing you can suggest
Unless there is a simple thing you can suggest
not necessarily simple, but creating a github repository with a minimal repro that we can run ourselves is very helpful
It is proving challenging to get the sql queries. I have posted on the DataNucleus gitter for help, and raised a stackoverflow question https://stackoverflow.com/questions/77036286/how-do-i-enable-sql-logging-using-datanucleus-with-jboss-7-2.
I will look into making that 'simple repo'. It is a significant investment (I suspect two or three days).It occurred to me that really we should make it so that the SQL is still sent even if the SQL Parser throws an exception. Can you link me to places in the open telemetry source code where this is done? It might be better to do a root cause fix than generate the repo.
To fix this we need to either be able to reproduce it or you'll have to describe the issue in sufficient detail for us to fix, basically you tell us what we are doing wrong so we could fix it. Reproducing the issue isn't always too hard. If this issue affects all the data nucleus queries then there it might be reproducible with one of the data nucleus samples. It is possible that this issue doesn't have anything to do with data nucleus. Our instrumentation works at jdbc level, it might be worth it to check whether the sql statement is added if you try to execute a prepared statement. Perhaps we don't instrument correctly the jdbc driver that you are using. You could also see if you can extract something useful from the debug log of the jdbc driver https://docs.oracle.com/en/database/oracle/oracle-database/12.2/jjdbc/JDBC-diagnosability.html
The source code for the jdbc driver instrumentation is in https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/jdbc/javaagent If you wish to debug it yourself then I'd start by checking whether in https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/db7eb14384bc3d91ebdeaf7c90b1bb0cdb15ba03/instrumentation/jdbc/library/src/main/java/io/opentelemetry/instrumentation/jdbc/internal/DbRequest.java#L27 JdbcData.preparedStatement.get(statement)
return null and if it does then I'd check whether https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/db7eb14384bc3d91ebdeaf7c90b1bb0cdb15ba03/instrumentation/jdbc/javaagent/src/main/java/io/opentelemetry/javaagent/instrumentation/jdbc/ConnectionInstrumentation.java#L51 gets called.
Totally agreement: you can't fix it unless we can duplicate it
What I'm doing: I'm doing both things: Trying to get the sql (stackoverflow) and we're trying to create the 'simple version' as well. I think it will take a few days for the latter
What I'm considering In addition I'll follow that link and see if I can see a way to resolve it no matter what that sql is. That was exactly the link I wanted
Thanks again for all the help!
Here we go. A lot of experimentation later and this.
Note that the
2023-09-05 00:17:13,153 FINE [DataNucleus.Datastore.Native] (Timer-3) SELECT 'com.example.domaine.facturation.constitution.ConstitutionJournal' AS DN_TYPE,A0.ACTIF,A0.CALCUL_SYNTHESES,A0.CONTRACTANT_FACTURE,A0.CONTRACTANT_NDS,A0.DATE_DEBUT,A0.DATE_DEMANDE,A0.DATE_EFFET,A0.DATE_FIN,A0.DATE_RECHERCHE,A0.ENTETE_CLIENT,A0.ERROR_MESSAGE,A0.ID_CONSTITUTION_JOURNAL,A0.LIEN_VALORISATION,A0.NUM_COMMANDE,A0.PERIODICITE,A0.PREFACTURE_CLIENT,A0.PRESTA_DIV,A0.SENS,A0.STATUT,A0.TYPE_PIECE,A0.TYPE_VALORISATION,A0.LOGIN,A0.JDO_VERSION,A0.WARNING_MESSAGE FROM CONSTITUTION_JOURNAL A0 WHERE A0.CONTRACTANT_NDS = <'SIE92100-0004'> AND A0.ENTETE_CLIENT = <'DIV78700-0016'> AND A0.CONTRACTANT_FACTURE = <'SIE92100-0004'> AND A0.NUM_COMMANDE IS NULL AND A0.TYPE_PIECE = <100> AND A0.TYPE_VALORISATION = <2> AND A0.STATUT = 1 AND A0.ACTIF = 1
2023-09-05 00:17:13,177 INFO [stdout] (Timer-3) 2023-09-05 00:17:13.176+02:00 INFO i.o.e.logging.LoggingSpanExporter - 'ngdev' : 91a612a0a90862cca40ed9819a0bf005 ec64e6d2dcea4f3b CLIENT [tracer: io.opentelemetry.jdbc:1.29.0-alpha] AttributesMap{data={db.connection_string=oracle:thin://db.example.com:1521, net.peer.name=db.example.com, net.peer.port=1521, db.system=oracle, thread.id=331, thread.name=Timer-3, db.name=ngdev, applicationinsights.internal.item_count=1}, capacity=128, totalAddedValues=8}
I tried this statement and our sql sanitizer handles it fine. Since the db.statement
attribute wasn't added to the span even when you disabled the sanitizer I'd guess that the problem is that the instrumentation does not know the content of the statement. Typically we attach the sql statement to the PreparedStatement
object when Connection.prepareStatement
is called and read it when the statement is executed and we create the span.
What can I do to help? At the moment it isn't working. What do we need to do to make it work?
It did look like a prepared statement to me. Utter guesswork at the moment, but that's what the <> were I suspect.
I'm joining the slack channel to see if I can get help 'getting up to speed' on the code base to help debug it.
You said above
Typically we attach the sql statement to the PreparedStatement object
when Connection.prepareStatement is called and read it when the
statement is executed and we create the span.
Could you link me the place in the code that does these two things?
@phil-rice-HCL The links are in https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/9359#issuecomment-1704946310
We have managed to simplify the project and duplicate the defect in a simple program https://github.com/deedeecodes/HelloDataNucleus.git
There is a readme, and you don't need an IDE, but you do need an azure application insight connection string. However I suspect it would work with normal open telemetry if that works better for you.
The readme basically says
At this point here the data in application insights shows the defect.
--
Is this enough for you to look at the defect? Or is there anything else I can do
@phil-rice-HCL thanks, this is enough. You can work around the issue with OTEL_JAVAAGENT_EXCLUDE_CLASSES=org.datanucleus.store.rdbms.ParamLoggingPreparedStatement
(https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/advanced-configuration-options.md#excluding-specific-classes-from-being-instrumented)
Fantastic. It works in the demo project. I will check with the real project this afternoon and get back to you tomorrow.
Sadly this didn't work. I can't see how to reopen this issue, so I raised another. If that was the wrong process: my appologies.
Describe the bug
When I use the azure applications insight agent to instrument my jboss application I can see database spans in the gui. However the sql is always the same. It is 'servername | schema' (where server name is the dns name and schema is the database schema)
As you know the azure applications insight agent just wraps the open telemetry code. The version I am using was only released very recently (see below)
I have tens of thousands of these spans and they are all the same. No sql. Juse 'servername|schema'
Versions:
Steps to reproduce
The code for this is horribly complex (200 man years) and I don't have the skills to simplify it. In essence it's a java object that uses data nucleus to communicate with the database. The version numbers are above
I'm really after 'what do I do to debug this'.
Expected behavior
My expected behavious is that the Sql will appear in the azure application insights gui
Actual behavior
The actual behavior is that every span has the sql command 'servername|schema name'
Javaagent or library instrumentation version
v1.29.0
Environment
Additional context
No response