open-telemetry / opentelemetry-java-instrumentation

OpenTelemetry auto-instrumentation and instrumentation libraries for Java
https://opentelemetry.io
Apache License 2.0
1.87k stars 820 forks source link

JDBCCOnnectionURLParser creates ambiguous IPv6 address for shortUrl #1421

Open imavroukakis opened 3 years ago

imavroukakis commented 3 years ago

Given the following JDBC URL "jdbc:mariadb:loadbalance://[2001:0660:7401:0200:0000:0000:0edf:bdd7]:33,mdb.host/mdbdb"

shortUrl becomes

"mariadb:loadbalance://2001:0660:7401:0200:0000:0000:0edf:bdd7:33". This is not recommended as per https://tools.ietf.org/html/rfc5952#page-11 as it creates an ambiguous IPv6 address. In order to define a port with an IPv6 address in a URL-like manner it is recommended to wrap the address with square brackets as per https://tools.ietf.org/html/rfc3986 , i.e.

mariadb:loadbalance://[2001:0660:7401:0200:0000:0000:0edf:bdd7]:33

iNikem commented 3 years ago

Wait, what IS this shortUrl? I see we use it to set db.connection_string semantic attribute, "The connection string used to connect to the database". Why do we construct it in such a non-obvious way? See io.opentelemetry.javaagent.instrumentation.jdbc.JDBCConnectionUrlParser#withUrl

imavroukakis commented 3 years ago

My understanding is, and I may be way-off but bear with me, building a DBInfo object from a JDBC URL is subject to either a generic URL parser or a dedicated type parser and it gets delegated according to the protocol/db type parsed in the URL. If that's not what you meant, are you commenting on the complexity of the particular enum or whether shortUrl makes any sense at all? 😄

iNikem commented 3 years ago

We are almost on the same page. My confusion is about why do we take url, parse it and then combine it back to shortUrl? Why cannot we use the original url for db.connection_string? There should be easier way to strip potential user credentials, no?

imavroukakis commented 3 years ago

Right! I think this might be due to the non-standard way some additional properties are expressed on the URL, with comma or semicolon or question mark delimiters - and it's not just user credentials. So shortUrl is the basic combo of proto+subporto+hostname+port with all the rest of the options stripped out. Maybe @tylerbenson or @trask can provide a more coherent answer than mine.

tylerbenson commented 3 years ago

Nice catch. @imavroukakis. I think #1403 is fine. I admit when I wrote that parser, I was trying to cover as many cases as I could think of, but didn't consider this case.

imavroukakis commented 3 years ago

Thank you, I will aim to get around to trying my hand at a fix for this, day job permitting ;-)