logstash-plugins / logstash-input-jdbc

Logstash Plugin for JDBC Inputs
Apache License 2.0

tracking_column is case sensitive #346

Open romain-chanu opened 5 years ago

romain-chanu commented 5 years ago

Environment

The documentation is confusing. From my understanding, tracking_column refers to a column name from a database table. For example, imagine a database table Users with a column named UserID.

If I set tracking_column to UserID (and keep lowercase_column_names at its default value, which is true), then Logstash logs the following error:

tracking_column not found in dataset. {:tracking_column=>"UserID"}

As mentioned in the documentation, each row in the resultset becomes a single event. Columns in the resultset are converted into fields in the event.

If my understanding is correct, Logstash converts the column names to lowercase and sets the event field names to the same lowercase values. In that situation, the tracking_column value should also be in lowercase.

Given a database column named UserID, there are two possible configurations, depending on one's business needs:

Option 1: Event field names should be the same as the column names (i.e. case sensitivity is respected):

lowercase_column_names => "false"
tracking_column => "UserID"

Option 2: Event field names should be in lowercase:

lowercase_column_names => "true"
tracking_column => "userid"
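
As a rough sketch, Option 2 might sit inside a complete jdbc input like the one below. The driver path, driver class, connection string, user, schedule and SQL statement are assumptions made up for illustration (a PostgreSQL database is assumed); only the Users table, the UserID column and the last two settings come from this issue.

```
input {
  jdbc {
    # Hypothetical connection details, not taken from this issue
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "logstash"

    # Hypothetical schedule and query; :sql_last_value holds the last tracked value
    schedule => "* * * * *"
    statement => "SELECT * FROM Users WHERE UserID > :sql_last_value ORDER BY UserID"

    # Track a value from the result set instead of the last run time
    use_column_value => true
    tracking_column_type => "numeric"

    # Column UserID becomes event field userid, so the tracking value is lowercase too
    lowercase_column_names => true
    tracking_column => "userid"
  }
}
```

For Option 1, the same sketch would use lowercase_column_names => false together with tracking_column => "UserID".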

Proposal/Suggestion

To avoid confusion, I think the documentation should mention that the tracking_column value is case sensitive. Its value should be the column name in lowercase if lowercase_column_names is set to true. Otherwise, its value should be the same as the column name (case sensitive).

guyboertje commented 5 years ago

@romain-chanu @karenzone

From my understanding, tracking_column refers to a column name from a database table

The setting tracking_column actually refers to a field in the event. It might as well have been called field_that_provides_a_value_to_track :-) The setting to preserve case of column to field name was added long after the tracking settings and docs. We simply did not think it through to the docs. We should add a note to clarify.

romain-chanu commented 5 years ago

@guyboertje : I agree with you and totally understand your point. If we just read the documentation, tracking_column is "The column whose value is to be tracked if use_column_value is set to true", hence the confusion. I figured out that it was in fact the field and not the column. The documentation definitely needs to be updated to mention all the above points 😃

karenzone commented 5 years ago

@romain-chanu: What do you think about this?

The column containing the value to be tracked. Used only if use_column_value is set to true.

@guyboertje: I welcome your input as well.

romain-chanu commented 5 years ago

@karenzone : it is about the same as the current documentation. That does not reflect what we discussed.

I would suggest something along these lines:

tracking_column --> The column name whose value is to be tracked if use_column_value is set to true. Its value is equal to the column name in lowercase if lowercase_column_names is set to true. Otherwise, its value is equal to the database column name (case sensitive).

@guyboertje I know tracking_column is following the event field name - but it might be confusing to mention event field when the parameter name is actually tracking_column. What do you think?