influxdata / influxdb-python

Python client for InfluxDB
MIT License
1.69k stars 521 forks source link

Duplicated tags when using InfluxDBClient #680

Open Afrojam opened 5 years ago

Afrojam commented 5 years ago

Hello, because the error about Already tz-aware, use tz_convert to convert. when using DataFrameClient, I have changed my code to use InfluxDBClient, what I have found is that tags yet duplicated.

Code Sample, a copy-pastable example if possible

# Your code here
conn1 = InfluxDBClient(server, port, database=database)
query = 'Select * from %s' % city
rs = conn1.query(query)
points = rs.get_points()
df = pd.DataFrame(points)
df.columns

Problem description

[this should explain why the current behaviour is a problem and why the expected output is a better solution.]

The results from above query are: Index(['Day', 'DayOfWeek', 'DayOfWeek_1', 'Day_1', 'Hour', 'Hour_1', 'Month', 'Month_1', 'Season', 'Season_1', 'Year', 'Year_1', 'latitude', 'latitude_1', 'longitude', 'longitude_1', 'occupancy_length_0_5_min', 'occupancy_length_0_5_min_1', 'occupancy_length_10_30_min', 'occupancy_length_10_30_min_1', 'occupancy_length_1_2_hour', 'occupancy_length_1_2_hour_1', 'occupancy_length_1_7_day', 'occupancy_length_1_7_day_1', 'occupancy_length_4_8_hour'...

The columns got duplicated, and those columns are not in the influx measurement (the _1 ones). Also, it is strange because the columns with the suffix _1 have "good" values while the ones without _1 all values are None. Another thing to notice is that this happens when I'm using the whole measurement data, while when using only data from past 1 day or 2 days this does not happens.

Output

  occupancy_length_4_8_hour_1 occupancy_length_5_10_min
  0                               2                      None
  1                               1                      None
  2                               4                      None
  3                               4                      None
  4                               0                      None
  5                               2                      None

Expected Output

Index(['Day', 'DayOfWeek', 'Hour', 'Month', 'Season', 'Year', 'latitude', 'longitude', 'occupancy_length_0_5_min', 'occupancy_length_10_30_min', 'occupancy_length_1_2_hour', 'occupancy_length_1_7_day', 'occupancy_length_4_8_hour',

Output of influxdb.version``

'5.2.1'

Showing my tag keys in influx

tagKey ------ Day DayOfWeek Hour Month Season Year latitude longitude occupancy_length_0_5_min occupancy_length_10_30_min occupancy_length_1_2_hour occupancy_length_1_7_day occupancy_length_4_8_hour occupancy_length_5_10_min occupancy_length_8_24_hour
clslgrnc commented 5 years ago

At least two issues might exlpain this behavior:

You can try SHOW FIELD KEYS and SHOW TAG KEYS to investigate.

If there is no bug in your code, can you build a minimal working example starting from a fresh database exhibiting the behavior?

Afrojam commented 5 years ago

You are right, I have noticed that some "columns" appear on both as a FIELD and as a TAG, and this is what causes to have the duplicates. I'm going to drop the series and create them again, taking care of that.

Afrojam commented 5 years ago

Well, I'm not getting if I have a bug when transfering data into influxdb. I call write_points() from InfluxDBClient() using a list where each item is like: {'fields': {'occupancy_lengths_square_sum': x, 'occupancy_lengths_sum': x, 'std_occupancy_length_min': x, 'total_disconnected_hour': x, 'total_free_events': x, 'total_free_hour': x, 'total_full_events': x, 'total_full_hour': x, 'total_occupancy_events': x, 'total_occupancy_hour': x}, 'measurement': 'city', 'tags': {'Day': x, 'DayOfWeek': x, 'Hour': x, 'Month': x, 'Season': x, 'Year': x, 'latitude': x, 'longitude': x, 'occupancy_length_0_5_min': x, 'occupancy_length_10_30_min': x, 'occupancy_length_1_2_hour': x, 'occupancy_length_1_7_day': x, 'occupancy_length_4_8_hour': x, 'occupancy_length_5_10_min': x, 'occupancy_length_8_24_hour': x, 'sector_id': 'x', 'sector_name': 'x', 'sectortype_id': 'x', 'sectortype_name': 'x', 'total_occupancy_intervals': x}, 'time': datetime.datetime(xxxx, x, x, x, x)} 'Written 40080 points into influxdb'

So I don't get why I got then all the data in influx as fieldKey: From show field keys

name: city fieldKey fieldType


Day string DayOfWeek string Hour string Month string Season string Year string latitude string longitude string occupancy_length_0_5_min string occupancy_length_10_30_min string occupancy_length_1_2_hour string occupancy_length_1_7_day string occupancy_length_4_8_hour string occupancy_length_5_10_min string occupancy_length_8_24_hour string occupancy_lengths_square_sum float occupancy_lengths_sum float sector_id string sector_name string sectortype_id string sectortype_name string std_occupancy_length_min float total_disconnected_hour float total_free_events float total_free_hour float total_full_events float total_full_hour float total_occupancy_events float total_occupancy_hour float total_occupancy_intervals string

From show tag keys

name: city tagKey

Day DayOfWeek Hour Month Season Year latitude longitude occupancy_length_0_5_min occupancy_length_10_30_min occupancy_length_1_2_hour occupancy_length_1_7_day occupancy_length_4_8_hour occupancy_length_5_10_min occupancy_length_8_24_hour sector_id sector_name sectortype_id sectortype_name total_occupancy_intervals

clslgrnc commented 5 years ago

Tags as fields might have been inserted with your DataFrameClient code. Try to reproduce from a fresh db with only one field and one tag.

Afrojam commented 5 years ago

@clslgrnc not using DataFrameClient here, using the InfluxDBClient and converting the points from the result set to a DataFrame. I have tried the following: Same data base, new measurement, 1 tag and 1 field. {'measurement': 'test', 'tags': {'Hour': 19}, 'time': datetime.datetime(2018, 8, 31, 19, 0), 'fields': {'total_occupancy_hour': 0}}

This worked perfect: name: test fieldKey fieldType


total_occupancy_hour integer

name: test tagKey

Hour

Also, I have tried in a new database the original insert (so only change is the database)

show field keys name: fp_kingdom_tower_historic fieldKey fieldType


occupancy_lengths_square_sum float occupancy_lengths_sum float std_occupancy_length_min float total_disconnected_hour float total_free_events float total_free_hour float total_full_events float total_full_hour float total_occupancy_events float total_occupancy_hour float show tag keys name: fp_kingdom_tower_historic tagKey

Day DayOfWeek Hour Month Season Year latitude longitude occupancy_length_0_5_min occupancy_length_10_30_min occupancy_length_1_2_hour occupancy_length_1_7_day occupancy_length_4_8_hour occupancy_length_5_10_min occupancy_length_8_24_hour sector_id sector_name sectortype_id sectortype_name total_occupancy_intervals

This worked PERFECT

Finally, using another measurement name like "city2", I have no problems with the tags and the fields...

This is strange, because in the original database I have other "cities" (measurements) and I don't have the strange behaviour that I have reported. Indeed, this behaviour only happens in the last city that I have created in the data base. The measurement is created when inserting the data. Any idea?

Afrojam commented 5 years ago

I have tried the following approach: 1) Drop the conflictive measurement DROP MEASUREMENT city 2) Insert all the data into a new measurement called placeholder. Here we can see that all is fine: name: placeholder fieldKey fieldType


occupancy_lengths_square_sum float occupancy_lengths_sum float std_occupancy_length_min float total_disconnected_hour float total_free_events float total_free_hour float total_full_events float total_full_hour float total_occupancy_events float total_occupancy_hour float

3) Select into city from placeholder group by But the result is not fine: name: city fieldKey fieldType


Day string DayOfWeek string Hour string Month string Season string Year string latitude string longitude string occupancy_length_0_5_min string occupancy_length_10_30_min string occupancy_length_1_2_hour string occupancy_length_1_7_day string occupancy_length_4_8_hour string occupancy_length_5_10_min string occupancy_length_8_24_hour string occupancy_lengths_square_sum float occupancy_lengths_sum float sector_id string sector_name string sectortype_id string sectortype_name string std_occupancy_length_min float total_disconnected_hour float total_free_events float total_free_hour float total_full_events float total_full_hour float total_occupancy_events float total_occupancy_hour float total_occupancy_intervals string

So, the problem is the measurement name and some type of cache...