influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.67k stars · 3.54k forks

Deleted data? #3578

Closed drb-digital closed 9 years ago

drb-digital commented 9 years ago

Hi,

I am using influxdb together with grafana. This morning I viewed some data, didn't make any changes to that database (only to another one, see log file), and tried to view it again a couple of hours later, but the data was gone. I then checked the influxdb admin interface, but even there I couldn't access my data (the database is still there, but empty). The retention policy of the database is set to INF, and other databases, which contain older data, are still populated with data.

I then used the backup command to create a backup; my database (testkw31) is still there and the file is about 600mb. Unfortunately I didn't create a backup while the data was still accessible. So it seems like the data wasn't deleted, but can no longer be accessed. Do you have any idea what to try next? This is research data from a whole week and I really don't know what went wrong (or what I did wrong). My main goal is to prevent this error in the future.

The corresponding part of my logfile is attached. If you need any more information, please let me know.

Thanks,

Regards Dennis

[http] 2015/08/06 10:01:23 127.0.0.1 - root [06/Aug/2015:10:01:02 +0200] GET /query?db=testkw31&p=root&q=SELECT+mean%28value%29+FROM+%22sensor_5%22+WHERE+time+%3E+1437980400s+and+time+%3C+1438354800s+GROUP+BY+time%285m%29+ORDER+BY+asc&u=root HTTP/1.1 200 8078 http://10.15.5.206:3000/dashboard/db/database-test-kw31 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0 48257c68-3c11-11e5-a595-000000000000 20.71290002s
[http] 2015/08/06 13:09:06 172.16.19.214 - root [06/Aug/2015:13:09:06 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8d965d17-3c2b-11e5-a59c-000000000000 5.841967ms
[http] 2015/08/06 13:09:06 172.16.19.214 - root [06/Aug/2015:13:09:06 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8d96edcd-3c2b-11e5-a59d-000000000000 8.35661ms
[http] 2015/08/06 13:09:06 172.16.19.214 - root [06/Aug/2015:13:09:06 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8d96fa9f-3c2b-11e5-a59e-000000000000 10.893963ms
[http] 2015/08/06 13:09:06 172.16.19.214 - root [06/Aug/2015:13:09:06 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8d972878-3c2b-11e5-a59f-000000000000 11.70595ms
[http] 2015/08/06 13:09:06 172.16.19.214 - root [06/Aug/2015:13:09:06 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8d9a6f6d-3c2b-11e5-a5a0-000000000000 3.313647ms
[http] 2015/08/06 13:09:06 172.16.19.214 - root [06/Aug/2015:13:09:06 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8d9a82b2-3c2b-11e5-a5a1-000000000000 17.58825ms
[http] 2015/08/06 13:09:09 172.16.19.214 - root [06/Aug/2015:13:09:09 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8f86afa6-3c2b-11e5-a5a2-000000000000 3.923694ms
[http] 2015/08/06 13:09:09 172.16.19.214 - root [06/Aug/2015:13:09:09 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8f89b2d5-3c2b-11e5-a5a3-000000000000 3.376609ms
[http] 2015/08/06 13:09:09 172.16.19.214 - root [06/Aug/2015:13:09:09 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 8f9851b8-3c2b-11e5-a5a4-000000000000 38.638224ms
[http] 2015/08/06 13:09:11 172.16.19.214 - root [06/Aug/2015:13:09:10 +0200] POST /write?u=root&p=root&db=test&rp=default&precision=n&consistency=one HTTP/1.1 204 0 - okhttp/2.4.0 906a9d35-3c2b-11e5-a5a5-000000000000 329.179867ms
[shard] 2015/08/06 13:09:53 flush 105 points in 0.009s
[shard] 2015/08/06 13:09:55 flush 95 points in 0.015s
[shard] 2015/08/06 13:09:57 flush 63 points in 0.006s
[shard] 2015/08/06 13:09:59 flush 20 points in 0.006s
[shard] 2015/08/06 13:10:01 flush 111 points in 0.011s
[shard] 2015/08/06 13:10:03 flush 76 points in 0.007s
[shard] 2015/08/06 13:10:05 flush 137 points in 0.007s
[shard] 2015/08/06 13:10:07 flush 59 points in 0.008s
[http] 2015/08/06 13:55:58 127.0.0.1 - root [06/Aug/2015:13:55:58 +0200] GET /query?db=testkw31&p=root&q=SELECT+mean%28value%29+FROM+%22sensor_1%22+WHERE+time+%3E+1436939100s+and+time+%3C+1436969700s+AND+axis%3D%27x%27+GROUP+BY+time%2830s%29+ORDER+BY+asc&u=root HTTP/1.1 200 40 http://10.15.5.206:3000/dashboard/db/database-test20150715 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0 19eb6120-3c32-11e5-a5a6-000000000000 2.241516ms
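As an aside, the `q=` parameters in these log lines are URL-encoded. They can be decoded with nothing more than Python's standard library to see the query that was actually run (the snippet below just decodes the first GET line above; it doesn't talk to InfluxDB):

```python
from urllib.parse import unquote_plus

# the q= parameter from the first GET request in the log, copied verbatim
encoded = ("SELECT+mean%28value%29+FROM+%22sensor_5%22+WHERE+time+%3E+1437980400s"
           "+and+time+%3C+1438354800s+GROUP+BY+time%285m%29+ORDER+BY+asc")

# unquote_plus turns '+' back into spaces and %XX escapes back into characters
decoded = unquote_plus(encoded)
print(decoded)
# SELECT mean(value) FROM "sensor_5" WHERE time > 1437980400s and time < 1438354800s GROUP BY time(5m) ORDER BY asc
```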
desa commented 9 years ago

@dennisbauer what version of the database are you using?

drb-digital commented 9 years ago

Influxdb 0.9.2.1 on Ubuntu 14.04.2 LTS x64

desa commented 9 years ago

Can you run `curl -G "http://<the address where your server is>:8086/query?db=testkw31" --data-urlencode "q=show series"` and tell me what the output is?

drb-digital commented 9 years ago

The output is a JSON object, which seems to contain the data (it is far too long to copy here, it's a 12mb putty output log; if you need it I can send it to you of course). One series, for example, is called sensor_1. But when I run `curl -G 'http://localhost:8086/query' --data-urlencode "db=testkw31" --data-urlencode "q=SELECT value FROM sensor_1"` instead, I am still getting no results: `{"results":[{}]}`

desa commented 9 years ago

@dennisbauer was running `curl -G 'http://localhost:8086/query' --data-urlencode "db=testkw31" --data-urlencode "q=SELECT value FROM sensor_1"` ever giving you any results?

Is sensor_1 the only measurement that you have?

drb-digital commented 9 years ago

@mjdesa There are more measurements, but I am getting the same empty result for each one. Yet when I run `show series`, I can see that there is still data for every measurement.

I can't tell you if it ever returned any results, because I had never used curl before. I was using influxdb-java in my application to write data, and the influxdb admin interface and grafana to query data. But when I run `curl -G 'http://localhost:8086/query' --data-urlencode "db=test" --data-urlencode "q=SELECT value FROM sensor_1"` on another database, I am getting results. So I suppose it should work with testkw31 too.

desa commented 9 years ago

My guess is that the sensor_1 measurement was written with literal `"` characters in its name. Can you try

curl -G 'http://localhost:8086/query' --data-urlencode "db=testkw31" --data-urlencode 'q=SELECT value FROM "\"sensor_1\""'
drb-digital commented 9 years ago

@mjdesa No, there is no difference, still an empty result. But data was written from the same application to multiple databases, and all the others are working with `curl -G 'http://localhost:8086/query' --data-urlencode "db=test" --data-urlencode "q=SELECT value FROM sensor_1"`

desa commented 9 years ago

Weird. I'm not quite sure what the issue is.

beckettsean commented 9 years ago

@dennisbauer Let's try to narrow down the output from SHOW SERIES so it's not 12MB.

What does `SHOW SERIES FROM sensor_1 WHERE axis='x'` return?

drb-digital commented 9 years ago

@beckettsean It shows data, see the extract below (still a huge amount, so I cut out most entries) or the link to the complete log file. But if I run `SELECT value FROM sensor_1 WHERE axis='x'` instead, I am still getting no results.

https://www.dropbox.com/s/ecgjobtq44a17qa/putty_log.txt?dl=0

```
user@linux-vm-l-0005:~$ curl -G "http://localhost:8086/query?db=testkw31" --data-urlencode "q=show series from sensor_1 where axis='x'"
{"results":[{"series":[{"name":"sensor_1","columns":["_key","accuracy","axis","identifier","sensor_type","time_phone"],"values":[["sensor_1,accuracy=3,axis=x,identifier=IPA-Test,sensor_type=1,time_phone=1437981672694","3","x","IPA-Test","1","1437981672694"],["sensor_1,accuracy=3,axis=x,identifier=IPA-Test,sensor_type=1,time_phone=1437981694088","3","x","IPA-Test","1","1437981694088"],["sensor_1,accuracy=3,axis=x,identifier=IPA-Test,sensor_type=1,time_phone=1438349654207","3","x","IPA-Test","1","1438349654207"]]}]}]}
```

drb-digital commented 9 years ago

@beckettsean Any thoughts about this? Because today it happened again, to another database on the same server.

beckettsean commented 9 years ago

I have no theories as yet, but I find it very odd that you have the only reported cases of this. There's almost certainly something about the data or the schema or the server itself that's contributing to this issue.

From looking at the output from SHOW SERIES, you have a very high cardinality on the time_phone tag. It looks like you are duplicating the timestamp as a tag value for every point. Is that correct?
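For illustration only: in the 0.9 line protocol, the same point could carry time_phone as a field rather than a tag, so the series count stops growing with every batch. The names below are taken from the SHOW SERIES output in this thread; the value 0.42 and the timestamp are made up:

```
sensor_1,accuracy=3,axis=x,identifier=IPA-Test,sensor_type=1 value=0.42,time_phone=1437981672694 1437981672694000000
```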

Can you give the results of SHOW RETENTION POLICIES? Have you restarted the server? How are you writing your points? Can you include a sample? Is there anything different about the two affected databases that doesn't apply to the unaffected databases?

drb-digital commented 9 years ago

@beckettsean What I am trying to do is log sensor data on an Android smartwatch. So the timestamp of each point is the time when the value was logged. Points are then batched and sent via bluetooth to the smartphone. This is where timestamp_phone is added, and this is the reason why a lot of points have different values in their time field but the same value in timestamp_phone.

When I run `show retention policies`, the result is the following on each database: `{"results":[{"series":[{"columns":["name","duration","replicaN","default"],"values":[["default","0",1,true]]}]}]}`

There is nothing different about these two databases, at least nothing I could recognize. But here is another weird fact: in the second database not all data is gone, only some series. All databases were created via the admin interface and data was populated via my app only. To query data I used the admin interface (for tests) and grafana to analyze the data (first nightly builds, now v2.1.1).

I am writing data using influxdb-java. My code looks like the following:

```java
BatchPoints batchPoints = BatchPoints
        .database(dbName)
        .retentionPolicy(DatabaseConstants.RETENTION_POLICY)
        .build();

for (int i = 0; i < dataMaps.size(); i++) {
    DataMap map = dataMaps.get(i);

    int sensor = map.getInt(WearConstants.SENSOR);
    int accuracy = map.getInt(WearConstants.ACCURACY);
    long timestamp_wear = map.getLong(WearConstants.TIMESTAMP_WEAR);
    long timestamp_phone = intent.getLongExtra(WearConstants.TIMESTAMP_PHONE, 0);
    float[] values = map.getFloatArray(WearConstants.VALUES);

    // note: this excerpt only builds the x-axis point (POINT_SENSOR_AXIS_X)
    Point point = Point
            .measurement(DatabaseConstants.POINT_SENSOR + sensor)
            .time(timestamp_wear, TimeUnit.MILLISECONDS)
            .field(DatabaseConstants.POINT_SENSOR_VALUE, values[0])
            .tag(DatabaseConstants.POINT_SENSOR_TYPE, String.valueOf(sensor))
            .tag(DatabaseConstants.POINT_SENSOR_ACCURACY, String.valueOf(accuracy))
            .tag(DatabaseConstants.POINT_SENSOR_TIME_PHONE, String.valueOf(timestamp_phone))
            .tag(DatabaseConstants.POINT_SENSOR_AXIS, DatabaseConstants.POINT_SENSOR_AXIS_X)
            .build();

    batchPoints.point(point);
}

InfluxDB influxDB = InfluxDBFactory.connect(dbUrl, dbUser, dbPassword);
influxDB.write(batchPoints);
```

Another thing that came to my attention: I updated to influxdb 0.9.2.1, but my admin interface is still not working. I am still getting the error from #3222, so probably something went wrong. Therefore I tried to back up my data and restore it to a new server (where everything is working fine). There is no error while backing up the data, but I am not able to restore it.

Is there a possibility to restore data even if you are using the default config file (there was no need to change the config until now)? I also tried to create a new config file and use it to restore the data, but without any luck. I don't think this is an influxdb error; it seems more like I am doing something wrong. So it would be great if you could extend the documentation about backup/restore to cover all the necessary steps. If backup/restore worked, I could easily transfer my data to a new server and try to access it there.

desa commented 9 years ago

@dennisbauer here's some docs on backup and restore.

desa commented 9 years ago

Also have you tried updating to 0.9.3?

drb-digital commented 9 years ago

@mjdesa No, I haven't upgraded to 0.9.3 yet. I installed influxdb 0.9.2.1 on a new server and am doing some tests at the moment.

About the backup: I was able to back up the data, but not to restore it. I know how to create a config file, but I was probably doing something wrong when starting influxdb with a config other than the default. Could you tell me (after creating a config file and restoring data as shown in the documentation) how to start influxdb with that config file?

desa commented 9 years ago

@dennisbauer

 influxd restore -config /path/to/influxdb.conf /path/to/mysnapshot
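and then, once the restore has finished, start the daemon against that same config file so it picks up the restored metadata (the path is a placeholder; use wherever your influxdb.conf actually lives):

```shell
# restart the daemon with the same config that was used for the restore
influxd -config /path/to/influxdb.conf
```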
drb-digital commented 9 years ago

@mjdesa I tried that and it looks like the data has been restored, but it is still not accessible. This is why I was wondering if I am doing something wrong. Do I have to restart the server with any special command? Because a simple restart didn't help.

desa commented 9 years ago

@dennisbauer you shouldn't have to do anything special; a simple restart should do it. At this point I'd recommend trying 0.9.3 to see if the problem persists.

drb-digital commented 9 years ago

@mjdesa Ok, thanks. I'll try installing 0.9.3 on a new server. If the problem still exists, I'll reopen the case.

desa commented 9 years ago

@dennisbauer sounds good. Keep me updated.

drb-digital commented 9 years ago

@mjdesa It seems to be working quite well now. As I said, I installed 0.9.3 on a new server and have been using it for two weeks now without any difficulties.

desa commented 9 years ago

@dennisbauer glad to hear everything is working.