influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Wire up drop series statement #1422

Closed. pauldix closed this 9 years ago.

corylanou commented 9 years ago

After looking into this issue, I have several questions that I could use some feedback on.

Let's start with the example insert data that the rest of this discussion refers to:

{  
   "database":"foo",
   "retentionPolicy":"bar",
   "points":[  
      {  
         "name":"cpu",
         "tags":{  
            "host":"server01"
         },
         "timestamp":"2015-01-26T22:01:11.703Z",
         "values":{  
            "value":".954"
         }
      },
      {  
         "name":"cpu",
         "tags":{  
            "host":"server02"
         },
         "timestamp":"2015-01-26T22:01:11.704Z",
         "values":{  
            "value":".854"
         }
      }
   ]
}

This will create two series, effectively:

host.server01.cpu and host.server02.cpu, each of which is assigned an internal id in the system.
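To make the mapping concrete, here is a minimal sketch of how the two points above could resolve to two series with sequential ids. The key format (`name,tag=value`) and the function names `series_key` and `assign_series_ids` are assumptions made for illustration, not the actual InfluxDB internals.

```python
def series_key(point):
    """Build a hypothetical series key from the measurement name and sorted tags."""
    tags = ",".join(f"{k}={v}" for k, v in sorted(point["tags"].items()))
    return f'{point["name"]},{tags}' if tags else point["name"]


def assign_series_ids(points):
    """Give each distinct series key the next sequential id, starting at 1."""
    ids = {}
    for point in points:
        key = series_key(point)
        if key not in ids:
            ids[key] = len(ids) + 1
    return ids


# The two points from the example insert above.
points = [
    {"name": "cpu", "tags": {"host": "server01"},
     "timestamp": "2015-01-26T22:01:11.703Z", "values": {"value": ".954"}},
    {"name": "cpu", "tags": {"host": "server02"},
     "timestamp": "2015-01-26T22:01:11.704Z", "values": {"value": ".854"}},
]
series_ids = assign_series_ids(points)
```

A second point written to an existing series reuses that series' id, which is what would make an id a stable handle for a later drop.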

If I query the current system, I get:

> show series
name    tags    host
----    ----    ----
cpu             server01
cpu             server02

What I suggest we get is:

> show series
id  name host
--- ---- ----
1   cpu  server01
2   cpu  server02

This would then allow us to do:

DROP SERIES 1

As well as issue a delete like this:

DELETE SERIES cpu WHERE host=server01

pauldix commented 9 years ago

@corylanou all of that is correct. The current show series output is wrong; I'm not sure why it looks like that. Can you update it?

dgnorton commented 9 years ago

I think the syntax the parser currently supports is wrong:

DROP SERIES <series name>

Are these the only two we need?

DROP SERIES <id>

DROP SERIES FROM <measurement> WHERE <expr>

In the second form, would both the from-clause and the where-clause be optional?

corylanou commented 9 years ago

In general, DROP doesn't usually take a where clause, so I'm torn.

My preference is both DROP without a qualifier and DELETE with a qualifier. Not sure if that makes it easier or harder to support in our system.

dgnorton commented 9 years ago

DELETE removes data from something but doesn't get rid of the thing itself. In MySQL, DROP TABLE doesn't allow a where-clause, but its typical use case doesn't need to support hundreds of thousands of tables. It's practical in MySQL to say DROP TABLE tbl1, tbl2, ... , tbln, but it wouldn't be practical in Influx to list out thousands of series IDs (or names, if we were to support series names), so I think it makes sense to differ from typical SQL syntax and allow a where-clause on our DROP SERIES statement.

corylanou commented 9 years ago

@pauldix do you have a preference on this? I just want to get consensus before making the changes. I agree that we have a fairly different scenario here, and since we really are dropping an entity, DROP with an expression makes sense.

@dgnorton also, just to clarify, should we support only one form? Or do you still propose supporting both, as you outlined above:

DROP SERIES <id>

DROP SERIES FROM <measurement> WHERE <expr>

dgnorton commented 9 years ago

I favor supporting both DROP SERIES syntaxes and using DELETE only when deleting data from a series.

otoolep commented 9 years ago

The commands suggested by @dgnorton make sense to me.

pauldix commented 9 years ago

@corylanou what you have in your last comment is correct. Support dropping a specific series id or dropping all series from a measurement matching a where clause.
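As a rough illustration of the two agreed forms, the sketch below recognizes `DROP SERIES <id>` and `DROP SERIES FROM <measurement> WHERE <expr>` and rejects everything else, including the old `DROP SERIES <series name>` form. It is a regex-based stand-in, not the real InfluxQL parser; the `DropSeries` type and `parse_drop_series` function are hypothetical names, and the where-expression is kept as an unparsed string.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DropSeries:
    """Hypothetical parse result: either series_id is set, or measurement and condition are."""
    series_id: Optional[int] = None
    measurement: Optional[str] = None
    condition: Optional[str] = None


_BY_ID = re.compile(r"^DROP\s+SERIES\s+(\d+)$", re.IGNORECASE)
_BY_EXPR = re.compile(r"^DROP\s+SERIES\s+FROM\s+(\w+)\s+WHERE\s+(.+)$", re.IGNORECASE)


def parse_drop_series(stmt: str) -> DropSeries:
    """Parse one of the two supported DROP SERIES forms or raise ValueError."""
    stmt = stmt.strip()
    match = _BY_ID.match(stmt)
    if match:
        return DropSeries(series_id=int(match.group(1)))
    match = _BY_EXPR.match(stmt)
    if match:
        return DropSeries(measurement=match.group(1), condition=match.group(2).strip())
    raise ValueError(f"unsupported DROP SERIES statement: {stmt!r}")
```

For example, `parse_drop_series("DROP SERIES 1")` yields a result with only `series_id` set, while `parse_drop_series("DROP SERIES cpu")` raises `ValueError`. The sketch requires both clauses in the second form because the thread leaves open whether they should be optional.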

Further, the dimensions that show up in the where clause should only be tag keys. You shouldn't be able to have field names or time in the where clause of a drop series.
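The tag-keys-only rule could be enforced with a check along these lines. This is a simplified sketch: it assumes the condition is a series of `key=value` terms joined by `AND`, and `check_drop_where` is a hypothetical helper name, not an existing function in the codebase.

```python
import re


def check_drop_where(condition, tag_keys):
    """Raise ValueError unless every term in the condition compares a known tag key.

    Assumes a simplified grammar: `key=value` terms joined by AND.
    """
    for term in re.split(r"\s+AND\s+", condition.strip(), flags=re.IGNORECASE):
        key, sep, _value = term.partition("=")
        key = key.strip()
        if not sep:
            raise ValueError(f"unsupported term in DROP SERIES where clause: {term!r}")
        if key.lower() == "time":
            raise ValueError("time is not allowed in a DROP SERIES where clause")
        if key not in tag_keys:
            raise ValueError(f"{key!r} is not a tag key")
```

With the example data, `check_drop_where("host=server01", {"host"})` passes, while a condition on the field `value` or on `time` is rejected.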

For DELETE you'll be able to have fields and time in the where clause. Also, delete won't remove anything from the metastore.
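The difference between the two statements can be sketched with a toy in-memory store: dropping a series removes both its metastore entry and its points, while deleting removes only matching points and leaves the metastore alone. The `Store` class and its method names are assumptions for illustration and do not reflect how InfluxDB actually stores data.

```python
class Store:
    """Toy model: a metastore of series definitions plus per-series point lists."""

    def __init__(self):
        self.metastore = {}  # series id -> (measurement, tags)
        self.points = {}     # series id -> list of (timestamp, fields)

    def write(self, series_id, measurement, tags, timestamp, fields):
        self.metastore.setdefault(series_id, (measurement, dict(tags)))
        self.points.setdefault(series_id, []).append((timestamp, dict(fields)))

    def delete_points(self, series_id, predicate):
        """DELETE-like: remove matching points, keep the metastore entry."""
        if series_id in self.points:
            self.points[series_id] = [
                p for p in self.points[series_id] if not predicate(*p)
            ]

    def drop_series(self, series_id):
        """DROP SERIES-like: remove the series from the metastore and its points."""
        self.metastore.pop(series_id, None)
        self.points.pop(series_id, None)
```

The predicate passed to `delete_points` receives the timestamp and the fields, which mirrors the point that a DELETE where clause may reference fields and time.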