influxdata / influxdb-csharp

A .NET library for efficiently sending points to InfluxDB 1.x
Apache License 2.0

Proposal: query api #39

Closed gambrose closed 7 years ago

gambrose commented 7 years ago

I realise that there are already many libraries out there that enable querying data out of InfluxDB, but none of those I have found lets me easily query data in a type-safe way. I have done some thinking and some hacking and think I have something that might be worth pursuing. It's just a proposal at the moment. I have scaffolded the API to ensure it compiles and gives me the correct type safety.

I would appreciate some feedback on whether you think this would be a useful addition to the library if I get it running, or whether it is something you would be unlikely to accept a pull request for.

Client API

Query a single series

public class WaterTemperature
{
    public string location {get; set;}
    public double degrees {get; set;}
}

var db = new InfluxDb("NOAA_water_database");

var results = await db.Query<WaterTemperature>("SELECT degrees,location FROM h2o_temperature");

foreach (var (values, time) in results)
{
    Console.WriteLine($"{values.location} {values.degrees} {time}");
}

Each point is returned as a value tuple of values and time (as a DateTime). The foreach uses tuple deconstruction to print out the values and the time.

Values are matched to columns by property name. Attributes could be used to customise the name matching.

Query multiple series (GROUP BY)
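As a rough sketch of how name-based matching might work (this is not the scaffolded code; `PointMapper`, `TemperatureRow`, and this definition of `InfluxKeyNameAttribute` are assumptions made for illustration), reflection can bind each column to the writable property with the same name, letting an attribute override the name when present:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: overrides the column name a property binds to.
[AttributeUsage(AttributeTargets.Property)]
public sealed class InfluxKeyNameAttribute : Attribute
{
    public InfluxKeyNameAttribute(string name) { Name = name; }
    public string Name { get; }
}

// Sample result type with the same shape as WaterTemperature.
public class TemperatureRow
{
    public string location { get; set; }
    public double degrees { get; set; }
}

// Hypothetical helper: builds a T from one result row.
public static class PointMapper
{
    public static T Map<T>(IReadOnlyList<string> columns, IReadOnlyList<object> row)
        where T : class, new()
    {
        var result = new T();
        foreach (var property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!property.CanWrite) continue;

            // The attribute, when present, wins over the property name.
            var key = property.GetCustomAttribute<InfluxKeyNameAttribute>()?.Name ?? property.Name;

            for (var i = 0; i < columns.Count; i++)
            {
                if (columns[i] != key || row[i] == null) continue;

                // Convert the raw value (e.g. an integer) to the property type (e.g. double).
                property.SetValue(result, Convert.ChangeType(row[i], property.PropertyType));
                break;
            }
        }
        return result;
    }
}
```

Columns without a matching property (such as time here) are simply skipped. Nullable property types are not handled in this sketch.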

public class WaterQuality{
    public class Fields{
        public double index {get; set;}
    }

    public class Tags{
        public string location {get; set;}
        public string randtag {get; set;}
    }
}

var results = await db.Query<WaterQuality.Fields, WaterQuality.Tags>("SELECT index FROM h2o_quality GROUP BY location,randtag");

foreach (var series in results)
{
    foreach (var (values, time) in series.Points)
    {
        Console.WriteLine($"{series.Tags.location} {series.Tags.randtag} {values.index} {time}");
    }
}

Because we pass in both a values type and a tags type, we know this query returns multiple series. Tag values are returned once per series rather than once per point, which is more efficient on the wire.

We can also flatten the results to make the loop more concise.

foreach (var (tags, values, time) in results.Flatten())
{
    Console.WriteLine($"{tags.location} {tags.randtag} {values.index} {time}");
}
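A minimal sketch of what Flatten might look like, assuming a hypothetical `Series<TValues, TTags>` type that exposes the `Tags` and `Points` members used in the loops above (neither the type nor the extension class name comes from the scaffolded code):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one series in a grouped result.
public sealed class Series<TValues, TTags>
{
    public Series(TTags tags, IReadOnlyList<(TValues values, DateTime time)> points)
    {
        Tags = tags;
        Points = points;
    }

    public TTags Tags { get; }
    public IReadOnlyList<(TValues values, DateTime time)> Points { get; }
}

public static class SeriesExtensions
{
    // Repeats the series tags next to every point, yielding one tuple per point.
    public static IEnumerable<(TTags tags, TValues values, DateTime time)> Flatten<TValues, TTags>(
        this IEnumerable<Series<TValues, TTags>> series)
    {
        foreach (var s in series)
        {
            foreach (var (values, time) in s.Points)
            {
                yield return (s.Tags, values, time);
            }
        }
    }
}
```

Because this is an iterator, the tags are only duplicated in memory as the caller enumerates; the wire format still carries them once per series.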

Design

The basic design of the query API is to use separate types to define the values and tags a query has.

This seems overly verbose, but it has benefits when building queries with a fluent API while still enabling full IntelliSense for the results, because we can track the projections made to the values and tags types.

C# does not have union types, but by using an anonymous type in the select clause we can combine both fields and tags into a single values type.

// Measurement type definition used to build query projections.
public class WaterDepth : InfluxMeasurement<WaterDepth.Tags, WaterDepth.Fields>
{
    public WaterDepth() : base("h2o_feet")
    {
    }

    public class Tags
    {
        public string location { get; }
    }

    public class Fields
    {
        [InfluxKeyName("level description")]
        public string level_description { get; set; }

        public double water_level { get; set; }
    }
}

var query = WaterDepth.Select((fields, tags) => new { fields.level_description, tags.location });

foreach (var (values, time) in results)
{
    // Compile error, as water_level was not selected.
    Console.WriteLine($"{values.location} {values.level_description} {values.water_level} {time}");
}

The Select function takes an Expression so that we can parse the expression tree and produce the select clause. The next example imports the InfluxAggregations.COUNT function with using static. COUNT is just a placeholder that accepts a field value type, in this case double, and returns int.

using static InfluxAggregations;

var query = WaterDepth.Select(fields => new { count = COUNT(fields.water_level) }).GroupBy(tags => new { tags.location });

foreach (var (tags, values, time) in results.Flatten())
{
    Console.WriteLine($"{tags.location} {values.count} {time}");
}
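The expression-tree parsing could be sketched roughly as follows. This is an illustration under assumptions rather than the scaffolded implementation: `SelectClauseBuilder` and `DepthFields` are hypothetical names, only plain field access and single-argument placeholder aggregations are handled, and the InfluxKeyName attribute is ignored.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Placeholder aggregation: never executed, only recognised inside expression trees.
public static class InfluxAggregations
{
    public static int COUNT(double field) =>
        throw new NotSupportedException("COUNT is only meaningful inside a query expression.");
}

// Sample fields type, mirroring WaterDepth.Fields.
public class DepthFields
{
    public string level_description { get; set; }
    public double water_level { get; set; }
}

// Hypothetical builder that turns a projection into a SELECT clause.
public static class SelectClauseBuilder
{
    public static string Build<TFields, TResult>(Expression<Func<TFields, TResult>> projection)
    {
        // An anonymous-type projection is a NewExpression whose Members
        // line up one-to-one with its constructor Arguments.
        var body = projection.Body as NewExpression;
        if (body == null || body.Members == null)
            throw new NotSupportedException("Expected an anonymous type projection.");

        var columns = body.Arguments.Zip(body.Members, (argument, member) => Render(argument, member.Name));
        return "SELECT " + string.Join(", ", columns);
    }

    private static string Render(Expression argument, string alias)
    {
        switch (argument)
        {
            // fields.level_description
            case MemberExpression field:
                return field.Member.Name == alias
                    ? $"\"{alias}\""
                    : $"\"{field.Member.Name}\" AS \"{alias}\"";

            // COUNT(fields.water_level)
            case MethodCallExpression call
                when call.Method.DeclaringType == typeof(InfluxAggregations)
                     && call.Arguments.Count == 1
                     && call.Arguments[0] is MemberExpression aggregated:
                return $"{call.Method.Name}(\"{aggregated.Member.Name}\") AS \"{alias}\"";

            default:
                throw new NotSupportedException($"Unsupported projection: {argument}");
        }
    }
}
```

The placeholder is never invoked: the lambda is captured as data, so its method name and argument are read from the tree and the throwing body never runs.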

The group-by-time clause is kept separate from the tag selection. I haven't designed the Where clause yet.

var query = WaterDepth.Select(fields => new { count = COUNT(fields.water_level) }).GroupBy(TimeSpan.FromMinutes(12), tags => new { tags.location });
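To translate the TimeSpan argument into InfluxQL, the builder would need to render it as a duration literal inside time(...). One possible sketch, where `GroupByTime.ToClause` is a hypothetical helper and the choice of the largest unit that divides the interval evenly is an assumption:

```csharp
using System;

// Hypothetical helper that renders a TimeSpan as an InfluxQL time() grouping.
public static class GroupByTime
{
    // InfluxQL duration units, largest first, with their length in .NET ticks (100 ns each).
    private static readonly (long ticks, string suffix)[] Units =
    {
        (TimeSpan.TicksPerDay, "d"),
        (TimeSpan.TicksPerHour, "h"),
        (TimeSpan.TicksPerMinute, "m"),
        (TimeSpan.TicksPerSecond, "s"),
        (TimeSpan.TicksPerMillisecond, "ms"),
    };

    public static string ToClause(TimeSpan interval)
    {
        if (interval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(interval), "Interval must be positive.");

        // Use the largest unit that represents the interval without a remainder.
        foreach (var (ticks, suffix) in Units)
        {
            if (interval.Ticks % ticks == 0)
                return $"time({interval.Ticks / ticks}{suffix})";
        }

        // Sub-millisecond remainder: fall back to nanoseconds (1 tick = 100 ns).
        return $"time({interval.Ticks * 100}ns)";
    }
}
```

With this helper, TimeSpan.FromMinutes(12) renders as time(12m), which would sit alongside the tag names in the GROUP BY clause.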

I tried to keep the client API, which accepts query strings and result types to deserialise into, separate from the query builder API. I would likely build the client API first and then get the more complicated query builder working.

nblumhardt commented 7 years ago

Hi @gambrose!

First, awesome - would personally love to see something like this out there.

Though, I don't think this particular repository is currently going to be a good vehicle for it; we're making progress (with much thanks to you) on a 1.0 of the metrics collection/line protocol implementation, but really, resources to keep things at a production level of quality are somewhat stretched. I don't think we'd want to take focus from the current scenarios to dig into new ones just yet, or in the very near future.

But, I think it's a great candidate to stand up as a library in its own right! This project was originally "indie" and got off to a good start that way. I can't speak for InfluxData (not an employee), but getting something out there under your own steam to start off, and opening a discussion further down the track, would seem like a viable approach.

HTH!

gambrose commented 7 years ago

@nblumhardt thanks for your feedback.

I will spin up a separate repo and see if I can get something working.