sourcenetwork / defradb

DefraDB is a Peer-to-Peer Edge Database. It's the core data storage system for the Source Network Ecosystem, built with IPFS/IPLD, LibP2P, CRDTs, and Semantic web3 properties.

[Hackathon]: Explore N dimensional, circle-bufferable, array-based datatype #569

Open AndrewSisley opened 2 years ago

AndrewSisley commented 2 years ago

Inspired by mention of SunSpec stuff, this task should only be worked on in free time, and mostly for fun. Try not to waste other people's time with this (for now at least). Task partly created so I don't forget about this thought. I am also highly likely to hit a number of blockers on things that we plan on having anyway in defra (which should not be rushed in as part of this).

It could be super fun to have defra support N dimensional array data types, even more fun if they can be accessed via human readable keys (e.g. time), even more fun if they can act as N length buffers, even more fun if the buffers can grow/shrink auto-magically based on certain conditions (e.g. shrink post-sync), and if they can support aggregates (potentially eagerly evaluated).
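To make "eagerly evaluated" aggregates over an N length buffer a bit more concrete, here is a minimal Python sketch. It is not defra code: `RunningMeanBuffer`, `push` and `mean` are hypothetical names invented for illustration, and it assumes the aggregate of interest is a simple mean that can be updated on every write.

```python
from collections import deque


class RunningMeanBuffer:
    """Fixed-length buffer that keeps its sum up to date on every write.

    Hypothetical illustration only; not part of defra.
    """

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._items = deque(maxlen=size)
        self._total = 0.0

    def push(self, value: float) -> None:
        # If the buffer is full, the oldest item is about to be evicted,
        # so remove its contribution before the deque drops it.
        if len(self._items) == self._items.maxlen:
            self._total -= self._items[0]
        self._items.append(value)
        self._total += value

    def mean(self) -> float:
        # O(1): the sum is maintained eagerly, so no scan is needed here.
        if not self._items:
            raise ValueError("mean of empty buffer")
        return self._total / len(self._items)


buf = RunningMeanBuffer(size=3)
for v in [1.0, 2.0, 3.0, 4.0]:
    buf.push(v)
```

The trade-off in this sketch is a little extra work per write in exchange for constant-time reads of the aggregate. With floating-point values the running sum can drift over very long runs, so a real implementation might periodically recompute it.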

Look at supporting medium/long-term storage, and real-time transmission buffers.

Might need to have a more serious look at pointproof stuff etc, as it could be very interesting to try and do this efficiently whilst maintaining correct versioning guarantees (would be obscenely expensive to track a commit per item update).

orpheuslummis commented 2 years ago

It's unlikely that I would work on this in the coming 2 months, but I would enjoy and recommend keeping track of progress and clarifications directly on the issue. Very cool

orpheuslummis commented 2 years ago

Related to https://github.com/sourcenetwork/defradb/issues/100

AndrewSisley commented 2 years ago

Really surprised and encouraged by the interest shown in this idea, particularly from the more business-orientated people, thank you all!

There is a fairly simple/bare-bones (rust) array-based (local) database implementation here if anyone is interested and doesn't mind the shameful self-promotion. The concept is quite simple and is naturally quite performant. The tricky part will very likely be the commit related stuff (and making sure that doesn't trash the performance).

Also making a note that DJ knows someone with a very fun sounding background in this area, and it would probably be worth trying to check in with them if/when this idea starts to get serious.

orpheuslummis commented 2 years ago

Machine learning!

AndrewSisley commented 1 year ago

Late night thought - circle buffers could be implemented pretty easily and still fit within the gql spec - would just be an inline array type with a @circle-buffer directive (or similar), internal start-index would be hidden, and the array would be returned as a normal array when queried (from start index to start index-1).
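A minimal Python sketch of the hidden start-index idea, assuming a fixed-size buffer that overwrites its oldest item once full. `CircleBuffer`, `append` and `to_list` are hypothetical names for illustration and say nothing about how defra would store or version the data.

```python
class CircleBuffer:
    """Fixed-size buffer; readers only ever see a normal, ordered list."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._slots = [None] * size
        self._start = 0  # index of the oldest item; never exposed to readers
        self._count = 0  # number of slots currently in use

    def append(self, value) -> None:
        size = len(self._slots)
        if self._count < size:
            # Still filling up: write into the next free slot.
            self._slots[(self._start + self._count) % size] = value
            self._count += 1
        else:
            # Full: overwrite the oldest item and advance the start index.
            self._slots[self._start] = value
            self._start = (self._start + 1) % size

    def to_list(self) -> list:
        # Walk from the start index around to start index - 1.
        size = len(self._slots)
        return [self._slots[(self._start + i) % size] for i in range(self._count)]


buf = CircleBuffer(size=3)
for v in [1, 2, 3, 4, 5]:
    buf.append(v)
```

After the five appends, `buf.to_list()` returns the three newest items in insertion order, even though they are physically stored starting at slot 2.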

Map-like types could perhaps be defined as arrays of a key-value-pair struct, and could accept input parameters (e.g. WindSpeed5m(start: 10:00, end: 17:00)) or return the entire collection. Could also be made to work with the @circle-buffer directive.

Could also have an @ordered directive, sorting items on insert.

And a @unique directive perhaps (might be enough to keep this implicit when using key-value struct?).
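As a rough sketch of what sort-on-insert plus uniqueness could mean for an array of key-value pairs, the Python below keeps keys sorted with a binary search and replaces the value when a key already exists. The names (`OrderedUniqueArray`, `put`, `items`) are hypothetical, and replacing on a duplicate key is an assumption: rejecting the write would be an equally valid reading of @unique.

```python
from bisect import bisect_left


class OrderedUniqueArray:
    """Key-value pairs kept sorted by key, with at most one value per key."""

    def __init__(self) -> None:
        # Parallel lists so the binary search runs over plain keys.
        self._keys = []
        self._values = []

    def put(self, key, value) -> None:
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            # Assumed @unique behaviour: replace the existing value.
            self._values[i] = value
        else:
            # Assumed @ordered behaviour: insert at the sorted position.
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def items(self) -> list:
        return list(zip(self._keys, self._values))


arr = OrderedUniqueArray()
arr.put("10:10", 5.0)
arr.put("10:00", 4.0)  # arrives out of order, lands first
arr.put("10:10", 6.0)  # duplicate key, value is replaced
```

The zero-padded "HH:MM" strings are used here only because they sort correctly as plain strings; a real time type would be compared directly.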

These directives could then be combined to create circle-buffered maps of key-ordered items that can be efficiently queried by range. E.g.

type turbine {
    Name: String
    Windspeed10minLast24Hrs: [TimeFloat] @ordered @circle-buffer(size: 144)  // 144 = 1 day
}
query {
   turbine {
       Name
       // Take the last hour only, last keyword (and similar) could be added feature for all arrays
       lastHour: Windspeed10minLast24Hrs(last: 6)
       // Take the values for the interval for which the turbine speed was throttled
       batCurtailment: Windspeed10minLast24Hrs(start: 19:00,  end: 21:00)
       // Calculate the average windspeed during the curtailment period
       batCurtailmentAvg: _avg(Windspeed10minLast24Hrs: {start: 19:00, end: 21:00})
   }
}
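
To illustrate what the three query fields above might evaluate to, here is a small Python sketch over one day of made-up 10-minute samples. Everything in it is an assumption for illustration: the helper names (`last`, `key_range`, `avg`), keys stored as minutes since midnight, and a range that includes `start` but excludes `end`. The thread does not specify any of these.

```python
from bisect import bisect_left

# 144 made-up samples, one per 10 minutes, already ordered by key.
# key = minutes since midnight, value = fake windspeed (the hour number).
samples = [(m, float(m // 60)) for m in range(0, 24 * 60, 10)]
keys = [k for k, _ in samples]


def last(items, n):
    """Return the n most recent items (sketch of the `last:` argument)."""
    return items[-n:] if n > 0 else []


def key_range(items, sorted_keys, start, end):
    """Return items with start <= key < end using two binary searches."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return items[lo:hi]


def avg(items):
    """Mean of the values, or None for an empty selection."""
    values = [v for _, v in items]
    return sum(values) / len(values) if values else None


last_hour = last(samples, 6)
bat_curtailment = key_range(samples, keys, 19 * 60, 21 * 60)
bat_curtailment_avg = avg(bat_curtailment)
```

Because the array is key-ordered, the range lookup costs two binary searches plus a slice rather than a scan of all 144 items, which is the efficiency the combined directives are aiming for.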