theodi / big-data-publishing

(Draft) Guidance, sample code (potentially), etc., related to publishing big data on the Web

Add a note about consistency #7

Closed: Floppy closed this issue 11 years ago

pikesley commented 11 years ago

Seems consistent

tomheath commented 11 years ago

@Floppy re "consistency of data across shards; immediate or eventual?", can you clarify what you mean? I envisage a deterministic sharding algorithm across static data dumps, rather than sharding in an online system where immediate/eventual consistency might be an issue. Does that make sense?
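A minimal sketch of what "deterministic sharding across static data dumps" could look like, assuming each record has a string key and the publisher fixes the number of shard files in advance. The function names `shard_for` and `partition`, the example keys, and the choice of SHA-256 are illustrative assumptions, not something the thread specifies. The point is that a stable hash (unlike Python's per-process salted `hash()` for strings) sends the same key to the same shard on every run and every machine, so no cross-shard consistency question arises for a static dump.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Return the shard index for a record key (hypothetical helper).

    SHA-256 is used instead of the built-in hash(), which is salted per
    process for strings, so the mapping is identical on every machine.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce mod N.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, num_shards):
    """Group (key, value) records into num_shards lists, one per dump file."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


if __name__ == "__main__":
    rows = [("station-001", 12.5), ("station-002", 9.1), ("station-003", 14.0)]
    for i, shard in enumerate(partition(rows, 2)):
        print(f"shard-{i}.csv:", shard)
```

Because the assignment depends only on the key and the shard count, a consumer can work out which dump file holds a given key without downloading an index. Changing the shard count reshuffles most keys, so it would need to be published alongside the dumps.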

Floppy commented 11 years ago

Imagine Twitter. The Twitter firehose doesn't look the same to everyone at the same moment. Updates aren't delivered to everyone simultaneously: some people get them before others, but everyone gets them eventually (still quickly, just not synchronously).

If you're publishing high-frequency data from multiple servers, those servers might not be consistent up to the second, but they would be eventually consistent. If that's acceptable, you can build a simpler system.
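A toy model of the situation described above, assuming updates are pushed to every server but each server applies them on its own schedule. The `Replica` class, `publish` function, and the `sync()` step are hypothetical simplifications for illustration (real systems replicate over a network with their own protocols). It shows the defining property of eventual consistency: two servers can briefly disagree, but once all pending updates are applied they converge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server holding a copy of the published data (hypothetical model)."""
    name: str
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # updates received, not yet applied

    def receive(self, key, value):
        # Updates arrive asynchronously, so queue them instead of applying at once.
        self.pending.append((key, value))

    def sync(self):
        # Apply all queued updates in arrival order, then clear the queue.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

    def read(self, key):
        return self.data.get(key)


def publish(replicas, key, value):
    """Send one update to every replica without waiting for them to apply it."""
    for replica in replicas:
        replica.receive(key, value)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    publish([a, b], "temperature", 21.3)
    a.sync()  # replica a catches up first
    print(a.read("temperature"), b.read("temperature"))  # 21.3 None (temporarily inconsistent)
    b.sync()
    print(a.read("temperature"), b.read("temperature"))  # 21.3 21.3 (converged)
```

The "simpler system" in the comment corresponds to skipping any coordination step that would force every replica to apply an update before readers may see it.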

Might be worth including as a consideration.

tomheath commented 11 years ago

Yeah, good point for non-dump data (though obviously "freshness" metadata applies to those too). I'll add this under the streaming section and revisit the latest thinking about ETags, publishing commit levels, etc. (i.e. the best ways to indicate the state of an endpoint).
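One way an ETag can indicate the state of an endpoint is sketched below, assuming the endpoint serves its current data as a single byte string. The names `make_etag` and `conditional_get`, and the choice of a truncated SHA-256 digest as the tag, are illustrative assumptions rather than anything the thread settles on. A client that sends back the ETag it last saw in `If-None-Match` gets `304 Not Modified` while the data is unchanged, and a fresh `200` once it changes.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong ETag derived from a hash of the representation bytes (hypothetical)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so ignore any W/ prefix.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client's copy is still current
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = conditional_get(b"v1 dump", None)
    print(status)  # 200
    status, _, _ = conditional_get(b"v1 dump", headers["ETag"])
    print(status)  # 304
    status, _, _ = conditional_get(b"v2 dump", headers["ETag"])
    print(status)  # 200
```

Deriving the tag from the content means every server holding identical data returns the same ETag, which fits the multi-server case above: a client can tell whether the replica it hit has caught up with the version it already holds.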

tomheath commented 11 years ago

integrated comments