Data Loading and Management

tmeasday commented 9 years ago

Article outline

https://github.com/meteor/guide/blob/master/outlines/data-loading.md

Major decision points

Data loading is best done through publications
All subscriptions should happen inside UI components. Even "global" subscriptions should be done in the app layout component. Data loaded from a subscription should be accessed in the same component, and passed down through arguments, rather than relying on global data to be available in Minimongo.
There's a strategy for pagination here; we should investigate what works well in production apps
Client-only data should be in a Tracker-enabled store, for example a ReactiveDict wrapped in an API
Relational data should be published using publish-composite
External data should be pushed to the client through publications - for example, you can poll a REST endpoint through a pub
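A minimal sketch of "a ReactiveDict wrapped in an API", written as plain JavaScript so it runs outside Meteor. The ReactiveStore class is a hypothetical stand-in for ReactiveDict plus Tracker (it calls listeners explicitly instead of rerunning autoruns), and UIState with selectPost/selectedPost is an illustrative app-level API, not something Meteor provides.

```javascript
// Stand-in for ReactiveDict + Tracker: a key/value map that notifies
// listeners whenever a value actually changes.
class ReactiveStore {
  constructor() {
    this.values = new Map();
    this.listeners = new Set();
  }
  get(key) {
    return this.values.get(key);
  }
  set(key, value) {
    if (this.values.get(key) === value) return; // unchanged: no notification
    this.values.set(key, value);
    for (const listener of this.listeners) listener(key, value);
  }
  onChange(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener); // call to stop listening
  }
}

// Hypothetical app-level API: components call these functions instead of
// reading or writing raw keys, so key names stay private to this module.
const store = new ReactiveStore();
const UIState = {
  selectPost(postId) {
    store.set("selectedPostId", postId);
  },
  selectedPost() {
    return store.get("selectedPostId") ?? null;
  },
  onChange: (listener) => store.onChange(listener),
};
```

Because components only see UIState's functions, the backing store (a ReactiveDict, a local collection, or something else) can be swapped without touching the call sites.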

Old outline:

Proposed outline

Loading and publishing data from Mongo on the server.
Subscribing to data on the client
- For now, just in the straightforward way, emphasize autorun re-sub behaviour and workarounds
Client only data (Stores) vs persistent server data (Collections)
Modifying data ("actions"? -- store mutators or methods)
Complex publications:
- Relational data - use publish-composite to publish relational data.
- limiting data to what you need
- reusing publications vs limiting them.
- pagination patterns
Publishing data from 3rd party sources
- Poll-publish pattern
Publications as RESTful endpoints
Open Questions
- Should webhooks be part of the methods article? I think so
- Do we encourage people to pass queries / options into subscriptions? I think no.
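The poll-publish pattern can be sketched as a diff loop: fetch the external data, compare it with what was sent last time, and emit only added/changed/removed calls. This is a sketch under assumptions: each external item has a stable id, its JSON serialization is stable between fetches (same key order), and "items" is a hypothetical client collection name. The sink here is a plain object so the code runs anywhere.

```javascript
// One polling step: diff `items` (fresh from the external source) against
// `published` (a Map of id -> JSON string of what the client already has)
// and report the differences to `sink`.
function pollOnce(items, published, sink) {
  const seen = new Set();
  for (const item of items) {
    const { id, ...fields } = item;
    const serialized = JSON.stringify(fields);
    seen.add(id);
    if (!published.has(id)) {
      sink.added("items", id, fields);
    } else if (published.get(id) !== serialized) {
      sink.changed("items", id, fields); // simplification: resends all fields
    }
    published.set(id, serialized);
  }
  // Anything published before but missing now was removed upstream.
  for (const id of [...published.keys()]) {
    if (!seen.has(id)) {
      sink.removed("items", id);
      published.delete(id);
    }
  }
}
```

Inside a real publication you would presumably pass the publication itself as the sink (it has this.added, this.changed and this.removed with the same argument order), call pollOnce from Meteor.setInterval, call this.ready() after the first poll, and clear the interval in this.onStop.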

tmeasday commented 9 years ago

@justinsb This would possibly be a chapter you might want to weigh in on. I'll ping you again when it's more fleshed out.

tmeasday commented 9 years ago

See also #11

mitar commented 9 years ago

So for me the most important thing I tell new people who start with Meteor is a cycle of data propagation you have to keep in mind:

data is in the database
you define publish endpoint to publish it
you subscribe to it from the client, data is pushed to the client
you push that data to the template
you declare how the data should be rendered in the template
you have an event handler
which calls some method on the server which modifies the data
and data change is pushed around and everything (because it is declaratively defined) updates automatically

So I think it is really important that people understand that they should not change data or templates directly on the client, but should go through the server and leave it to the loop to make everything happen.

What about where publish functions should go? This is still unclear to me. Should they be separate from views, or together with views (in the same directory, where a view for me is close to a feature)? Some publish functions are shared between views and some are not. The same goes for methods: some you need for a particular view and some are generic.
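The loop above can be modelled in a few lines of plain JavaScript. Everything here is a hypothetical stand-in (db for Mongo, publish/subscribe for pub/sub, methods.likePost for a Meteor method, render for a template); the point is only that the event handler never edits the client copy or the rendered output directly, it calls a server method and the change flows back through the subscription.

```javascript
// --- "server" ---
const db = new Map([["p1", { title: "Hello", likes: 0 }]]); // stand-in for Mongo
const subscribers = new Set();

// Push a snapshot of the published data to every subscriber.
function publish() {
  const snapshot = [...db].map(([id, doc]) => ({ _id: id, ...doc }));
  for (const push of subscribers) push(snapshot);
}

// Server-side method: the only place where data is modified.
const methods = {
  likePost(id) {
    const doc = db.get(id);
    if (!doc) throw new Error(`no post ${id}`);
    db.set(id, { ...doc, likes: doc.likes + 1 });
    publish(); // the change is pushed to everyone subscribed
  },
};

// --- "client" ---
let clientPosts = []; // stand-in for Minimongo
let screen = "";

// Declarative rendering: output is purely a function of client data.
const render = (posts) => posts.map((p) => `${p.title} (${p.likes})`).join("\n");

function subscribe() {
  subscribers.add((snapshot) => {
    clientPosts = snapshot;
    screen = render(clientPosts); // downstream output updates automatically
  });
  publish();
}

// Event handler: calls the method instead of touching clientPosts or screen.
const onLikeClicked = (id) => methods.likePost(id);
```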

stubailo commented 9 years ago

Subscribing to data on the client

I think we should take a look at @arunoda's subs-manager and see if there is some low-hanging fruit we could suggest there.

Should webhooks be part of the methods article? I think so

Technically, webhooks will be calling a method, but conceptually they are about data loading. I guess the pattern is, use the webhook to insert into Mongo?

Do we encourage people to pass queries / options into subscriptions? I think no.

So is a subscription for one document where you pass the single document not a good idea?

tmeasday commented 9 years ago

Subscribing to data on the client

I think we should take a look at @arunoda's subs-manager and see if there is some low-hanging fruit we could suggest there.

Subs manager is good for what it is, but I think we decided we aren't comfortable recommending that technique because of the scope for bugs given Meteor's current globalness. I could reconsider that.

Should webhooks be part of the methods article? I think so

Technically, webhooks will be calling a method, but conceptually they are about data loading. I guess the pattern is, use the webhook to insert into Mongo?

I thought webhooks are about data modification? So the forms chapter would make sense. The only issue is that it's about "forms" rather than "methods" right now. But I think that's still OK

Do we encourage people to pass queries / options into subscriptions? I think no.

So is a subscription for one document where you pass the single document not a good idea?

I think an _id is fine, just not an arbitrary selector. You wrote this in the security chapter anyway
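To make "an _id is fine, an arbitrary selector is not" concrete, here is a sketch of the argument handling a publication might do. In Meteor you would write check(postId, String) inside Meteor.publish; to keep this runnable outside Meteor, postSelector is a hypothetical plain function doing a similar check (plus a non-empty rule) and building the selector on the server, so the client never supplies query operators.

```javascript
// Build the server-side selector for a "single post" publication.
// The client may only send an _id string; it never sends a selector.
function postSelector(postId) {
  if (typeof postId !== "string" || postId.length === 0) {
    // Rejects objects such as { $ne: null } that would match every document.
    throw new TypeError("postId must be a non-empty string");
  }
  return { _id: postId };
}
```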

tmeasday commented 9 years ago

I think what @mitar is saying about a sort of "flow diagram" of how data moves around is hugely useful. My only question is which article does this "fluxy" diagram go in? This one or the methods one?

arunoda commented 9 years ago

@tmeasday could you tell me more about the issue with subsManager? I don't understand the reason. It's a cache where you can control how it behaves.

It gives significant performance and UX improvements.

tmeasday commented 9 years ago

@arunoda the issue that always concerns me is bugs which are hard to replicate.

The subs-manager pattern introduces a second layer of state in the app, which is "where I was a little while ago". All of a sudden the data in your local cache is no longer determined just by where you are now, but also by where you were during the lifetime of the subs manager cache.

If people were always super careful in their find calls to select just the documents and fields they subscribed to, it wouldn't be a problem, but they aren't (notwithstanding heroic attempts by people like @sachag to promote patterns that ensure it).

It's true that the above is the real problem, and things like page->page transitions suffer the same issue (two sets of subscriptions open at once and rendered to the screen separately). But the difference there is that it's much more obvious what the issue is when something goes wrong. In the subs manager world, it's easy to imagine scenarios where people have bugs reported that they can't replicate (because the true replication is "first go to page A, then go to B and do a bunch of stuff").

If you could do something like .subscribe(..).getDataset() (which @stubailo and I discussed at length but decided was too much of a departure for this version of the guide), then I'd be comfortable (getDataset could be promise-y and return straight away or not depending on caching).

Am I being overly pedantic here? Maybe! But I'm worried about recommending patterns that I personally avoid.

Oh, and btw, I'm not sure it's fair to say it gives significant performance improvements. I can imagine both cases where it would help performance (not repeatedly re-opening the same pub) and hinder performance (leaving unnecessary and expensive publications open for extended periods of time).
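A small model of the replication problem described above: a cache keeps page A's subscription alive after the user moves to page B, and page B's component queries the merged client collection without repeating its subscription's selector. ClientCache and the page data are hypothetical; the only point is that an unscoped find returns different results depending on where the user was before.

```javascript
// Every live subscription's documents end up in one merged client-side
// collection, as they do in Minimongo.
class ClientCache {
  constructor() {
    this.subs = new Map(); // subscription name -> documents it published
  }
  subscribe(name, docs) {
    this.subs.set(name, docs);
  }
  stop(name) {
    this.subs.delete(name);
  }
  // Merged view of all live subscriptions, de-duplicated by _id.
  find(predicate = () => true) {
    const merged = new Map();
    for (const docs of this.subs.values()) {
      for (const doc of docs) merged.set(doc._id, doc);
    }
    return [...merged.values()].filter(predicate);
  }
}

const pageAPosts = [{ _id: "1", author: "alice" }, { _id: "2", author: "bob" }];
const pageBPosts = [{ _id: "3", author: "carol" }]; // page B shows carol's posts

// Page B's component, written two ways.
const unscopedPostsForB = (cache) => cache.find();
const scopedPostsForB = (cache) => cache.find((doc) => doc.author === "carol");
```

Without a cache, both versions of page B show one post; with page A still cached, the unscoped version shows three, which is exactly the "first go to page A" bug that is hard to reproduce.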

arunoda commented 9 years ago

Okay, I get it. I'm fine with it not being here; it was just my idea. Maybe we need to define different areas in the Meteor guide: which tools suit which place, and so on. Anyway, users will eventually find out about SubsManager.

Performance Gains

It gives a huge performance boost in a lot of practical scenarios. SubsManager gives you performance gains in two ways:

1) Low latency, through the cache
2) Low CPU usage, which I'll talk more about below.

This reduces the subRate of the app a lot. It's safe to assume users browse the same pages (areas) many times in a single session, so the cache avoids much of the re-subscribing and the CPU cost that goes to network activity (and transport-related code in Meteor).

Our tests show that most apps have subscriptions with very short lifetimes, and those subscriptions change very little (compared with the time they are open). Meteor also reuses observers: our tests show many apps have an observer reuse ratio of over 50%.

So keeping the subscription open is not an issue

And we don't ask people to use subsManager for every subscription. It's up to users to decide which subscriptions are powered by SubsManager. We mentioned this in BulletProofMeteor and in the Kadira docs.
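The subRate argument can be illustrated with a tiny LRU cache of subscription handles: revisiting a page reuses the open subscription instead of starting a new one, and the least recently used entry is stopped once a limit is reached. This is a hypothetical model, not subs-manager's implementation (subs-manager has its own cacheLimit and expireIn options); fakeSubscribe stands in for Meteor.subscribe and only counts how often a real subscription would start.

```javascript
// Keeps up to `limit` subscriptions open, evicting the least recently used.
class SubsCache {
  constructor(subscribe, limit) {
    this.subscribeFn = subscribe; // (name) -> handle with a stop() method
    this.limit = limit;
    this.handles = new Map(); // insertion order = least recently used first
  }
  subscribe(name) {
    if (this.handles.has(name)) {
      const handle = this.handles.get(name);
      this.handles.delete(name); // re-insert to mark as most recently used
      this.handles.set(name, handle);
      return handle;
    }
    const handle = this.subscribeFn(name);
    this.handles.set(name, handle);
    if (this.handles.size > this.limit) {
      const [oldestName, oldest] = this.handles.entries().next().value;
      oldest.stop();
      this.handles.delete(oldestName);
    }
    return handle;
  }
}

// Stand-in for Meteor.subscribe that counts real subscription starts/stops.
let started = 0;
let stopped = 0;
const fakeSubscribe = (name) => {
  started += 1;
  return { name, stop: () => { stopped += 1; } };
};
```

Navigating home, post, home, post, home starts only two subscriptions instead of five; the trade-off discussed above is that both stay open (and their data stays on the client) until evicted.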

tmeasday commented 9 years ago

Ok, it's fair to say that for a subscription that is often/usually shared it does give significant performance gains.

I think if we don't include it, this is a clear case of a package that should be mentioned in a "further reading" section of this article. I'll wait for @stubailo to weigh in again.

arunoda commented 9 years ago

@tmeasday That sounds great.

Side note: I assume this is discussed somewhere else, but it's a good idea to have different sections for people with different levels of understanding. Or we can narrow the first release to some generic guidelines.

mitar commented 9 years ago

If you could do something like .subscribe(..).getDataset() (which @stubailo and I discussed at length but decided was too much of a departure for this version of the guide), then I'd be comfortable (getDataset could be promise-y and return straight away or not depending on caching).

You mean this ticket? #2247

So maybe instead of getDataset (what an ugly name, BTW; Java background leaking again? dataset? Why not simply documents? Or even better, .subscribe(..).find() so you can make a query against it) we should just be able to query based on subscriptions?

tmeasday commented 9 years ago

You mean this ticket? #2247

More or less, yeah.

Java background leaking again?

Nope..

.documents() seems wrong because it implies a single collection. What we are talking about here is a subset of the data in each collection that the subscription publishes to. (.find() certainly is incorrect for this reason, unless it takes a collection name as first argument).

"X" is to database what cursor is to collection. Agree that "dataset" isn't a great word but it does seem to work.

tmeasday commented 9 years ago

Or did you mean Java background because I put get in front? If so you are paying way too close attention to my random code snippets.

mitar commented 9 years ago

Or did you mean Java background because I put get in front? If so you are paying way too close attention to my random code snippets.

:-)

mitar commented 9 years ago

"X" is to database what cursor is to collection. Agree that "dataset" isn't a great word but it does seem to work.

The question is what are operations you can do on X? So first probably select a collection, then query?

I think better API would be that you could do:

Posts.find({}, {subscription: subscription})

where subscription is the handle returned from subscribe. Now that subscriptions have id, you could just somehow query based on that. This is clean, simple to make backwards compatible, and simple to add to existing queries.
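A sketch of how Posts.find({}, {subscription: subscription}) could be backed: the client collection records which subscription(s) each document arrived through, and the subscription option filters on that. ScopedCollection, its added/removed signatures, and the subscription find option are hypothetical; the handle is assumed to expose its id as subscriptionId, and the selector is simplified to a predicate function.

```javascript
class ScopedCollection {
  constructor() {
    this.docs = new Map();    // _id -> document
    this.sources = new Map(); // _id -> Set of subscription ids that sent it
  }
  // Called when subscription `subId` sends document `doc`.
  added(subId, doc) {
    this.docs.set(doc._id, { ...this.docs.get(doc._id), ...doc });
    if (!this.sources.has(doc._id)) this.sources.set(doc._id, new Set());
    this.sources.get(doc._id).add(subId);
  }
  // Called when subscription `subId` stops publishing document `id`.
  removed(subId, id) {
    const subs = this.sources.get(id);
    if (!subs) return;
    subs.delete(subId);
    if (subs.size === 0) { // no subscription holds it any more: drop it
      this.sources.delete(id);
      this.docs.delete(id);
    }
  }
  // `options.subscription`, if given, limits results to that subscription.
  find(selector = () => true, options = {}) {
    const sub = options.subscription;
    return [...this.docs.values()].filter((doc) =>
      selector(doc) &&
      (sub === undefined || this.sources.get(doc._id).has(sub.subscriptionId))
    );
  }
}
```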

tmeasday commented 9 years ago

I'm not against that, but I have other plans around slicing up the datasets and using them as "contexts" for templates/components. You might call it Relay or something like that. (That makes me think, what does Relay/GraphQL call this concept..)

tmeasday commented 9 years ago

Anyway, it's all pretty academic because a client-side merge box is a highly non-trivial change so I don't expect we'll see it any time soon.

mitar commented 9 years ago

I'm not against that, but I have other plans around slicing up the datasets and using them as "contexts" for templates/components.

I don't know about you, but with my proposed API this is as easy as:

Template.foo.onCreated(function () {
  this.context = this.subscribe("foo");
});

Template.foo.helpers({
  foo: function () {
    // `subscription` is the proposed find option, not an existing Meteor API
    return Foo.find({}, {subscription: Template.instance().context});
  }
});

Of course, using the template instance's subscriptions as the query context could even be done automatically, so that a query inside a template takes all of that template's subscriptions as its context.

Anyway, it's all pretty academic because a client-side merge box is a highly non-trivial change so I don't expect we'll see it any time soon.

You would like to limit fields based on the subscription? Then yeah, it is tricky. But just knowing which IDs came from which subscription is probably already available somewhere internally.

tmeasday commented 9 years ago

You would like to limit fields based on the subscription? Then yeah, it is tricky. But just knowing which IDs came from which subscription is probably already available somewhere internally.

Incorrect. You can use https://atmospherejs.com/percolate/find-from-publication to fake it, but it's a total kludge.

mitar commented 9 years ago

BTW, what you subscribe to is internally called a record set.

tmeasday commented 9 years ago

If we are talking proposed APIs, mine would look something like

<template name="fooController">
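  {{> foo dataset=instance.dataset}}
</template>

Template.fooController.onCreated(function () {
  // dataset() is the proposed API, not part of Meteor; a regular function
  // (not an arrow) is needed so `this` is the template instance
  this.dataset = this.subscribe('foo').dataset();
});

Template.foo.helpers({
  posts: function() {
    return this.dataset.posts.find();
  }
});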

Then it is trivially easy to test foo against an arbitrary dataset.
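The testing benefit is easiest to see if the helper logic is written as a plain function of its dataset. postsHelper and fakeDataset are hypothetical (with a small hidden-post filter added so there is something to test); in the proposal the real dataset would come from the non-existent subscribe('foo').dataset(), but a test can hand in any object with the same shape without touching a Posts global.

```javascript
// The helper depends only on the dataset it is given, never on a global.
function postsHelper(dataset) {
  return dataset.posts.find().filter((post) => !post.hidden);
}

// Build a fake dataset from plain arrays; find() here returns an array
// rather than a Mongo cursor.
function fakeDataset(collections) {
  const dataset = {};
  for (const [name, docs] of Object.entries(collections)) {
    dataset[name] = { find: () => docs.slice() };
  }
  return dataset;
}
```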

arunoda commented 9 years ago

I like it. To do this, we need to remove the mergebox from the server. Otherwise, we need to define the query alongside the publication.


mitar commented 9 years ago

OK, this API is really the same as mine, only that it is a prefix instead of a suffix. And it has big problems because of reactivity: what if I publish one collection first and then, after some time (after ready), I publish another? You would at least have to have this.dataset.posts().find().

tmeasday commented 9 years ago

@arunoda :+1:. This is the direction that @stubailo and I were talking about, something like:

Posts.all = new Subscription({
  query: () => { ... }
});

const handle = Posts.all.subscribe('foo');

Which could then totally do the dataset pattern via re-running the query client side. But then the question is how to make the publication work properly with queries over multiple collections -- do you map it to publish-composite syntax or something?

Doesn't sound completely impossible but a big chunk of concepts that we'll leave for the next iteration of the guide if we still like it. (Thus my original comment)
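One way to read the new Subscription({ query }) idea: the same query function decides on the server what to publish and, on the client, picks that subscription's documents back out of the merged cache. Subscription, select and PostsByAuthor are hypothetical names, queries are plain predicates over a single collection, and the multi-collection question raised above is deliberately left out.

```javascript
class Subscription {
  constructor({ query }) {
    this.query = query; // (doc, args) -> boolean
  }
  // Used on the server to decide what to publish, and on the client to
  // re-run the same query against a cache that may hold other documents.
  select(docs, args) {
    return docs.filter((doc) => this.query(doc, args));
  }
}

const PostsByAuthor = new Subscription({
  query: (doc, { author }) => doc.author === author,
});
```

This only works for queries that can be evaluated per document; something like a limit or a server-side ranking would not re-run faithfully on the client.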

tmeasday commented 9 years ago

@mitar without theorycrafting about the technicalities of APIs that don't and won't exist in the foreseeable future, the real difference between our versions is that yours references the Posts global, which makes testing annoying. That's the biggest benefit I'm trying to get out of this idea.

mitar commented 9 years ago

the real difference between our versions is that yours references the Posts global, which makes testing annoying. That's the biggest benefit I'm trying to get out of this idea.

The public Posts will be hard to get rid of, at least if you want to use models and transform operations defined there. Or are you saying that dataset.posts would simply be the same as Posts, just a variable, and limited to a subscription? Because, you know, you can also pass the collection as an argument for testing. :-)

<template name="fooController">
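  {{> foo subscription=instance.subscription collection=instance.collection}}
</template>

Template.fooController.onCreated(function () {
  // a regular function (not an arrow) so `this` is the template instance
  this.subscription = this.subscribe('foo');
  this.collection = Posts;
});

Template.foo.helpers({
  posts: function() {
    // `subscription` is the proposed find option, not an existing Meteor API
    return this.collection.find({}, {subscription: this.subscription});
  }
});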

arunoda commented 9 years ago

@tmeasday Yeah! This is something beyond the guide. I think we should stop here and move somewhere else. Otherwise, we'll make this thread a mess :)

SachaG commented 9 years ago

I still think SubsManager should be included. Maybe in order to avoid debugging issues what we need is an easy way to turn the cache on/off?

tmeasday commented 9 years ago

That doesn't help when you get a report from the field of "I did X, Y, Z, saw W", when the real reproduction is "I went to A, then I did X, Y, Z, saw W". That's what I'm worried about

SachaG commented 9 years ago

I get your concern, but that's a small downside compared to the UX benefits that come from using subscription caching imo.

SachaG commented 9 years ago

Also, somewhat related here's an alternative to pub/sub I've been working on (more similar to how GraphQL works conceptually): https://github.com/SachaG/smartquery

mitar commented 9 years ago

I think what @mitar is saying about a sort of "flow diagram" of how data moves around is hugely useful. My only question is which article does this "fluxy" diagram go in? This one or the methods one?

This in fact looks much more like reflux: https://github.com/reflux/refluxjs

stubailo commented 9 years ago

My only question is which article does this "fluxy" diagram go in? This one or the methods one?

Perhaps both? If we make a diagram, we can just use it in both places, highlighting different parts.

tmeasday commented 8 years ago

Outline merged https://github.com/meteor/guide/blob/master/outlines/data-loading.md !

steph643 commented 8 years ago

I would rephrase heading c like this:

c. If it relates to individual items from an existing collection (per-item checkboxes, for instance) or if you need to query it, use a local collection

And I would add:

d. Other solutions

where there could be pointers to more advanced solutions, such as reactive-state.

mitar commented 8 years ago

Huh, opaque strings for keys in the state. This is not very IDE friendly. ;-)

steph643 commented 8 years ago

@tmeasday Yeah! This is something beyond the guide. I think we should stop here and move somewhere else. Otherwise, we'll make this thread a mess :)

I asked for a public discussion on this more than half a year ago (see here and here).

tmeasday commented 8 years ago

@steph643 thanks for the link. I guess this reactive state idea is sort of Angular-like, no? A tree of "state" (aka scope) that you use within a template. What I don't like about it is that if it's going to be tree-like, it makes sense to scope it to the relevant branch of the template hierarchy rather than letting the template be in charge of grabbing something global itself.

mitar commented 8 years ago

I really think such solutions should be third party packages. Blaze should provide template instances, and then people can attach react-like props, Blaze Components like fields, angular like state to it. We will hardly decide which one is the best. :-)

stubailo commented 8 years ago

We will hardly decide which one is the best. :-)

Thankfully the ReactiveDict approach isn't a package - it's just one suggested pattern. If people decide that ReactiveDict doesn't fulfill their needs, it will be easy to switch to something else and have basically the same patterns.

mitar commented 8 years ago

Thankfully the ReactiveDict approach isn't a package

Yes, I am talking about some old ideas of making instance.state present by default, automatically.

it's just one suggested pattern

Should we suggest all of them? Using ReactiveDict, ReactiveField, and reactive state? :-) That could be a very short section: three headings, three short examples, and then it can continue in whatever direction you want the rest of the guide to take.

mitar commented 8 years ago

I made this package which allows one to scope queries to the subscription. I decided to do a different API to the one above.

tmeasday commented 8 years ago

Interesting. A few notes in comparison to FFP:

What are the syncing issues (that FFP has) that you refer to?
I don't know if it's true that FFP really sends much more data when you consider gzip, I suspect it wouldn't make a significant difference unless your source documents are tiny.
FFP also does sorting -- I think you could support it also by just setting the scopeFieldName value to an order. This would actually IMO then be the biggest advantage of your library (because sorting via a second collection is basically impossible to do properly in Mongo).
Personally I wouldn't say that monkey patching Meteor's pub and sub code is less complicated than wrapping things, but each to his own I guess ;)

arunoda commented 8 years ago

@tmeasday what's FFP?

tmeasday commented 8 years ago

Oh, find-from-publication (the package that subscription scope is replicating with a different approach and API)

arunoda commented 8 years ago

Okay got it :)

mitar commented 8 years ago

What are the syncing issues (that FFP has) that you refer to?

That you are using two collections/subscriptions, so both have to be up to date on the client before you can query based on the subscription, no? If I check subscription.ready() and then want to fetch only the documents from that subscription, it is not necessarily true that I can, because the other subscription, the one that records which documents belong to which subscription, may not be ready (or updated) yet.

I don't know if it's true that FFP really sends much more data when you consider gzip, I suspect it wouldn't make a significant difference unless your source documents are tiny.

I have not measured things over the wire, true, so maybe this is premature optimization on my part. But it does increase memory usage on the server side, because the merge box stores all those documents.

FFP also does sorting -- I think you could support it also by just setting the scopeFieldName value to an order. This would actually IMO then be the biggest advantage of your library (because sorting via a second collection is basically impossible to do properly in Mongo).

Yes, I could do sorting, but I didn't want to, because the server side then becomes really complicated (you have to use observe with addedBefore and so on) to keep the sorting values up to date. Instead of just adding a field whenever the user calls added, I would have to intercept how they call added and how the sorting on the server changes. Also, I do not really care about the order on the server side; I think it is an anti-pattern to care about the order in which you send documents. Maybe that is because I use reactivity on the server side as well, and things like publish middleware, which all interfere with the order in which documents are sent over the wire. So if you want sorting, in my view it should be done on the client side. My package thus only provides information about which documents are from the subscription; order is not provided.

What use cases do you have where the order of added calls is important to know on the client? I could see that one would want to preserve the order of a cursor with sort applied, but then the user would have to use observe with ordering, which is much more complicated than just the order of added calls. Maybe I could expose an API for users to put a custom value in scopeFieldName. Then, if they want a sorted publish, they would call observe themselves and compute the scopeFieldName value based on addedBefore and movedBefore.

Personally I wouldn't say that monkey patching Meteor's pub and sub code is less complicated than wrapping things, but each to his own I guess ;)

I am not saying that it is nice, but it is simpler: fewer lines of code, less data to send, a simpler concept.

Meteor should provide APIs to do that properly. @rclai is now working on at least common API for something like this: https://github.com/rclai/meteor-collection-extensions

But I think Meteor's usual approach is not to provide APIs until the community develops a package showing the need for them. But yes, please do merge this pull request: https://github.com/meteor/meteor/pull/5845

BTW, you might be interested in this package as well: https://github.com/peerlibrary/meteor-subscription-data

tmeasday commented 8 years ago

What use cases do you have where the order of added calls is important to know on the client?

I'm thinking about any time the sort order is not knowable on the client. For instance if you query a fulltext search endpoint (think ElasticSearch) to get a ranked set of documents for the publication.
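For the ElasticSearch-style case, one workaround is for the publication to attach the server-known rank to each document as an extra field, so the client can sort by it regardless of arrival order. withRank, sortByRank and the _searchRank field name are hypothetical; the ranked ids are assumed to come from the search endpoint, and in a real publication each result would be sent with this.added.

```javascript
// Server side: attach each document's position in the ranked id list,
// skipping ids the database no longer has.
function withRank(rankedIds, docsById) {
  return rankedIds
    .filter((id) => docsById.has(id))
    .map((id, index) => ({ ...docsById.get(id), _searchRank: index }));
}

// Client side: recover the ranking from the field, not from added order.
const sortByRank = (docs) => [...docs].sort((a, b) => a._searchRank - b._searchRank);
```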

tmeasday commented 8 years ago

BTW, you might be interested in this package as well: https://github.com/peerlibrary/meteor-subscription-data

Interesting stuff, thanks for showing me @mitar
