peerlibrary / meteor-reactive-publish

Reactive publish endpoints
https://atmospherejs.com/peerlibrary/reactive-publish
BSD 3-Clause "New" or "Revised" License
124 stars 16 forks source link

Re-using observers #34

Open ramezrafla opened 6 years ago

ramezrafla commented 6 years ago

This lib is a life-saver, thanks @mitar

We are now facing production issues with low observer re-use. Upon re-running the autorun in the publish function, is the previous observer removed (if not used) and the new one takes over?

Same question with peerlibrary:subscription-data

I don't know Meteor internals well enough to get the answer from the source codes of the packages

mitar commented 6 years ago

Not sure what you mean by re-use. When computation reruns, it has to create a new query because it is generally impossible to know what parameters you changed in the query. I hope that Meteor then deduplicates them. Or are you saying that the issue is that query is closed before same new one is created? Maybe there could be some way to postopne closing to see if a new one created is the same.

One thing you can do for checking if parameters changed is to use computed field to compute parameters. Something like:

Meteor.publish('foobar', function () {
  const params = new ComputedField(() => {
    return {day_of_week: timestamp_to_day_of_week(reactive_timestamp_every_minute)};
  }, EJSON.equals);

  this.autorun(() => {
    return Collection.find(params());
  });
});

In this example, while reactive_timestamp_every_minute changes every minute, ComputedField triggers invalidation only when output really changes. So only then query is rerun.

Subscription data is not recreating anything. It uses one publish for all data.

etyp commented 6 years ago

@ramezrafla dug into this a bit over the past few days while I was trying to debug the cause of #37.

Answer is yes, assuming the cursor description is the same (selector + options have not changed) then the Meteor observe polling driver knows to reuse it just the same as it would in a plain old publication.

This package uses normal observeChanges same as a publish cursor would. See _observeChanges in the mongo driver package. It checks existing multiplexers to see if a cursor with the same cursor description is open and uses it.

mitar commented 6 years ago

I think the question is if the observe is reused if it is only used inside one computation which gets rerun. So do we close the observe and open exactly the same one again, or does Meteor (or we) do something smarter where if the observe is the same as before, we do not do anything.

On the client, if you have subscribe inside autorun, Meteor will be smart and will first rerun it, and then see after rerunning if any subscription changed and close old ones, instead of closing first all, rerunning, and then subscribing again.

If you have time, could you investigate this?

etyp commented 6 years ago

@mitar ah I gotcha. Yeah I have my local setup to check this easily. Will update here once I test.

etyp commented 6 years ago

@ramezrafla so I created a sample repo which shows that on observer re-run, even if the selector + options are identical, the observer is not re-used .

The setup is pretty simple:

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const CollectionA = new Mongo.Collection('collectionA');
const CollectionB = new Mongo.Collection('collectionB');

const tickTock = new ReactiveVar(new Date().getTime());
Meteor.setInterval(() => {
  tickTock.set(new Date().getTime());
}, 10000);

Meteor.publish('testObserverReuse', function testObserverReuse() {
  this.autorun(() => {
    const tick = tickTock.get();
    console.log('re-running');
    CollectionA.findOne(
      { testField: 'abc' },
      { fields: { testField: 1 } }
    );
    return CollectionB.find(
      { createdAt: { $gt: tick } },
      { fields: { testField: 1 } }
    );
  });
});
// On client, subscribe
Meteor.subscribe('testObserverReuse');

Then in the PollingObserveDriver I added a line to assign a random id to each driver like:

PollingObserveDriver = function(options) {
  var self = this;
  self.__driverId = Random.id();
  // ...
}

Finally, on the actual _pollMongo call, I log the id to see if changed:

_pollMongo: function () {
  // ...
  // Normal code excluded for brevity
  self._pendingWrites = [];
  // Log the id to see if it changed
  console.log(self.__driverId);
  // ...
}

I found that despite the arguments for the findOne being the same, the driver is not reused. It creates a new one each time the autorun fires again.

We can prove that it is actually querying mongo again by connecting to mongo shell, setting db.setProfilingLevel(2) and querying db.system.profile.find({ns: 'meteor.collectionA', 'command.filter.testField': 'abc' }).count() between each successive interval when the reactive var is updated.

What's concerning to me is that I found each autorun trigger two queries to the mongo. I added a branch on the repo you can checkout which does not autorun and you can see only triggers 1 query to mongo.

@mitar does this answer the original question?

etyp commented 6 years ago

The double query was unexpected. But that combined with the fact that observers set up from within autoruns always poll explains a lot of the cpu spikiness in our app :)

Hopefully we can all work together to solve both issues. I think observer re-use would be a great bonus.

mitar commented 6 years ago

Hm, I would assume that there will be three queries: one query, and two observers. So CollectionA.findOne makes a query and then also makes an observe to detect if query has to be rerun. If it has, it reruns the autorun which makes a query and observe again. And then there is an observe for CollectionB.find.

Which queries do you observe?

Now, based on discussion above, I think there could be maybe multiple optimizations we could:

I am not sure if all those optimizations should really be done and if they would really improve things overall. They would add more complexity for sure. Which can lead to strange side effects we would not anticipated. I would maybe do the first one, but not the other two. In my personal case I do not have really many changes happening inside autorun to rerun it. It is mostly just to catch rare things like permissions changing and updating what documents an user can see in real-time.

etyp commented 6 years ago

@mitar thanks for the deep dive. I think you're right about items 2 and 3 adding too much complexity. Item 1 seems attractive here and I'd be eager to try it out. Any suggested approach for where the afterFlush should happen?

I can give this a try on a separate branch and see if it works as expected.