parse-community / parse-server

Parse Server for Node.js / Express
https://parseplatform.org
Apache License 2.0
20.92k stars 4.78k forks source link

Errors with Push Notifications on Parse Server #2275

Closed sprabs closed 8 years ago

sprabs commented 8 years ago

I have the latest version of the parse-server code, running on AWS Elastic Beanstalk with an mLab database:

I am hitting some issues with push notifications between my clients. All the credentials were set up fine for iOS and Android and it was working great. Then, after some time, when I perform some client-to-client operations (for testing) in which one client sends push notifications to another client or group of clients through a Cloud Code method, I start to see some problems.

I get this error in Android Studio: failed with error: Invalid classname: PushStatus, classnames can only have alphanumeric characters and , and must start with an alpha character while sending push notification.

Subsequently, when I send any additional push notification through an Android device, it errors out (same error message). Also, when I then try to load the data browser (hosted on parse.com), I see that it fails.

After some debugging, the issue is "resolved" (until it happens the next time) when I remove the _PushStatus document in the _SCHEMA collection. This has happened three times now in the last 24 hours...and is obviously not something that would fly in production.

Is anyone else hitting this problem? How did you fix it? I think the problem is that _PushStatus is not a supported collection that should belong in the mLab schema.

sprabs commented 8 years ago

@flovilmart Is this something you've come across/do you have any suggestions? You were super helpful last time! :)

sprabs commented 8 years ago

This seems related, too, but I don't see a resolution: https://github.com/ParsePlatform/parse-server/issues/1591

flovilmart commented 8 years ago

The _PushStatus collection should not cause any issue anymore as it should not be part of the schema. This seems to be a bug.

MBDeveloper commented 8 years ago

Working with latest version 2.2.16 - the _PushStatus is created in the Schema and break the server on parse.com side.

sprabs commented 8 years ago

@flovilmart I checked the commit you added to fix the issue in April. I noticed that in the latest code, part of that was lost. I have done a point fix on my side and am still testing. It's just a single line in src/Controllers/DatabaseController.js from: https://github.com/ParsePlatform/parse-server/pull/1636

Would incorporating that change back in solve the _PushStatus problem or does more need to be done?

flovilmart commented 8 years ago

@sprabs I'm not sure what you're referring to. If you want to open a pull request, go ahead.

sprabs commented 8 years ago

@flovilmart This is what I mean:

You changed this code https://github.com/ParsePlatform/parse-server/pull/1636 to fix the _PushStatus issue back on April 25th, correct?

If you look at the latest code on master, all your changes have persisted except the one you made in src/Controllers/DatabaseController.js. (Why did that get lost in the first place by the way?) In my local repo, I added that change back to see if that fixes the issue. Do you know if the loss of that change is why the issue has returned? (By that I mean, the fact that that one line change seems to have been blown away?)

flovilmart commented 8 years ago

The fix for the _PushStatus class was to make it so it's not stored in the _SCHEMA. Maybe that fix was accidentally reverted.

sprabs commented 8 years ago

How can we verify this?

flovilmart commented 8 years ago

check https://github.com/ParsePlatform/parse-server/blame/master/src/Controllers/DatabaseController.js also, I'm still not sure that the change was reverted.

sprabs commented 8 years ago

Look at line 98 in the current /src/Controllers/DatabaseController.js that's checked in to master.

previous line: .then(() => schemaController.getOneSchema(className, true))

was replaced with (current code): .then(schemaController => schemaController.getOneSchema(className))

When I reverted this change to the version that is claimed to have worked, I don't seem to be seeing the _PushStatus schema issue anymore. Are there any side effects for this change? Do we know why someone changed it in the first place? Was it an oversight or by design?

flovilmart commented 8 years ago

Given the blame info, @drew-gross may be able to give more info.

sprabs commented 8 years ago

@flovilmart Looks like even with my code change, the issue happened again. There's definitely a bug with src/Controllers/DatabaseController.js

sprabs commented 8 years ago

To reproduce this issue:

Send several push notifications (we've sometimes triggered it by just sending 5!) from an iOS or Android client that is pointing to a parse-server endpoint (in our case we have it hosted on an AWS Elastic Beanstalk, but that should not matter). It also doesn't matter who the recipient of the push notification is (it could be another client pointing to parse.com or to parse server).

The _PushStatus collection's schema then gets added to the mLab _SCHEMA collection. At that point, clients that are pointing to parse.com can no longer trigger push notifications (in our case, through a Cloud Code method). However, they can receive push notifications from clients that are pointing to the parse server endpoint.

The deadline is coming, and this is a pretty serious issue for the period of time when we are migrating our clients off of parse.com... Any suggestions for how to fix this? @drew-gross @hramos

sprabs commented 8 years ago

Does anyone have updates or suggestions for how to proceed? This affects everyone migrating IMO.

@drew-gross @MBDeveloper @flovilmart @hramos

flovilmart commented 8 years ago

@sprabs is this bug only affecting dashboard.parse.com?

sprabs commented 8 years ago

@flovilmart No, it's breaking any push notifications sent through clients still point to parse.com as well. That's the main issue for us. It's caused by _PushStatus schema being created. Here's the client error: [com.parse.ParseRequest$ParseRequestException: Invalid classname: PushStatus, classnames can only have alphanumeric characters and , and must start with an alpha character while sending push notification.]

So basically, we can't migrate until it's fixed because it will take a few weeks for all clients to migrate. We can't have the clients on parse-server breaking push notifications for clients on parse.com every 10-15 minutes...

flovilmart commented 8 years ago

You can use the parse-dashboard in the meantime no? If this is a really pressing issue for you I encourage you to try to put a PR together and work on a fix that would benefit everyone.

sprabs commented 8 years ago

The main issue isn't just the dashboard (as stated above). It's that it breaks push notifications for clients still using parse.com. We extensively tested this. I'll re-post the findings:

Basically, once a set of push notifications are triggered from any client or group of clients pointing to parse-server, at some point the _PushStatus schema gets added. At that point, two things happen: parse.com dashboard breaks AND any clients still using parse.com will error when attempting to send a push notification (in our case through a cloud method).

sprabs commented 8 years ago

@flovilmart :/ I've spent time looking at it, but am not sure where to start. Unfortunately, I'm not familiar enough with the parse-server code to add a fix. I was hoping you could help me or at least point me in the right direction since you've worked on this part of the code base before.

flovilmart commented 8 years ago

@sprabs I can't guarantee you that this particular fix will be done in the next days... I have other priorities of my own.

sprabs commented 8 years ago

@flovilmart I understand. Could you point me to where in the code to start? I could use a hint.

flovilmart commented 8 years ago

I'm not sure either why this would be stored in the _Schema

flovilmart commented 8 years ago

I just added another part of a test on the current master branch, and it don't seem _PushStatus gets saved part of the _Schema... https://github.com/ParsePlatform/parse-server/pull/2300

flovilmart commented 8 years ago

also, can you reproduce the error on a test server while running with VERBOSE=1 ? I'm curious if some _PushStatus object isn't leaked back from your cloud code, back to the client.

sprabs commented 8 years ago

We have VERBOSE=1 and I am checking the logs. I've just added your updated test, too.

One thing we did notice that it's very easily triggered when we do 5 or more quick actions from a single client in succession (from a device that's pointing to our own hosted server). I don't know if that's a hint.

sprabs commented 8 years ago

I am seeing a few interesting logs (based on console statements I added in the code) around the time when the failure happens:

verbose: sent push! 0 success, 0 failures getOneSchema called for _Installation with allowVolatileClasses = false getOneSchema called for _PushStatus with allowVolatileClasses = false getOneSchema called for _PushStatus with allowVolatileClasses = false info: warning: error while sending push code=101, message=Object not found. getOneSchema called for _Installation with allowVolatileClasses = true

Is there ever a situation where allowVolatileClasses should be true?

sprabs commented 8 years ago

I actually see this, too: getOneSchema called for _PushStatus with allowVolatileClasses = true

I think this is what does it.

flovilmart commented 8 years ago

I believe you point in the right direction.

sprabs commented 8 years ago

Should there ever be a situation when allowVolatileClasses = true?

I see that this is set in line 481 of DatabaseController.js

flovilmart commented 8 years ago

the problem is not that, the problem is that we let writes happening on the _SCHEMA with a volatile classes.

sprabs commented 8 years ago

That makes sense. This time around, it got triggered again. I added a stack trace and more logs to help debug further. Here's what I see...

Error (I forcibly created a stack trace): create called within DatabaseController at DatabaseController.create (/var/app/current/node_modules/parse-server/lib/Controllers/DatabaseController.js:584:11) at Object.setInitial (/var/app/current/node_modules/parse-server/lib/pushStatusHandler.js:57:21) at /var/app/current/node_modules/parse-server/lib/Controllers/PushController.js:121:27 at run (/var/app/current/node_modules/core-js/modules/es6.promise.js:89:22) at /var/app/current/node_modules/core-js/modules/es6.promise.js:102:28 at flush (/var/app/current/node_modules/core-js/modules/_microtask.js:18:9) at nextTickCallbackWith0Args (node.js:420:9) at process._tickDomainCallback (node.js:390:13) upsertSchema was called for: _Installation getOneSchema called for _PushStatus with allowVolatileClasses = true

sprabs commented 8 years ago

I see this (line 58 in parse-server/lib/pushStatusHandler.js) which looks suspicious to me:

return database.create(PUSH_STATUS_COLLECTION, object).then(function () {

58 pushStatus = {

59 objectId: objectId

60 };

61 return Promise.resolve(pushStatus);

62 });

flovilmart commented 8 years ago

getOneSchema don't write to the DB...

sprabs commented 8 years ago

I added a stack trace to see when create is called within DatabaseController because I suspected that was the issue. And that is the issue. Just need to prevent that from happening.

flovilmart commented 8 years ago

you mean prevent return database.create(PUSH_STATUS_COLLECTION, object).then(function () {??

sprabs commented 8 years ago

Okay, so... I added one fix in the getOneSchema method (I've bolded it) in SchemaController.js that seems to have solved it, but I'm not sure if it could cause other side effects:

getOneSchema(className, allowVolatileClasses = false) { if ((allowVolatileClasses && volatileClasses.indexOf(className) > -1) || className == '_PushStatus') { return Promise.resolve(this.data[className]); } return this._dbAdapter.getClass(className) .then(injectDefaultSchema) }

I've basically gone through the code and it never seems that a default schema should be injected for the _PushStatus collection, so I thought this would guarantee that.

Even with this change, though, I do see the following error in the logs: warning: error while sending push code=101, message=Object not found.

Usually, the schema creation for _PushStatus was triggered after such an error. I don't know if that root problem still needs to be corrected or if that's a transient error.

flovilmart commented 8 years ago

the point of the allowVolatileClasses and volatileClasses array is to prevent them to be saved. I'm not sure why getOneSchema would write a schema to the database though. The problem has to be somewhere else.

flovilmart commented 8 years ago

@sprabs instead of trying to hack the code together, I believe you should try to write a test that would make this fail. That would greatly help instead of shooting in the dark.

flovilmart commented 8 years ago

@sprabs See that test here: https://github.com/ParsePlatform/parse-server/pull/2300/commits/65924a2baaf75ab1f6d090c726acdf5ab35a73e2

It doesn't seem that getOneSchema has side effects called or not with allowVolatileClasses

sprabs commented 8 years ago

Hmmm... on our side, we are certain that that fixed the creation of the schema to our mLab database.

I am, however, seeing that there is still this periodic error: warning: error while sending push code=101, message=Object not found.

We are still trying to understand why that happens. The push still seems to send to the end user (so we are not sure if a retry is perhaps getting triggered on the server side). There's no obvious problem that is noticeable on the client side as the pushes are being received as expected.

Is it possible that the schema gets created on a push failure?

flovilmart commented 8 years ago

Again, this is too hackish to make it into a PR IMHO, as there is no way to test that this behaviour will last.

Also, https://github.com/ParsePlatform/parse-server/blob/master/spec/helper.js#L160

After each test we ensure that no class gets added to the _SCHEMA object in the DB,

// Other system classes will break Parse.com, so make sure that we don't save anything to _SCHEMA that will
// break it.

I'm not sure why in your particular case, this _PushStatus added to the _SCHEMA, as all tests, and behaviours lead to another conclusion.

What I suggest is that you try to make a test case that will break this behaviour once and for all.

sprabs commented 8 years ago

Our issue, though temporarily fixed through the hack, seems to correlate with this error: warning: error while sending push code=101, message=Object not found.

Any idea what triggers this? It's coming from the pushStatusHandler.js

flovilmart commented 8 years ago

This error may be triggered because your hack is breaking some functionalities...

I can't know for sure why that would break, did you run the unit tests with your hack in there?

I'm gonna close that issue as it's unlikely we're gonna find a solution at that point, feel free to open a PR when you find the root cause of your problem as you can see, we have many tests covering your issue and we are not able to reproduce it.

sprabs commented 8 years ago

@flovilmart That error is not caused by the hack. That error was there from day 1 using the existing code available on master.

flovilmart commented 8 years ago

@sprabs Like I suggested many times, can you please write a test that exemplifies the problem, and open a pull request with it? I've tried, with the PR that I opened to do the same without success. For now, you only provide excerpts from logs, no full logs, no steps to reproduce, and no guarantee to reproduce the bug(s) that you have.

You mentioned a problem on the _SCHEMA and now code=101, message=Object not found. That doesn't help in any way.

I am personally not affected by that issue, and our setup is serving more that 200 requests per seconds and delivering push many times a day.

sprabs commented 8 years ago

@flovilmart I still haven't quite figured out how to create the test for this, but I think we noticed one hint today.

We followed the stack and added a bunch of logs in the MongoStorageAdapter.js for findOneAndUpdate, upsertOneObject, updateObjectsByQuery after figuring out where the failure was happening. Through those logs as well as others we have added around the push notifications flow, we saw that a push notification is initially logged as "succeeded" and then right after that, we see that one with the same objectId goes to a "pending" state.

After that point, we see the error: warning: error while sending push code=101, message=Object not found.

We are suspecting that this schema creation is happening in MongoCollection.js (possibly through _adaptiveCollection) after it hits the error. Do you have any ideas how to best create a test to prove this?

When we hit that error, the dashboard issue happens (in which case parse.com is no longer accessible for that app and iOS/Android clients pointing to parse.com can no longer send push notifications).

sprabs commented 8 years ago

It appears to be a timing issue for push notifications with respect to status update. Here are some logs that consistently happen for errors. As you can see below, in the failure case, the status update for the push notification is happening in the wrong sequence (running -> succeeded and then pending -> running). In the normal case, it is happening from pending -> running and then running -> succeeded. Are you familiar with this issue? What is your recommendation for implementing a solution?

FAILURE CASE:

findOne className[_PushStatus] schema[{"fields":{}}] query[{"status":"running","objectId":"YvioEvH1kQ"}] update[{"status":"succeeded","updatedAt":"2016-07-21T01:45:17.400Z","numSent":0,"numFailed":0}] findOne className[_PushStatus] schema[{"fields":{}}] query[{"status":"pending","objectId":"YvioEvH1kQ"}] update[{"status":"running","updatedAt":"2016-07-21T01:45:17.400Z"}] findOne className[_PushStatus] schema[{"fields":{}}] query[{"objectId":"YvioEvH1kQ"}] update[{"errorMessage":"{\"code\":101,\"message\":\"Object not found.\"}","status":"failed","updatedAt":"2016-07-21T01:45:17.454Z"}]

NORMAL CASE:

findOne className[_PushStatus] schema[{"fields":{}}] query[{"status":"pending","objectId":"jgZ2f047IX"}] update[{"status":"running","updatedAt":"2016-07-21T01:45:17.794Z"}] findOne className[_PushStatus] schema[{"fields":{}}] query[{"status":"running","objectId":"jgZ2f047IX"}] update[{"status":"succeeded","updatedAt":"2016-07-21T01:45:17.883Z","numSent":3,"numFailed":0,"sentPerType":{"ios":1,"android":2}}]

grd888 commented 8 years ago

@sprabs @flovilmart , I'm not sure if this would give a clue as to the cause of this issue, but this code I'm using to send Push notifications always causes _PushStatus to be written in _SCHEMA:

function pushToUsernames(usernames, data) {
  var userQuery = new Parse.Query(Parse.User);
  userQuery.containedIn("username",usernames);
  userQuery.find({useMasterKey:true}).then(function(users) {
    console.log("Found " + users.length + " users");
    var pushQuery = new Parse.Query(Parse.Installation);
    pushQuery.containedIn("username", usernames);
    Parse.Push.send({
      where: pushQuery,
      data: data,
    }, {
        success: function() {
          console.log("pushToUsernames: Push sent successfully to " + usernames );
        },
        error: function(error) {
          console.log("Failed to send push : " + error.message);
        },
        useMasterKey: true
    });
  });
}
flovilmart commented 8 years ago

@sprabs are you able to test with that patch: https://github.com/ParsePlatform/parse-server/pull/2367 ?