strongloop / loopback

LoopBack makes it easy to build modern applications that require complex integrations.
http://loopback.io
Other
13.23k stars 1.2k forks source link

PersistedModel.destroyAll has very dangerous behaviour if the filter is not passed in correctly. #3094

Closed stringbeans closed 5 years ago

stringbeans commented 7 years ago

Description of feature (or steps to reproduce if bug)

The .destroyAll method is very dangerous if the filter you pass into it is incorrect. For example...

User.destroyAll({
  where: {
    userId: 5
  }
})

Will actually destroy ALL User records because the filter passed into the function is incorrect. This seems like very dangerous behaviour and our developers have been hurt by this multiple times.

(for reference, the CORRECT filter would look like this:)

User.destroyAll({
  userId: 5
})

Expected result

.destroyAll should only be deleting MATCHING records to the filter. So in the case where the filter is erroneous and doesn't match anything, nothing should be deleted.

Actual result (if bug)

It looks like by default, if .destroyAll does not match any records it deletes everything.

raymondfeng commented 7 years ago

Maybe some validation can be done to see if the caller accidentally pass in filter instead of where directly. Most DBs will delete all records if no criteria is provided. We need to have a way to delete all records too.

stringbeans commented 7 years ago

@raymondfeng i would think there would be different behaviour between PersistedModel.destroyAll() vs PersistedModel.destroyAll({})

in the case where there's ever an object passed in you destroy all records that match that object filter. otherwise if there's an undefined object then you destroy all records

vjtc0n commented 7 years ago

@raymondfeng So maybe we have to check if the record exists or not, then destroyAll() ?

I think this is inconvenient somehow.

Don't you think we could separate into 2 situations like @stringbeans said ?

shaheero commented 7 years ago

+1

zenzjtech commented 7 years ago

Request to change API, too. The behavior is too dangerous. One of my colleagues accidentally deleted all of DB records because of his mistyping in where filter. Another dangerous case in that when the DB property somehow changed, without the necessary changing of source codes which uses destroyAll API.

zenzjtech commented 7 years ago

Another solution is creating a new deleting API with new behavior, IMO.

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 7 years ago

This issue has been closed due to continued inactivity. Thank you for your understanding. If you believe this to be in error, please contact one of the code owners, listed in the CODEOWNERS file at the top-level of this repository.

shaheero commented 7 years ago

I hope this gets look at..

stringbeans commented 6 years ago

this issue just bit us in the butt again :(

TheBoroer commented 6 years ago

Damn, i just got burned by this too... sigh -_-'

Can we please get this re-opened @nitro404 ?

coco-napky commented 6 years ago

This really needs to be addressed :(

b-admike commented 6 years ago

Sorry for the inconvenience everyone; re-opening because this issue is still relevant and important. I will take a look at it next week.

averypfeiffer commented 6 years ago

+1

mrNamelesss commented 6 years ago

++

b-admike commented 6 years ago

I am not able to reproduce this bug at https://github.com/strongloop/loopback-datasource-juggler/pull/1597. Can anyone provide a test application, steps you took to reproduce this issue, or suggestions for the test case I wrote?

b-admike commented 6 years ago

@stringbeans @fluteguitar @vjtc0n @shaheero @TheBoroer what was the filter you were passing when you hit the issue? Can you check out #1597 to see if it covers your use case?

@raymondfeng i would think there would be different behaviour between PersistedModel.destroyAll() vs PersistedModel.destroyAll({})

in the case where there's ever an object passed in you destroy all records that match that object filter. otherwise if there's an undefined object then you destroy all records

Hm I think the usage is consistent in LoopBack for most, if not all, CRUD APIs AFAIK (for e.g. User.find({}) and User.find() do the same thing), but I understand that doesn't mean it shouldn't change if it's not reasonable. @raymondfeng @bajtos do we want to treat it differently for destroyAll?

bajtos commented 6 years ago

I am personally fine to change destroyAll to reject an empty where condition {}, asking users to use undefined as the where value instead.

.destroyAll should only be deleting MATCHING records to the filter. So in the case where the filter is erroneous and doesn't match anything, nothing should be deleted.

I agree this would be a great user experience. I am not sure how to implement such detection though!

Let's consider the example from the issue description:

User.destroyAll({
  where: {
    userId: 5
  }
})

How should we tell that the developer is incorrectly sending a filter-like object instead of a where-like object? I am concerned that checking for where property may back-fire on us, consider the following use cases:

Command:

Locations.destroyAll({
  where: {
    // Bikini Atoll, Marshall Islands
    lat: 11.583331,
    lng: 165.333332
  }
});

Now consider two different model definitions:

I suppose we could check the condition provided to destroyAll to see if it matches any Model properties, but this gets complicated when operators like and and or get involved (docs):

User.destroyAll({
  or: [
    {userId: 5},
    {email: 'user@example.com'},
  ]
});

What's worse, LoopBack models are not strict by default, it's possible to attach arbitrary properties to model instances. In that case it's not possible to say whether the filter/where object is trying to match a valid property not defined in the model, or whether the developer made a mistake.

raymondfeng commented 6 years ago

The method name deleteAll already hints it's a destructive operation. What we need to explore is to see if there are some patterns that are typical mistakes from user input. To give a few examples:

  1. deleteAll(undefined, ...);
  2. deleteAll({}, ...);
  3. deleteAll({userId: undefined}, ...);
  4. deleteAll({where: {userId: 5}}, ...);

Which one should we reject?

Personally, I think we could reject 1, 3, and 4.

For case 4, it's a common mistake that developers pass in where and filter incorrectly. Maybe we should always check both patterns and report errors if the received object seems to be wrong.

shimks commented 6 years ago

@bajtos This does seem tricky to get right. I personally think that for now we shouldn't worry about the where object having properties named where or some other operator and implement logic that will only delete by what the where object has explicitly specified.

That said, I can't reproduce the issue either. For reference, I've created a model instance, called destroyAll(where: {id: 1}}), and called find to get the created model instance back

bajtos commented 6 years ago

implement logic that will only delete by what the where object has explicitly specified.

Ah, this is an interesting idea. So you are saying that a where condition {where: {id: 1}} should NOT MATCH any model instances that don't have where property? That sounds like a solid solution to me, one that should be applied to find(), updateAll() and all other APIs performing model filtering. It may be a huge breaking change though, therefore we should probably hide it behind a feature flag.

For a better context, what connector/database are you using?

shimks commented 6 years ago

So you are saying that a where condition {where: {id: 1}} should NOT MATCH any model instances that don't have where property?

Yup.

I was using memory. Looking back, seems like like this is a database problem, so please disregard my inability to reproduce the problem

hacksparrow commented 6 years ago

@stringbeans @fluteguitar @mrNamelesss @averypfeiffer @coco-napky @TheBoroer @shaheero @vjtc0n

What database are you using?

hacksparrow commented 6 years ago

I have confirmed this happens for MySQL. Does not happen for memory and MongoDB. More connector may or may not be affected.

shaheero commented 6 years ago

MySQL

hacksparrow commented 6 years ago

How to reproduce.

  1. Scaffold a LB3 app using lb
  2. Create a MySQL datasource
  3. Create a model (eg: Fruit) to use the MySQL datasource
  4. Start server, POST two Fruits using the API Explorer, GET to see the Fruits
  5. Update the boot code in server.js

    boot(app, __dirname, function(err) {
    if (err) throw err;
    
    // start the server if `$ node server.js`
    if (require.main === module)
    app.start();
    const Fruit = app.models.Fruit;
    app.use((req, res, next) => {
      console.log('DESTROYING')
      Fruit.destroyAll({where: {id: 5}})
      next()
    })
    });
  6. Restart server
  7. GET the Fruits
  8. GET again to see all of them are gone

This looks like a connector issue.

@b-admike since you have assigned it to yourself, I hope you can take it from here.

b-admike commented 6 years ago

Thanks @hacksparrow I can also reproduce the issue. Like @shimks I was using the memory connector and testing juggler locally, so I wasn't seeing it. The connector tests on CI are failing for the correct reason too, and I am able to reproduce locally with the test on MySQL connector.

b-admike commented 6 years ago

implement logic that will only delete by what the where object has explicitly specified.

Ah, this is an interesting idea. So you are saying that a where condition {where: {id: 1}} should NOT MATCH any model instances that don't have where property? That sounds like a solid solution to me, one that should be applied to find(), updateAll() and all other APIs performing model filtering

Agreed.

The method name deleteAll already hints it's a destructive operation. What we need to explore is to see if there are some patterns that are typical mistakes from user input. To give a few examples:

  1. deleteAll(undefined, ...);
  2. deleteAll({}, ...);
  3. deleteAll({userId: undefined}, ...);
  4. deleteAll({where: {userId: 5}}, ...);

Which one should we reject?

Personally, I think we could reject 1, 3, and 4.

For case 4, it's a common mistake that developers pass in where and filter incorrectly. Maybe we should always check both patterns and report errors if the received object seems to be wrong.

Agreed with rejecting 3 and 4, but I think for 1, isn't undefined the value we set for the filter object if nothing is passed in (i.e. calling destroyAll(cb(err, res)))? For 2, I'm okay with not deleting anything since it's an empty object, or perhaps we should reject it too?

marioestradarosa commented 6 years ago

IMO,

I would personally not use LoopBack to replace my database administration tools, meaning that if I would like to empty a database table I would use any other means, not a REST API endpoint available to application programmers, but that's my personal opinion LB is an Integration and API endpoint framework not to replace database administration tools, so deleting all records in a table is something not usual for a client app.

I would suggest, even if it has an extra operation cost the following:

For experts that still want to DeleteAll and know what they are doing, they can pass in a valid where clause that all the records in the table will match. Usually in DBMS we use where 1=1 or where true etc , but here, I don't know which one we could use.

What to return if successfully deleted record(s) ?

bajtos commented 6 years ago

I would personally not use LoopBack to replace my database administration tools, meaning that if I would like to empty a database table I would use any other means, not a REST API endpoint available to application programmers, but that's my personal opinion LB is an Integration and API endpoint framework not to replace database administration tools, so deleting all records in a table is something not usual for a client app.

+1

In fact, LoopBack 3.x does not expose deleteAll via REST API. Users have to expose it themselves.

marioestradarosa commented 6 years ago

@bajtos , you mean the generated controller does not include this end point and it is up to the app developer to add it ? . If that's what you meant, then I believe that we would still need to provide a safe implementation with this method on the LB side, unless you remove it completely and allow app developers to implement it in the repository at their own risk.

bajtos commented 6 years ago

you mean the generated controller does not include this end point and it is up to the app developer to add it ?

In LoopBack 3.x, deleteAll is not exposed in the REST API, only deleteById.

In LoopBack 4.x, the REST controller template does include deleteAll - I think this is an oversight we need to fix, I opened a pull request for that - see https://github.com/strongloop/loopback-next/pull/1582.

If that's what you meant, then I believe that we would still need to provide a safe implementation with this method on the LB side, unless you remove it completely and allow app developers to implement it in the repository at their own risk.

We need deleteAll method for example to allow tests to clean up the database between individual test cases.

I agree we should make the implementation more robust to leave less room for developer errors.

IMO, we should fix our connectors to correctly handle the where condition. See https://github.com/strongloop/loopback/issues/3094#issuecomment-403822442

A where condition {where: {id: 1}} should NOT MATCH any model instances that don't have where property. That sounds like a solid solution to me, one that should be applied to find(), updateAll() and all other APIs performing model filtering. It may be a huge breaking change though, therefore we should probably hide it behind a feature flag.

Based on the discussion above, it looks like some of our connectors like memory already implement this behavior, in which case we can introduce the feature flag at individual connector level only, make the new (safer) behavior enabled by default and publish the change as a semver-major release of each affected connector.

@b-admike I think we should start with adding more tests to juggler's shared test suite executed by all connectors, see test/manipulation.test.js - there is even a TODO saying to add tests for deleting records matching a query. With these tests in place, we can leverage our CI infrastructure to learn which connectors are already passing these new tests (and handle invalid deleteAll/destroyAll filters correctly) and which need to be fixed.

Thoughts?

b-admike commented 6 years ago

Thanks for pitching in @marioestradarosa. I've had a talk with @bajtos which is relevant to what you are proposing:

I'm working on tests/fix for https://github.com/strongloop/loopback/issues/3094 and was wondering what users should pass for a delete all records operation. You are for using undefined as opposed to Raymond who thinks undefined should be rejected. I think since calling the function without any parameters (destroyAll()) means delete all and the where object is undefined, but from the REST API perspective (calling it from explorer) I think {} makes more sense. Thoughts?

From @bajtos:

First of all, we need to avoid breaking changes - we cannot afford a semver-major release of juggler now. If destroyAll() used to work before, then IMO it should keep working after your fix too.

Please note that destroyAll is not exposed via REST API by default.

As for {} vs. undefined, I would expect that swagger-ui (our API Explorer) allows you to omit optional arguments in which case where parameter is not provided by the HTTP client and as a result, destroyAll is called with where=undefined. Please write a small app to verify the exact behavior so that we don't speculate here.

I agree on not making semver-major changes and so I think we can't do the following:

if (!not where) { throw an error } //we secure empty where clauses

In terms of making the fixes at the connector level, that makes sense since the behaviour is different in memory vs MySQL.

@b-admike I think we should start with adding more tests to juggler's shared test suite executed by all connectors, see test/manipulation.test.js - there is even a TODO saying to add tests for deleting records matching a query. With these tests in place, we can leverage our CI infrastructure to learn which connectors are already passing these new tests (and handle invalid deleteAll/destroyAll filters correctly) and which need to be fixed.

Thoughts?

This approach is fine by me, but I had started making changes in test/basic-querying.test.js (see https://github.com/strongloop/loopback-datasource-juggler/pull/1597/commits/ba16c48fa5d5f42dff68d94a38c04f50ff1bf560), so if that is not the appropriate place then I will put the tests in manipulation test file.

bajtos commented 6 years ago

This approach is fine by me, but I had started making changes in test/basic-querying.test.js (see strongloop/loopback-datasource-juggler@ba16c48), so if that is not the appropriate place then I will put the tests in manipulation test file.

Oh wow, I did not know we have destroyAll-related tests in multiple places. Feel free to pick the one that makes more sense to you.

JustCodin commented 6 years ago

This is scary in production. What's the best way to do safe bulk deletes until this is fixed? I was going to pass the 'where' query to 'find' and map over the returned array with a result of an array of ids, and then use it to construct a new query to 'destroyAll' as {id: inq: [...ids]} (all the tables in my schema have ids so this works), but then I found this issue #3968 with the 'find' method, which in a worse case scenario, would still delete all entries from the table. My current thought is to parse the attributes in the 'where' clause and then compare them against the attributes on the calling model to ensure they exist before doing the previously explained procedure. Are there any other 'gotchas' or does this sound ok? Appreciate any feedback in how to keep production data safe.

nitro404 commented 6 years ago

@JustCodin The deleteAll function should be perfectly safe to use, if you are still feeling uncertain then try validating your filter before passing it to the function.

nitro404 commented 6 years ago

Also I agree with adding some extra checks around the parameters being passed to deleteAll. It is certainly a tricky situation since where is not a reserved attribute keyword, unfortunately. I have had this affect me indirectly as well after having used the https://www.npmjs.com/package/loopback-cascade-delete-mixin loopback cascade delete mixin where it had a bug that did not invoke the deleteAll function correctly and we lost our whole staging server database.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been closed due to continued inactivity. Thank you for your understanding. If you believe this to be in error, please contact one of the code owners, listed in the CODEOWNERS file at the top-level of this repository.

fguille94 commented 3 years ago

+1