simpleidserver / SimpleIdServer

OpenID, OAuth 2.0, SCIM2.0, UMA2.0, FAPI, CIBA & OPENBANKING Framework for ASP.NET Core
https://simpleidserver.com/
Apache License 2.0
719 stars 94 forks source link

[SCIM] Feature Request - make patch faster by returning 204 #570

Closed danflomin closed 4 months ago

danflomin commented 1 year ago

Hello

During my work on improving the performance of SCIM, especially the PATCH method for groups, I saw that for each PATCH request, we query the entire group representation from the DB. This takes a long time when the group has many members (can actually take most of the running time).

Since most of the PATCH requests for groups is about adding\removing members, it would be much more efficient to not load all the data into memory. I think we should check each PatchOperation separately, such that for:

Since it is common for vendors to have groups with many members, and to send PATCH requests in bursts, it will be much better for performance to do it this way.

The SCIM RFC allows returning 204 in case of successful PATCH. image

I will appreciate this very much. What do you think on this?

Kind regards Dan

simpleidserver commented 1 year ago

Hello,

I'm working on a solution to fix this problem.

KR,

SID

simpleidserver commented 1 year ago

Todo List :

gabrielemilan commented 1 year ago

Hello, I'm trying to do a real test with more than 2k users in one single group and I noticed that the response time increases every time you put new user.

image

Now I have only 150 users in one group, in my scenario I need to manager 10k users in one single group.

I'm using last version of 4.0.4 branch locally

simpleidserver commented 1 year ago

Hello,

The logic used to add a member to a group takes too much time because, in the previous version, all the 'members' are retrieved. I have made some modifications to address this issue in the 'release/4.0.4' branch, and the performance is now better!

Kind regards,

SID

gabrielemilan commented 1 year ago

I tried and it seems faster now, well done!

I tried also the get user by userName and I suggest adding this index on mongo:

db.scimRepresentationAttributes.createIndex( { RepresentationId: 1, SchemaAttributeId: 1, ValueString: 1 }, { name: "RepresentationId_1_SchemaAttributeId_1_ValueString_1" } );

This is the result:

Before image

After image

simpleidserver commented 1 year ago

Hello @danflomin & @gabrielemilan ,

All the changes have been committed to the release/4.0.4 branch, and the performance has improved. I made an effort to consistently load the minimal subset of attributes used during PATCH, POST, and DELETE operations.

However, there have been some changes in the database, including the addition of two new columns in the SCIMRepresentationAttribute table: ComputedValueIndex and IsComputed.

I have written a small documentation explaining how to migrate from version 4.0.3 to 4.0.4. You can find the documentation at the following link: https://github.com/simpleidserver/SimpleIdServer/blob/release/4.0.4/website/docs/migrations/403to404.md

KR,

SID

gabrielemilan commented 1 year ago

I noticed performance issue using this patch, trying to remove user from group:

url: v2/Groups/d8c7ac0d-68a4-4d4a-bf01-afcc9a907e11

{ "schemas": [ "urn:ietf:params:scim:api:messages:2.0:PatchOp" ], "Operations": [ { "op":"remove", "path": "members[value eq\"00179e8b-1c75-45e5-b0af-a02ebf4b55fd\"]" } ] }

when I have a lot of users in one group the process doesn't end.

simpleidserver commented 1 year ago

Indeed, there is an issue in the function 'SCIMRepresentationCommandRepository.FindAttributes.' It executes the following MongoDB instruction and causes a performance problem. I rewrote the algorithm, and the code is available in the 'release/4.0.4' branch. Can you please try it again?


[
  {
    $project: {
      _outer: "$$ROOT",
      _id: 0,
    },
  },
  {
    $lookup: {
      from: "representationAttributes",
      localField: "_outer._id",
      foreignField: "ParentAttributeId",
      as: "_inner",
    },
  },
  {
    $project: {
      a: "$_outer",
      Children: "$_inner",
      _id: 0,
    },
  },
  {
    $match: {
      "a.RepresentationId":
        "21c11c48-7418-4307-9da2-ff524947e3fe",
    },
  },
  {
    $project: {
      Attribute: "$a",
      Children: {
        $map: {
          input: "$Children",
          as: "c",
          in: {
            Attribute: "$$c",
          },
        },
      },
      _id: 0,
    },
  },

  {
    $match: {
      $and: [
        {
          "Attribute.SchemaAttributeId":
            "8e5b5165-b130-4a7e-8e9e-f359095cbc74",
        },
        {
          $expr: {
            $anyElementTrue: {
              $map: {
                input: "$Children",
                as: "v__1",
                in: {
                  $and: [
                    {
                      $eq: [
                        "$$v__1.Attribute.SchemaAttributeId",
                        "13f2bf74-79e1-4afe-9e2f-6996ef4251b9",
                      ],
                    },
                    {
                      $eq: [
                        {
                          $toLower: {
                            $ifNull: [
                              "$$v__1.Attribute.ValueString",
                              "",
                            ],
                          },
                        },
                        "a0be941f-4a23-493f-b200-06fdb9a98bcc",
                      ],
                    },
                  ],
                },
              },
            },
          },
        },
      ],
    },
  },
]
gabrielemilan commented 1 year ago

Yes, now it works!

Thanks.

yoavyanuka commented 9 months ago

It looks like it's somehow worse now. I am running a patch on a group with 17K users adding another 100 users and the operation takes 20 + minutes and counting. Using EF.

KR Yoav

yoavyanuka commented 9 months ago

Update: Looks like it works ok on a newly created group (after the fix) but an old group is getting stuck on patch

yoavyanuka commented 9 months ago

Update 2: Looks like it's doing countless calculations around the SyncIndirectReferences specifically this for loop:

foreach (var pa in parentAttrs) {
    if (!children.Any(r => r.ParentAttributeId == pa.Id && r.SchemaAttributeId == valueAttr.Id && r.ValueString == newSourceScimRepresentation.Id))
        result.AddReferenceAttributes(BuildScimRepresentationAttribute(pa.RepresentationId, propagatedAttribute.TargetAttributeId, newSourceScimRepresentation, Mode.PROPAGATE_INHERITANCE, newSourceScimRepresentation.ResourceType, targetSchema, false));
}
                                }
simpleidserver commented 9 months ago

Hello,

There is a breaking change in the SCIM library between versions 4.0.3 and 4.0.4.

Specifically, two columns have been added to the SCIMRepresentationAttribute :

  1. ComputedIndexValue: This column now stores a computed value for each attribute. In the case of complex attributes, the value is the concatenation of its children's values. This value plays a crucial role when the SCIM server is checking the uniqueness of the attribute.

  2. IsComputed: This column indicates whether the attribute is automatically computed or not. For instance, the attribute groups.type is always computed by the server and only has two possible values, direct or indirect.

Upon adding a new group, both columns are correctly populated. However, for older groups, a migration process must be executed. For detailed information about the migration process, please refer to the documentation available at: https://simpleidserver.com/docs/migrations/403to404/

KR,

SID

yoavyanuka commented 9 months ago

I followed the instructions. The migration looks to add the new columns with default value null and false not populate it. How do we need to populate current rows?

KR, Yoav

danflomin commented 9 months ago

Thank you for your response on this ticket @thabart .

You are the best 😄

I somehow missed the commented code in the Startup.cs file.

I'll try to run it and report on how the migration went.

Kind regards Dan

danflomin commented 9 months ago

Hi @thabart

I have 2 questions:

Should this part be without a filter? I thought that the ComputedValueIndex should exist only when IsComputed is true.

var groupedAttributes = context.SCIMRepresentationAttributeLst.Include(a => a.SchemaAttribute).GroupBy(a => a.RepresentationId);

And also, do you think it'll be possible to convert this migration into a SQL script? I believe it'll be much faster.

Kind regards Dan

simpleidserver commented 9 months ago

When the property IsComputed is set to true, the attribute is automatically calculated by SCIM. Examples include the properties 'type' and 'display'.

The property ComputedValueIndex is a concatenation of the children's values. SCIM utilizes it to enhance the execution of the comparison algorithm between two representations.

Both properties serve different needs; therefore, the following part must be without a filter:

var groupedAttributes = context.SCIMRepresentationAttributeLst.Include(a => a.SchemaAttribute).GroupBy(a => a.RepresentationId);

While it is possible to migrate the logic into a SQL script, there is a risk of regression because all the .NET logic must be translated :)