reapit / foundations

Foundations platform mono repo

Metadata limited to 500 items #10990

Closed Ian-T-Price closed 3 months ago

Ian-T-Price commented 6 months ago

Describe the bug

It is only possible to return 500 items from a Metadata Schema.

If a 501st item is added to the metadataSchema a seemingly random(?) item is displaced by the new item.

If an unrelated item is deleted then the displaced item reappears.

Thus, all items are being stored but only the last 500 items are available to be used.

To Reproduce

Steps to reproduce the behaviour:

  1. Create a Metadata schema or use BeresfordsReferrals20240329 in the SandBox account
  2. Create 500 items in the metadataSchema - The above schema has 500 items
  3. Add a new item
  4. The item will be added but another item will be displaced. The displaced item cannot be listed or searched for even in a subset of data.
  5. Delete an unrelated item
  6. The displaced item will reappear

Expected behaviour

All items should be available. We are looking to store upwards of 20,000 items.

Screenshots N/A

Specification Service: Reapit.Services.Metadata, Reapit.Lambdas.MetadataProxy

github-actions[bot] commented 6 months ago

Thank you for taking the time to report a bug. We prioritise bugs depending on the severity and implications, so please ensure that you have provided as much information as possible. If you haven’t already, it really helps us to investigate the bug you have reported if you provide ‘Steps to Replicate’ and any associated screenshots. Please ensure any personal information from the production database is obscured when submitting screenshots. This issue will be reviewed in our weekly refinement sessions and assigned to a specific project board. We may also update the ticket to request additional information, if required. For more information on our processes, please click here

RWilcox-Reapit commented 5 months ago

Hi @Ian-T-Price

Could I ask what the use case for adding 20,000 schemas is? Do you have 20,000 entityTypes? Currently, you only need one schema per entityType as they can contain X amount of validation checks.

The endpoint does not currently support paging as we never foresaw a use case for someone having over 500 schemas. We can go ahead and add support for that if required but just trying to see if we can resolve this another way.

Thanks, Ryan

github-actions[bot] commented 5 months ago

This issue has been updated and moved to our ‘Near Term’ column (typically completed within 0 - 4 months). We have assessed the effort required and outlined a technical specification - please take the time to review this detail. When we're ready to schedule the issue, it will be assigned to the relevant board where you can continue to track its progress to completion. For more information on our processes, please click here

Ian-T-Price commented 5 months ago

It is not 20,000 entityTypes - heavens, and probably ReapIT, forbid. There is only one production schema.

It is 20,000 items/records in an entityType. We currently have 10,000 items but I'm working hard to cut down both the items and the fields within the item; this will possibly get as low as 5,000 items. We'll be adding approx. 1,000 - 1,500 items per year after that.

RWilcox-Reapit commented 5 months ago

Hi @Ian-T-Price

Thanks for confirming, that makes sense now. So you are hitting the metadata endpoint to retrieve entities for your custom type 'BeresfordsReferrals20240329'.

I've updated the spec on the ticket as we'll need to add paging support to the endpoint.

Thanks

Ian-T-Price commented 5 months ago

Hello @RWilcox-Reapit

That is correct. So my next question is, when will this be scheduled? A rough timeline of days, weeks, months would be great. I am, inevitably of course, ready to go live with our app, so I need to manage expectations with Beresford.

Our metadata items are no more than 2 KB each, so we are well within the 1 MB fetch limit for DynamoDB given that we can get a max. of 100 items on a page. I therefore assumed that the 500 item limit was an artificial limit for the SandBox account only. Is this not the case?

Many Thanks, Ian

plittlewood-rpt commented 5 months ago

Hi @Ian-T-Price, I've been trying to reach you via the email address you have registered but I'm getting a bounce back. Can I ask you to check your details please?

Our systems have flagged a huge number of errors from your app Beresfords Referrals System today, specifically trying to query metadata against the Sandbox. The volume is so large it looks like some kind of automated process.

The issue can be resolved by wrapping the string value you are filtering metadata for in single quotes, as outlined in the documentation.

The call you are making is currently:
https://platform.reapit.cloud/metadata/?PageNumber=1&PageSize=100&entityType=BeresfordsReferrals20240329&filter=metadata.ReapitID+%24eq+WHY230094

This would become:
https://platform.reapit.cloud/metadata/?PageNumber=1&PageSize=100&entityType=BeresfordsReferrals20240329&filter=metadata.ReapitID+%24eq+'WHY230094'
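For anyone building this URL in code, the quoting fix might look something like the sketch below. The function name `build_metadata_url` is hypothetical, authentication headers are not shown, and how the API expects a single quote inside the value itself to be escaped is not covered in this thread, so the sketch does not handle that case. Note that `urlencode` percent-encodes the single quotes as `%27`, which decodes to the same filter string as the corrected URL above.

```python
from urllib.parse import urlencode

# Base URL taken from the calls quoted in this thread.
BASE_URL = "https://platform.reapit.cloud/metadata/"


def build_metadata_url(entity_type: str, field: str, value: str,
                       page_number: int = 1, page_size: int = 100) -> str:
    """Build a metadata query URL with the filter value wrapped in single quotes.

    Hypothetical helper: assumes `value` contains no single quotes itself.
    """
    # The string being compared must be wrapped in single quotes.
    filter_expr = f"metadata.{field} $eq '{value}'"
    query = urlencode({
        "PageNumber": page_number,
        "PageSize": page_size,
        "entityType": entity_type,
        "filter": filter_expr,
    })
    return f"{BASE_URL}?{query}"


url = build_metadata_url("BeresfordsReferrals20240329", "ReapitID", "WHY230094")
print(url)
```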

Please can you arrange to make the relevant fix to your system?

To answer your previous question, it may be a month or two before we can get to this unfortunately, however I'll flag with our product owner for their attention.

Ian-T-Price commented 5 months ago

Hi @plittlewood-rpt

Sincere apologies for that; I'll reply via your email, which I received late this afternoon. Full explanations in the email.

Many thanks for the timeline update. We can live with a month or two and I'm working on methods to allow for the limit as well.

Cheers Ian

RWilcox-Reapit commented 3 months ago

Hi @Ian-T-Price,

We have added functionality to now allow you to use token based paging, which will also provide substantial performance improvements. To use the new functionality there are two query parameters to pass. Firstly: useTokenBasedPaging=true

You'll now receive back a nextPageToken alongside your paging details, similar to this:

{
    "_embedded": [
     // data obfuscated
    ],
    "nextPageToken": "SBOX_applicant_Marketplace:RPT230004",
    "pageNumber": 1,
    "pageSize": 10,
    "pageCount": 4,
    "totalPageCount": 1,
    "totalCount": 4,
    "_links": {
        "self": {
            "href": "/metadata/?PageNumber=1&PageSize=10&entityType=applicant&useTokenBasedPaging=true"
        },
        "first": {
            "href": "/metadata/?PageNumber=1&PageSize=10&entityType=applicant&useTokenBasedPaging=true"
        }
    }
}

Secondly, when nextPageToken is NOT null it should be passed back as a query param like so: nextPageToken=SBOX_applicant_Marketplace:RPT230004

When nextPageToken returns as null you have hit the last page and collected all relevant data.
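A client-side loop for the token based paging described above could be sketched roughly as follows. This is only an illustration: `iter_metadata` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever function performs the authenticated GET against the metadata endpoint and returns the decoded JSON body. The only details taken from this thread are the `useTokenBasedPaging` and `nextPageToken` query parameters and the `_embedded` and `nextPageToken` response fields.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical signature: takes query parameters, returns the decoded JSON page.
FetchPage = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_metadata(fetch_page: FetchPage, entity_type: str,
                  page_size: int = 100) -> Iterator[Any]:
    """Yield every item for an entityType by following nextPageToken."""
    params = {
        "entityType": entity_type,
        "PageSize": str(page_size),
        "useTokenBasedPaging": "true",
    }
    while True:
        page = fetch_page(params)
        # Items for the current page live under "_embedded".
        yield from page.get("_embedded", [])
        token = page.get("nextPageToken")
        if token is None:
            # A null token means the last page has been reached.
            return
        # Pass the token back as a query parameter on the next request.
        params = {**params, "nextPageToken": token}
```

Keeping the HTTP call behind a callable means the loop can be exercised with a stub before it is pointed at the real endpoint.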

Thanks, Ryan

github-actions[bot] commented 3 months ago

It looks like you have commented on a closed issue. If your comment relates to a bug or feature request, please open a new issue, and include this issue number/url for reference. For more information on our processes, please click here

Ian-T-Price commented 3 months ago

Many thanks for this. I can confirm that the returned results are now as expected ;-)

However, it is a somewhat non-standard approach compared to the rest of the API and it breaks the page and total counts. It is not possible to get the total number of items without stepping through the entire dataset and counting them. If you don't use the TokenBasedPaging then the max totalCount is still reported as 500.
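As an illustration of the counting workaround, a total can be obtained by walking every page and summing the items, along the lines of the sketch below. `count_items` and `fetch_page` are hypothetical names; `fetch_page` is assumed to take the current nextPageToken (or None for the first page) and return the decoded JSON response for a request made with useTokenBasedPaging=true.

```python
from typing import Any, Callable, Dict, Optional


def count_items(fetch_page: Callable[[Optional[str]], Dict[str, Any]]) -> int:
    """Count all items by stepping through the dataset page by page."""
    total = 0
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        total += len(page.get("_embedded", []))
        token = page.get("nextPageToken")
        if token is None:
            # Last page reached, so the running total is the full count.
            return total
```

This costs one request per page, so for a dataset of several thousand items it may be worth caching the result rather than recounting on every call.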

This is not an issue for my app however.

I presume there was a good reason for this solution; I see performance mentioned in #11244.
