reapit / foundations

Foundations platform mono repo
403 on propertyimages.created events - Unusual retry according to documentation #10954

Open Max-Shannon opened 5 months ago

Max-Shannon commented 5 months ago

Describe the bug

We are seeing some (not all) propertyimages.created events fail due to the resource not being available. Sometimes on retry the propertyimage is available.

My colleague reported a similar issue recently. https://github.com/reapit/foundations/issues/10700

To Reproduce

Issue 1 - [FAR240009]) Take event: 28af4ffc-3578-4af8-a0b4-b0b4f4e2c1ac

Issue 2 - [KLD240008])

Failed propertyimages.created events:

When this happens, if the user decides to re-order images, it causes a large number of errors.

Expected behaviour

Should the resource not be available at this time?

Should the retry not be exactly 60 seconds after the first failed attempt?

github-actions[bot] commented 5 months ago

Thank you for taking the time to report a bug. We prioritise bugs depending on the severity and implications, so please ensure that you have provided as much information as possible. If you haven’t already, it really helps us to investigate the bug you have reported if you provide ‘Steps to Replicate’ and any associated screenshots. Please ensure any personal information from the production database is obscured when submitting screenshots. This issue will be reviewed in our weekly refinement sessions and assigned to a specific project board. We may also update the ticket to request additional information, if required. For more information on our processes, please click here

AshDeeming commented 5 months ago

Hi @Max-Shannon Please could you confirm if this relates to the same client/customerID as the previously mentioned ticket? (SFG)

Max-Shannon commented 5 months ago

Yes it does.

Max-Shannon commented 5 months ago

Just seen another similar occurrence. Consider this order of events.

[BLA240051]

propertyimages.created
92b5c445-8e6a-432f-8e2b-fc3c59650985 - failed at 12:45:16
f8350744-e150-450c-b1e4-3fc6558bc4fe - failed at 12:45:33

propertyimages.modified
5e9ea180-dcbb-40db-aa6b-304a225fb043 - failed at 12:46:07 (this was trying to reorder an image that wasn't available)
c8983d14-5de4-4457-acf2-08b7cbda5dcd - failed at 12:46:09 (this was trying to reorder an image that wasn't available)

propertyimages.created
92b5c445-8e6a-432f-8e2b-fc3c59650985 - succeeded at 12:47:18
f8350744-e150-450c-b1e4-3fc6558bc4fe - succeeded at 12:47:35
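
As a quick check on the timings above, here is a minimal sketch that computes the gap between each logged failure and the later success for the two propertyimages.created events. It assumes both timestamps fall on the same day and that the failure listed is the attempt immediately before the success; the helper name `elapsed_seconds` is only illustrative.

```python
from datetime import datetime

# Timestamps copied from the event list above: (failed at, succeeded at).
attempts = {
    "92b5c445-8e6a-432f-8e2b-fc3c59650985": ("12:45:16", "12:47:18"),
    "f8350744-e150-450c-b1e4-3fc6558bc4fe": ("12:45:33", "12:47:35"),
}


def elapsed_seconds(failed_at: str, succeeded_at: str) -> int:
    """Seconds between a failed attempt and the later successful one."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(succeeded_at, fmt) - datetime.strptime(failed_at, fmt)
    return int(delta.total_seconds())


gaps = {event_id: elapsed_seconds(*times) for event_id, times in attempts.items()}

for event_id, gap in gaps.items():
    print(f"{event_id}: {gap}s between failure and success")
```

Both events come out at 122 seconds, roughly double the 60 seconds asked about in the original report.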

plittlewood-rpt commented 5 months ago

Hi @Max-Shannon, it looks like this relates to the same discussions we've had about image webhooks. It looks like the agent is adding new images and in some cases choosing to delete them again; however, by the time they delete the image, the created event is already in the pipeline to be sent out. In addition, if an agent uploads a large number of images at the same time, the files get uploaded sequentially in the background, while the database rows (which generate the events) get committed much faster.

We have discussed holding image events up a bit so we can execute our own existence checks; however, this would only deal with some instances of the problem. To that end, I've logged another ticket with our CRM engineering team to look at only committing new images when the user closes the pictures window, and committing them in sequence with the file uploads. The user will have completed any deletions of images by that point, which means the created events for images that got deleted will never go into the webhook pipeline in the first place.

In answer to your query regarding the retry, the timings won't always be exact and I will update the documentation to reflect this. When an event fails to send, it goes back into the queue with a timeout that prevents the processing services 'seeing' that event again for the appropriate number of seconds. If there are a lot of failures in the queue at a given time, the event may not get picked up immediately when it becomes visible again, so some discrepancy in the timings is expected.
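
To illustrate why the retry interval is a lower bound and not an exact figure, here is a toy simulation of that behaviour. It is only a sketch under stated assumptions, not the platform's actual implementation: a single worker, a fixed send time per event, and the hypothetical names `simulate_retries`, `retry_delay` and `seconds_per_event`.

```python
import heapq


def simulate_retries(failed_events, retry_delay, seconds_per_event):
    """Return {event_id: time the retry was actually picked up}.

    failed_events: list of (event_id, failed_at) pairs, times in seconds.
    retry_delay: seconds an event stays hidden after failing (assumed value).
    seconds_per_event: how long the single worker is busy per send (assumed).
    """
    # Each failed event is hidden from the worker until failed_at + retry_delay.
    queue = [(failed_at + retry_delay, event_id) for event_id, failed_at in failed_events]
    heapq.heapify(queue)

    clock = 0
    picked_up = {}
    while queue:
        visible_at, event_id = heapq.heappop(queue)
        # The worker cannot see the event before visible_at, but it may
        # already be later than that if earlier retries kept it busy.
        clock = max(clock, visible_at)
        picked_up[event_id] = clock
        clock += seconds_per_event
    return picked_up


result = simulate_retries(
    failed_events=[("a", 0), ("b", 0), ("c", 1)],
    retry_delay=60,
    seconds_per_event=5,
)
print(result)
```

In this toy run only the first event is retried exactly 60 seconds after it failed; the other two wait behind it, so their retries land at 65 and 69 seconds after their own failures.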

One thing you may want to consider in your own system is not processing image events immediately upon receipt. This would give any modifications/deletions a chance to filter through before you start any processing. You could queue the events up, then group them and process all images for a given property in one go, which would likely also reduce any API calls you might be making off the back of receiving those events.
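
A minimal sketch of that queue-then-group approach is below. Everything in it is an assumption for illustration: the class name `ImageEventBuffer`, the `quiet_period_seconds` setting, and the idea that each incoming event can be keyed by a property identifier. The caller passes in the current time so the logic can be exercised without real clocks.

```python
from collections import defaultdict


class ImageEventBuffer:
    """Holds image events per property until that property has gone quiet."""

    def __init__(self, quiet_period_seconds):
        self.quiet_period = quiet_period_seconds
        self._events = defaultdict(list)  # property_id -> buffered events
        self._last_seen = {}              # property_id -> time of latest event

    def add(self, property_id, event, now):
        """Buffer an event instead of processing it straight away."""
        self._events[property_id].append(event)
        self._last_seen[property_id] = now

    def pop_ready(self, now):
        """Return and remove the events for every property that has been quiet long enough."""
        ready = {}
        for property_id in list(self._last_seen):
            if now - self._last_seen[property_id] >= self.quiet_period:
                ready[property_id] = self._events.pop(property_id)
                del self._last_seen[property_id]
        return ready


buffer = ImageEventBuffer(quiet_period_seconds=120)
buffer.add("P1", "created:img-1", now=0)
buffer.add("P1", "modified:img-1", now=30)
buffer.add("P2", "created:img-2", now=100)

first = buffer.pop_ready(now=100)   # nothing has been quiet for 120s yet
second = buffer.pop_ready(now=150)  # P1 last changed at 30, so it is ready
third = buffer.pop_ready(now=220)   # P2 last changed at 100, so it is ready
```

Each batch returned by `pop_ready` can then be handled in one pass per property, for example by fetching the property's current image list once and ignoring events for images that no longer exist.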

plittlewood-rpt commented 5 months ago

Hi @Max-Shannon, I've now updated the documentation so it's a little clearer about the timings. Can you confirm if you've had any further problems since my original response, bearing in mind the content of the above? Thanks

Max-Shannon commented 5 months ago

Hi - I'm still seeing some cases where a retry allows the image to be downloaded 2 minutes or so later. The majority of cases are as you describe, where the images are still not available after all retries, but there are cases where the retry succeeds.

Could you leave this with me for a couple of days? I will gather some examples for you.

plittlewood-rpt commented 5 months ago

Hi Max - yep no problem. Thanks!

github-actions[bot] commented 5 months ago

This issue has been updated and moved to our ‘Near Term’ column (typically completed within 0 - 4 months). We have assessed the effort required and outlined a technical specification - please take the time to review this detail. When we're ready to schedule the issue, it will be assigned to the relevant board where you can continue to track its progress to completion. For more information on our processes, please click here