Open abeglova opened 8 months ago
I was looking at the data and here are the details.
Basket/Basket Item creation:
A Basket and associated Basket Item objects are created as soon as a user hits the checkout page. The ideal way to reach that checkout page is by clicking the Enroll Now
button on any course page. There is only one Basket per user and we use that every time a user hits the checkout process.
In the current implementation, There is only one basket and one basket item for each order but the system supports multiple basket items per basket.
Data Stats:
Environment | Baskets | Basket Items |
---|---|---|
Production | 74,354 | 56,655 |
RC | 158 | 68 |
Why are there empty baskets? (Without Basket Items):
/checkout
page URL or when the product id.checkout/?product=<product_id>
but the product id is invalid.Proposed Solutions:
1. A celery Task
A Celery task that runs after a certain period of time and deletes the basket and basket item objects within a time range based on the threshold.
2. Management command
A management command that would take start/end time
and delete all the basket and basket items objects that were created in that duration.
3. A one-time data migration
We write a one-time data migration to clean the existing basket and basket items in a specific range. This is not an ideal solution because it will fix the data but not the root cause. And it will be a one-time operation so we'll end up having baskets and basket items in the future.
Questions on cleanup criteria
FYI, @Ferdi @rachellougee @abeglova
Thanks for the detail, @arslanashraf7.
When an order is executed successfully, We delete associated basket items but the basket stays there.
Just curious why we don't delete basket if order is executed successfully. I know you mentioned that there are 2 other ways to create empty basket, but from this use case, it makes sense to delete both unless I am missing something.
I think celery task makes sense to me to eliminate any manual work.
In terms of hubspot integration, I don't think there will be any repercussions from deleting old baskets/items. Hubspot syncs the Order
model, and does not interact with the Basket
or BasketItem
models directly (though the Order
is created from the Basket
in ecommerce.api.create_unfulfilled_order
).
Update
Implementation details:
Based on the opinions and discussions done in a Slack Thread, we can go ahead with below-proposed implementation:
MID NIGHT - UTC
) to check for expired Baskets.timedelta
to mark the life span of a basket. This is a threshold value for a basket to live. Let's do a 15 days
delta as a default value but we will be able to change it anytime since it will be configurable.NOTE: Check for any Basket, Basket Item associations with other models while implementing this.
Out of scope:
*EDIT: Updated ticket description with acceptance criteria accordingly.
FYI. @Ferdi @rhysyngsun @jkachel (Please share your thoughts if you have any questions on proposed implementation)
One thing we need to watch out for is deleting a basket that, while stale, the application could still could attempt to use.
As part of this, we need to audit our current code that modifies Basket
s and BasketItem
s and most likely update it to be in line with this:
Basket
's items are modified, take a select_for_update()
lock on the basket.Basket
's items, update Basket.updated_at
to the current time.This will ensure that the deleting task can't delete a Basket out from under the application. This does create a new problem, though, because at any given point some number of Basket
records are going to be locked and if those are included in the set of records that need to be deleted it could make the delete query really slow or just fail altogether.
To address that, we'd want to just skip the locked rows - if they're locked they're being written to and so they're no longer stale so we don't want them anyway. To do this in django you can do a delete with a subquery and utlize select_for_update()
, passing the skip_locked
arg so the query doesn't wait for row locks to be released:
Basket.objects.filter(
id__in=(
Basket.objects.select_for_update(skip_locked=True)
.filter(updated_at__lte=cutoff_date)
)
).delete()
This will briefly lock those records, but that's probably an ok tradeoff because this should hopefully be pretty quick. That said, definitely add some descriptive logging around the task describing when it's started, how long it took, and how many effected rows.
@jkachel can we get your input on this?
How/can we standardize this behavior across applications?
Does this have any implications for a unified ecommerce application?
I think this plan is fine. I did some looking at the BasketSerializer
and it does do a select_for_update
when it's updating the basket itself. (See https://github.com/mitodl/mitxpro/blob/1fe6093d446b5d870ba0776862a418a9d5cb89f0/ecommerce/serializers.py#L470 which gets called in the update
method a couple times.) Given this, it should be pretty easy to have it bump the updated_at
on the Basket record so it'd be easier to see if a basket is in actual use or not. (I tested this on RC - my account there has a basket from 2022 in it and adding a new item to the cart uses the same basket, just with my new item in it.)
Clearing the basket at checkout is how MITx Online works, so making this change would bring xPRO in line with that.
There's no implication for a unified ecommerce application - the app itself wouldn't maintain a basket at all, it'd just send a message to it to add whatever product to it. The ecommerce app would treat baskets as ephemeral. Once the checkout is complete, we'd clear the basket object as well. (In other words, it'd work like MITx Online does now.) We may add a setting for persistence time for baskets to support the use case of learners adding multiple things to the cart.
@Anas12091101 I think most of the details are mentioned in the comments above, Could you take a look at it and take it to the PR phase?
CourseRunSelection
and CouponSelection
also have a foreign key relationship with Basket
.
We also need to delete entries of these two tables while deleting Basket
and BasketItems
cc: @arslanashraf7, @pdpinch
As part of the ol-data-platform work we noticed that xpro has 74,354 record in ecommerce_basket and 56,655 records in ecommerce_basketitem. Should baskets and basket items expire and get deleted from the database after some time?
Acceptance criteria
MID NIGHT - UTC
) to check for expired Baskets.timedelta
to mark the life span of a basket. This is a threshold value for a basket to live. Let's do a15 days
delta as a default value but we will be able to change it anytime since it will be configurable.Out of scope: