Closed ChrisChV closed 2 months ago
Thanks for the pull request, @ChrisChV!
Please work through the following steps to get your changes ready for engineering review:
If you haven't already, check this list to see if your contribution needs to go through the product review process.
To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
This PR must be merged before / after / at the same time as ...
This PR is waiting for OEP-1234 to be accepted.
This PR must be merged by XX date because ...
This is for a course on edx.org.
If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.
This repository is currently maintained by @openedx/axim-engineering. Tag them in a comment and let them know that your changes are ready for review.
If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:
Our goal is to get community contributions seen and reviewed as efficiently as possible.
However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
:bulb: As a result it may take up to several weeks or months to complete a review and merge your PR.
Attention: Patch coverage is 94.73684% with 4 lines in your changes missing coverage. Please review.
Project coverage is 93.98%. Comparing base (80ddb73) to head (780e4b2). Report is 3 commits behind head on master.
Files with missing lines | Patch % | Lines |
---|---|---|
opaque_keys/edx/keys.py | 75.00% | 1 Missing and 1 partial :warning: |
opaque_keys/edx/locator.py | 91.66% | 2 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@ormsbee I remember you had some concerns about introducing a new opaque key. Can you let us know your thoughts?
The tagging API currently assumes that anything we apply tags to has an opaque key, so in order to allow tagging of Collections, we need to either implement this key or remove that assumption.
> @ormsbee I remember you had some concerns about introducing a new opaque key. Can you let us know your thoughts?
@bradenmacdonald: This is the Slack thread where I asked the question about whether we really needed them (you, @kdmccormick, and @yusuf-musleh commented).
It boils down to the fact that OpaqueKey machinery is weird and complex for what it does (even explaining to someone why there's a `CollectionKey` and a `CollectionLocator` is weird), so do we really need it for this feature?
Also, it's plausible that we'd want to change a library's key at some point, which is more feasible to do now since we're not storing as many references to it. Keys in `openedx_learning` app models are almost all relative to the `LearningPackage` you're in. The only top-level one that's stored is the library key on the `LearningPackage` itself (and I'm not sure even that's desirable, since we store it again on the edx-platform side).
> The tagging API currently assumes that anything we apply tags to has an opaque key, so in order to allow tagging of Collections, we need to either implement this key or remove that assumption.
I was under the impression that any sort of identifier would do, and you mention in the thread that UUIDs would work as well. Is it important that they be human-readable? If it's just backend machinery at the moment, I'd prefer to use UUIDs for now. Or even the primary key of the thing being tagged if it's easy, since that would be the most compact representation. I know some other tagging libraries actually make a model-per-thing-being-tagged.
In the longer term, I think that tagging functionality will be simpler and easier to adopt into other systems if it does not force a requirement for OpaqueKeys (e.g. forum posts/comments, catalog information, students, etc.). I would expect that `openedx_tagging` eventually moves into its own top-level repo.
All that being said, I don't know how much work any of this is. If making it more general threatens the Sumac timeline, I'm okay with rolling forward with a `CollectionKey` for now.
FYI @kdmccormick
@ormsbee We can of course use UUIDs for the things that are being tagged. I was thinking that if we use primary keys, or perhaps even if we want to use UUIDs, we'd want some kind of namespacing so we have an idea of what sort of thing is being tagged and/or can avoid PK collisions. And if we did that (`"lib-collection:23489"` as an example), then whatever idiosyncratic namespacing we implement is not that different from opaque keys. So it would be weird to implement a second type of string key namespacing, when we already have a library for it (although the key vs. locator thing is weird, and it would be nice to get rid of it).
However, I guess if we're strictly using UUIDs, then we don't technically need namespacing at all. It may be hard to debug what thing is actually tagged though, since we'll just see a random UUID and have no idea what type of thing it is. Thoughts?
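For illustration, the `namespace:id` idea mentioned above could be sketched roughly as follows. This is not the opaque-keys API and not code from this PR; `NamespacedId`, `format_object_id`, and `parse_object_id` are hypothetical names, and the only thing taken from the thread is the example string `"lib-collection:23489"`.

```python
from typing import NamedTuple


class NamespacedId(NamedTuple):
    """Hypothetical parsed form of a namespaced object_id string."""

    namespace: str  # what kind of thing is tagged, e.g. "lib-collection"
    local_id: str   # primary key or UUID of that thing, kept as text


def format_object_id(namespace: str, local_id: str) -> str:
    """Build 'namespace:local_id'; the namespace itself may not contain ':'."""
    if not namespace or ":" in namespace:
        raise ValueError(f"invalid namespace: {namespace!r}")
    if not local_id:
        raise ValueError("local_id must not be empty")
    return f"{namespace}:{local_id}"


def parse_object_id(object_id: str) -> NamespacedId:
    """Split on the first ':' only, so the local part may contain colons."""
    namespace, sep, local_id = object_id.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a namespaced id: {object_id!r}")
    return NamespacedId(namespace, local_id)
```

Even a helper this small needs its own validation and parsing rules, which is the sense in which ad hoc namespacing ends up "not that different from opaque keys".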
Oh right, I'm forgetting that there's a single global tagging table and that we need to namespace. Would it be terrible if we stored UUIDs and had a 1:1 join table that matched those rows with `PublishableEntity`? Or two tables, one to match against `Published` and the other `Draft`? (Sorry, I don't remember how we're distinguishing between those two in the tagging.)
Here is the current ID field; it's just a string ID. Tagging is not part of the publishing cycle so doesn't distinguish between published and draft things.
We want to associate that `ObjectTag` model with some `Collection` instances, which are themselves not PublishableEntities. So I think we just need some way of putting the `Collection`'s primary key into the `ObjectTag.object_id` string field. We could also add an additional generic foreign key to `ObjectTag` but that's getting complicated. Since `Collection` is also in the same overall package as tagging, we could also create a join table specifically for Collection-ObjectTag, but I'm not sure what benefit that would offer.
@bradenmacdonald: Please pardon the silly questions stemming from my ignorance of the data model, and my brain being stuck on components and not collections.
If the same tag from the same taxonomy applies to three totally different types of things (e.g. a component, a collection, and a forum comment), does the data model have any way to differentiate them at the database layer? Or does it rely purely on the `object_id` (and consequently OpaqueKeys) to do the dispatch between them?
Edit: My sense is "no, there isn't", but while I don't think it's good to roll our own opaque-keys-lite and embed that knowledge into the `object_id` itself, I do think it makes sense for the data model to know there are different types of things being tagged at the database layer and to namespace them with a different column that's a foreign key to the types of things that exist. Otherwise, it seems like basic filtering by type would be way harder than it should be?
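A minimal sketch of the "separate type column" idea, using an in-memory SQLite database so it runs standalone. The table and column names (`object_type`, `object_tag`, `object_type_id`) are made up for illustration and are not the actual `openedx_tagging` schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy schema where the tagged thing's type is its own column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        PRAGMA foreign_keys = ON;
        CREATE TABLE object_type (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE object_tag (
            id             INTEGER PRIMARY KEY,
            object_type_id INTEGER NOT NULL REFERENCES object_type(id),
            object_id      TEXT NOT NULL,
            tag            TEXT NOT NULL
        );
        CREATE INDEX object_tag_type_idx
            ON object_tag (object_type_id, object_id);
        """
    )
    conn.executemany(
        "INSERT INTO object_type (id, name) VALUES (?, ?)",
        [(1, "component"), (2, "collection"), (3, "forum_comment")],
    )
    # The same local id ("15") can appear under two types without colliding.
    conn.executemany(
        "INSERT INTO object_tag (object_type_id, object_id, tag) VALUES (?, ?, ?)",
        [(1, "15", "hard"), (2, "15", "hard"), (3, "99", "hard"), (2, "16", "easy")],
    )
    return conn


def object_ids_tagged(conn: sqlite3.Connection, type_name: str, tag: str) -> list:
    """Filter by type with a join instead of parsing the object_id string."""
    rows = conn.execute(
        """
        SELECT ot.object_id
        FROM object_tag AS ot
        JOIN object_type AS t ON t.id = ot.object_type_id
        WHERE t.name = ? AND ot.tag = ?
        ORDER BY ot.object_id
        """,
        (type_name, tag),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout, `object_id` can stay a bare primary key or UUID, at the cost of every caller having to pass the type alongside it.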
@ormsbee
> Or does it rely purely on the object_id (and consequently OpaqueKeys) to do the dispatch between them?
That.
> it seems like basic filtering by type would be way harder than it should be?
We haven't had any use case for basic filtering by type, although it could currently be done as a string prefix search on `object_id` (an indexed column), which means it's already extremely efficient even without a dedicated column.
We could add another column that's a foreign key to the `contenttype` of the thing that's being tagged, certainly - but we'd have to carefully review the code in various places which assumes `object_id` is the entire reference to the thing being tagged.
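As a rough sketch of the prefix search described above: a prefix match can be written as a half-open range comparison, which is the kind of predicate an ordinary index on a string column can usually serve. The table layout and the sample `object_id` values are invented for the example, and `prefix_bounds` assumes an ASCII prefix and a byte-wise (binary) collation.

```python
import sqlite3
from typing import List, Tuple


def prefix_bounds(prefix: str) -> Tuple[str, str]:
    """Return (low, high) such that low <= s < high iff s starts with prefix.

    Assumes an ASCII prefix and a byte-wise string comparison.
    """
    if not prefix or not prefix.isascii() or ord(prefix[-1]) >= 0x7F:
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def make_demo_db() -> sqlite3.Connection:
    """Toy table with an indexed object_id column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE object_tag (object_id TEXT NOT NULL, tag TEXT NOT NULL)")
    conn.execute("CREATE INDEX object_tag_object_id ON object_tag (object_id)")
    conn.executemany(
        "INSERT INTO object_tag (object_id, tag) VALUES (?, ?)",
        [
            ("lib-collection:23489", "easy"),
            ("lib-collection:23489", "algebra"),
            ("lib-collection:7", "hard"),
            ("lib-collectionx:1", "easy"),  # similar namespace, must not match
            ("component:15", "easy"),
        ],
    )
    return conn


def find_by_prefix(conn: sqlite3.Connection, prefix: str) -> List[str]:
    """Return the distinct object_ids that start with the given prefix."""
    low, high = prefix_bounds(prefix)
    rows = conn.execute(
        "SELECT DISTINCT object_id FROM object_tag "
        "WHERE object_id >= ? AND object_id < ? ORDER BY object_id",
        (low, high),
    ).fetchall()
    return [row[0] for row in rows]
```

Including the trailing `:` in the prefix matters here: searching for `"lib-collection"` alone would also pick up the unrelated `"lib-collectionx:1"` row.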
Like I said before, I don't want to jeopardize the timeline over this, and I'm not going to block the merging of new key types. As a general rule, I would like more of our data model to be explicitly structured at the database layer, and not rely on Python code to introspect things (or having to know that key type X always starts with one of these two strings). I realize that to a certain extent we have to do that, because so much of our content is still in the ModuleStore and serialized to MongoDB, but I don't want it to be our long term direction.
> We haven't had any use case for basic filtering by type, although it could currently be done as a string prefix search on object_id (an indexed column), which means it's already extremely efficient even without a dedicated column.
Also, having some kind of hierarchy (which the OpaqueKey has) could be handy. If we nuke a Library with all the Components and Collections, we can use the `object_id` to clear all tags without iterating the library's children. It also makes it easier to find orphaned items.
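To make the hierarchy point concrete, here is a small in-memory sketch. It assumes, purely for illustration, that every key inside a library serializes as `namespace:org:lib:...`; the `lb` and `lib-collection` namespaces and the sample ids are assumptions for the example, and `library_prefixes` and `purge_library_tags` are hypothetical helpers, not part of any Open edX package.

```python
from typing import Dict, Iterable, List, Set, Tuple


def library_prefixes(org: str, lib: str, namespaces: Iterable[str]) -> Tuple[str, ...]:
    """One 'namespace:org:lib:' prefix per key type that lives in the library."""
    # The trailing ':' keeps library "LibA" from matching library "LibAB".
    return tuple(f"{ns}:{org}:{lib}:" for ns in namespaces)


def purge_library_tags(tags: Dict[str, Set[str]], prefixes: Tuple[str, ...]) -> List[str]:
    """Drop every entry whose object_id starts with a prefix; return removed ids."""
    doomed = sorted(oid for oid in tags if oid.startswith(prefixes))
    for oid in doomed:
        del tags[oid]
    return doomed


# Toy stand-in for the tag table: object_id -> set of tag values.
demo_tags = {
    "lb:OrgX:LibA:problem:p1": {"easy"},
    "lib-collection:OrgX:LibA:intro": {"hard"},
    "lib-collection:OrgX:LibAB:intro": {"easy"},  # different library
    "lb:OrgY:LibA:html:h1": {"easy"},             # different org
}
removed = purge_library_tags(
    demo_tags, library_prefixes("OrgX", "LibA", ("lb", "lib-collection"))
)
```

The library's children are never loaded; only the tag keys are scanned. The same prefixes could be compared against the set of live libraries to look for orphaned tags.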
@kdmccormick Thanks for the review! I updated the PR with your suggestions
I think we just need to address @kdmccormick's question about combining `LibraryCollectionLocator.lib_key` and `LibraryCollectionLocator.library_key`, and then this is good to go from my perspective.
Please also include a version bump so it'll create a new package version.
> Please also include a version bump so it'll create a new package version.
@bradenmacdonald Done!
@bradenmacdonald It's ready for merge
@ChrisChV 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.
@bradenmacdonald I see the tag for this version, but the pip package is not updated
@ChrisChV It seems like I needed to publish a "Release". I've done that now so you should see it on PyPI soon.
Hi @kdmccormick, sorry for pinging you directly, but we're having trouble bumping this requirement in edx-platform; cf. https://github.com/openedx/edx-platform/pull/35383. It doesn't seem to be actually constrained in anyone else's `.in` files that we can find, but it is in their `.txt` files.
What's the procedure for updating these fundamental packages?
Description
Creates `LibCollectionKey` and `LibCollectionLocator`
Supporting information
Testing instructions