feat: LibraryCollectionKey and LibraryCollectionLocator created [FC-0062]

ChrisChV commented 2 months ago

Description

Creates LibCollectionKey and LibCollectionLocator

Supporting information

Github Issue: https://github.com/openedx/modular-learning/issues/228
This change is used on: https://github.com/openedx/edx-platform/pull/35383
Internal ticket: https://tasks.opencraft.com/browse/FAL-3785

Testing instructions

Check all the code
Verify that the test covers the changes

openedx-webhooks commented 2 months ago

Thanks for the pull request, @ChrisChV!

What's next?

Please work through the following steps to get your changes ready for engineering review:

:radio_button: Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
- This process (including the steps you'll need to take) is documented here.
If it doesn't, simply proceed with the next step.

:radio_button: Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

Dependencies

This PR must be merged before / after / at the same time as ...
Blockers

This PR is waiting for OEP-1234 to be accepted.
Timeline information

This PR must be merged by XX date because ...
Partner information

This is for a course on edx.org.
Supporting documentation
Relevant Open edX discussion forum threads

:radio_button: Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

:radio_button: Let us know that your PR is ready for review:

Who will review my changes?

This repository is currently maintained by @openedx/axim-engineering. Tag them in a comment and let them know that your changes are ready for review.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

The size and impact of the changes that it introduces
The need for product review
Maintenance status of the parent repository

:bulb: As a result it may take up to several weeks or months to complete a review and merge your PR.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 94.73684% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.98%. Comparing base (80ddb73) to head (780e4b2). Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
opaque_keys/edx/keys.py	75.00%	1 Missing and 1 partial :warning:
opaque_keys/edx/locator.py	91.66%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #335 +/- ## ========================================== + Coverage 93.96% 93.98% +0.01% ========================================== Files 29 30 +1 Lines 2850 2924 +74 Branches 640 649 +9 ========================================== + Hits 2678 2748 +70 - Misses 145 148 +3 - Partials 27 28 +1 ``` | [Flag](https://app.codecov.io/gh/openedx/opaque-keys/pull/335/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openedx) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/openedx/opaque-keys/pull/335/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openedx) | `93.98% <94.73%> (+0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openedx#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bradenmacdonald commented 2 months ago

@ormsbee I remember you had some concerns about introducing a new opaque key. Can you let us know your thoughts?

The tagging API currently assumes that anything we apply tags to has an opaque key, so in order to allow tagging of Collections, we need to either implement this key or remove that assumption.

ormsbee commented 2 months ago

@ormsbee I remember you had some concerns about introducing a new opaque key. Can you let us know your thoughts?

@bradenmacdonald: This is the Slack thread where I asked the question about whether we really needed them (you, @kdmccormick, and @yusuf-musleh commented).

It boils down to the fact that OpaqueKey machinery is weird and complex for what it does (even explaining to someone why there's a CollectionKey and a CollectionLocator is weird), so do we really need it for this feature?

Also, it's plausible that we'd want to change a library's key at some point, which is more feasible to do now since we're not storing as many references to it. Keys in openedx_learning app models are almost all relative to the LearningPackage you're in. The only top level one that's stored is the library key on the LearningPackage itself (and I'm not sure even that's desirable, since we store it again in edx-platform side).

The tagging API currently assumes that anything we apply tags to has an opaque key, so in order to allow tagging of Collections, we need to either implement this key or remove that assumption.

I was under the impression that any sort of identifier would do, and you mention in the thread that UUIDs would work as well. Is it important that they be humanly readable? If it's just backend machinery at the moment, I'd prefer to use UUIDs for now. Or even the primary key of the thing being tagged if it's easy–since that would be the most compact representation. I know some other tagging libraries actually make a model-per-thing-being-tagged.

In the longer term, I think that tagging functionality will be simpler and easier to adopt into other systems if it does not force a requirement for OpaqueKeys (e.g. forums posts/comments, catalog information, students, etc.). I would expect that openedx_tagging eventually moves into its own, top level repo.

All that being said, I don't know how much work any of this is. If making it more general threatens the Sumac timeline, I'm okay with rolling forward with a CollectionKey for now.

FYI @kdmccormick

bradenmacdonald commented 2 months ago

@ormsbee We can of course use UUIDs for the things that are being tagged. I was thinking that if we use primary keys, or perhaps even if we want to use UUIDs, we'd want some kind of namespacing so we have an idea of what sort of thing is being tagged and/or can avoid PK collisions. And if we did that ("lib-collection:23489" as an example), then whatever idiosyncratic namespacing we implement is not that different from opaque keys. So it would be weird to implement a second type of string key namespacing, when we already have a library for it (although the key vs. locator thing is weird, and it would be nice to get rid of it).

However, I guess if we're strictly using UUIDs, then we don't technically need namespacing at all. It may be hard to debug what thing is actually tagged though, since we'll just see a random UUID and have no idea what type of thing it is. Thoughts?

ormsbee commented 2 months ago

Oh right, I'm forgetting that there's a single global tagging table and that we need to namespace. Would it be terrible if we stored UUIDs and had a 1:1 join table that matched those rows with PublishableEntity? Or two tables, one to match against Published and the other Draft? (Sorry, I don't remember how we're distinguishing between those two in the tagging.)

bradenmacdonald commented 2 months ago

Here is the current ID field; it's just a string ID. Tagging is not part of the publishing cycle so doesn't distinguish between published and draft things.

We want to associate that ObjectTag model with some Collection instances, which are themselves not PublishableEntities. So I think we just need some way of putting the Collection's primary key into the ObjectTag.object_id string field. We could also add an additional generic foreign key to ObjectTag but that's getting complicated. Since Collection is also in the same overall package as tagging, we could also create a join table specifically for Collection-ObjectTag, but I'm not sure what benefit that would offer.

ormsbee commented 2 months ago

@bradenmacdonald: Please pardon the silly questions stemming from my ignorance of the data model, and my brain being stuck on components and not collections.

If the same tag from the same taxonomy applies to three totally different types of things (e.g. a component, a collection, and a forum comment), does the data model have any way to differentiate them at the database layer? Or does it rely purely on the object_id (and consequently OpaqueKeys) to do the dispatch between them?

Edit: My sense is "no, there isn't", but while I don't think it's good to roll our own opaque-keys-lite and embed that knowledge into the object_id itself, I do think it makes sense for the data model to know there are different types of things being tagged at the database layer and to namespace them with a different column that's a foreign keys to the types of things that exist. Otherwise, it seems like basic filtering by type would be way harder than it should be?

bradenmacdonald commented 2 months ago

@ormsbee

Or does it rely purely on the object_id (and consequently OpaqueKeys) to do the dispatch between them?

That.

it seems like basic filtering by type would be way harder than it should be?

We haven't had any use case for basic filtering by type, although it could currently be done as a string prefix search on object_id (an indexed column), which means it's already extremely efficient even without a dedicated column.

We could add another column that's a foreign key to the contenttype of the thing that's being tagged, certainly - but we'd have to carefully review the code in various places which assumes object_id is the entire reference to the thing being tagged.

ormsbee commented 2 months ago

Like I said before, I don't want to jeopardize the timeline over this, and I'm not going to block the merging of new key types. As a general rule, I would like more of our data model to be explicitly structured at the database layer, and not rely on Python code to introspect things (or having to know that key type X always starts with one of these two strings). I realize that to a certain extent we have to do that, because so much of our content is still in the ModuleStore and serialized to MongoDB, but I don't want it to be our long term direction.

rpenido commented 2 months ago

We haven't had any use case for basic filtering by type, although it could currently be done as a string prefix search on object_id (an indexed column), which means it's already extremely efficient even without a dedicated column.

Also, having some kind of hierarchy (which the OpaqueKey has) could be handy. If we nuke a Library with all the Components and Collections, we can use the object_id to clear all tags without iterating the library's children. It also makes it easier to find orphaned items.

ChrisChV commented 2 months ago

@kdmccormick Thanks for the review! I updated the PR with your suggestions

bradenmacdonald commented 2 months ago

I think we just need to address @kdmccormick 's question about combining LibraryCollectionLocator.lib_key and LibraryCollectionLocator.library_key, and then this is good to go from my perspective.

Please also include a version bump so it'll create a new package version.

ChrisChV commented 2 months ago

Please also include a version bump so it'll create a new package version.

@bradenmacdonald Done!

ChrisChV commented 2 months ago

@bradenmacdonald It's ready for merge

openedx-webhooks commented 2 months ago

@ChrisChV 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

ChrisChV commented 2 months ago

@bradenmacdonald I see the tag for this version, but the pip package is not updated

bradenmacdonald commented 2 months ago

@ChrisChV It seems like I needed to publish a "Release". I've done that now so you should see it on PyPI soon.

pomegranited commented 2 months ago

Hi @kdmccormick Sorry for pinging you directly, but we're having trouble bumping this requirement in edx-platform, cf https://github.com/openedx/edx-platform/pull/35383. It doesn't seem to be actually constrained in anyone else's .in files that we can find, but it is in their .txt files.

What's the procedure for updating these fundamental packages?

openedx / opaque-keys