openedx / modular-learning

3 stars 1 forks source link

[Tagging] Handle tags when importing/exporting courses #182

Closed yusuf-musleh closed 6 months ago

yusuf-musleh commented 8 months ago

"As a content author, when I export a course/library I want the tag data to be stored in the export. Subsequently when I import a course/library that has tag data in it, I expect that data to be imported as well. When Importing from an external instance, I want the imported tag data to match the Export ID present in my instance and import it accordingly, otherwise those unmatched taxonomy/tags should be dropped hidden."

Acceptance Criteria

  1. When a course with tags is exported to a .tar.gz file, it includes a new tags.csv file with the course tags.
    • This should be exactly the same format as already implemented in https://github.com/openedx/edx-platform/pull/34091
    • If neither the course itself nor any items in the course have tags, the tags.csv file is not included. (Probably use get_object_tag_counts() to determine this very efficiently.)
    • The format of the the tags.csv file will have to be changed slightly:
      • the "Type" column will have to be the block type (e.g. vertical not "Unit" etc.)
      • the "ID" column should just include the block_id e.g. 1238971 or unit7 not the whole usage key (block-v1:...). This is because the course export format is independent of the course ID.
      • These changes should be made to all versions of the "export tags CSV" code (don't just make these changes for course export).
  2. When a course is imported, if the import contains a tags.csv file, then after the content is imported, all the tags will be applied using the CSV. Invalid block IDs are ignored.
  3. Taxonomy Matching: Taxonomies should be matched based on their https://github.com/openedx/modular-learning/issues/183. On import, tags should always be imported as ObjectTags, even if no matching Taxonomy is found, or no matching tag currently exists in the taxonomy. (To prevent data loss.)
    • Scenario 1: Tags in the OLX have a matching Taxonomy (by Export ID) and matching Tags in that Taxonomy (by Value):
      • Tags will be imported and visible in Studio.
      • (Create the ObjectTag as normal, linked to the Taxonomy and Tag)
    • Scenario 2: Tags in the OLX have a matching Taxonomy (by Export ID) but some of the Tags in the OLX are not found in the Taxonomy:
      • Such tag(s) should not be visible in Studio, but iff the Taxonomy is ever updated to include that tag value, the tags will "re-appear".
      • (Create the ObjectTag anyways, linked to the Taxonomy but not linked to the Tag. The value is set in _value.)
    • Scenario 3: Tags in the OLX do not match any Taxonomy (by Export ID):
      • These tags will not be visible in the Studio UI, but should still be exported if the course is exported again. Iff a Taxonomy is ever created/imported with the matching Export ID, the tags will "re-appear".
      • (Still create the ObjectTag instances for these tags, with the Export ID as the _name and the value as the _value.)
    • Scenario 4: Tags in the OLX match a Taxonomy (by Export ID) but the current user does not have permission to view/use that taxonomy [because it's linked to a different org]:
      • Tags will be imported and visible in Studio. Course author will be able to delete them if they want. Course author will not be able to add additional tags from the same taxonomy though, because they don't have permission.

Library import/export will be handled in https://github.com/openedx/modular-learning/issues/184

bradenmacdonald commented 8 months ago

@pomegranited @yusuf-musleh @jmakowski1123 Could you please review the four scenarios I outlined in "Taxonomy Matching" above, and let me know your thoughts?

pomegranited commented 8 months ago

@bradenmacdonald

I think this covers all the scenarios.. But without a good way to communicate what happened to the user, they may be confused.

Scenario 3: Tags in the OLX do not match any Taxonomy (by Export ID):

  • Iff a Taxonomy is ever created/imported with the matching Export ID, the tags will "re-appear".

This one is problematic: the new invisible ObjectTags don't have anywhere to store the taxonomy Export ID, so they won't know to resync if a new Taxonomy gets imported.

One way to fix this is to change ObjectTag to use Taxonomy.export_id as a "loose" foreign key to Taxonomy (with db_constraint=False), so we could store the Export ID and use it to inform ObjectTag.resync when a new matching Taxonomy is created.

Scenario 4: Tags in the OLX match a Taxonomy (by Export ID) but the current user does not have permission to view/use that taxonomy [because it's linked to a different org]:

do we ignore these, or import them invisibly? If we import them and link them to the Taxonomy/Tag, will they be hidden or visible?

If we ignore the org issue and link them to a Taxonomy/Tag anyway, they will show up in Studio, but the user may not be allowed to delete them or add more tags from that Taxonomy.

I don't think that is a bad thing -- content authors can then work with their Taxonomy Admins to add org owners to the Taxonomy (or make it available to all orgs).

  • Such tag(s) should not be visible in Studio, but iff the Taxonomy is ever updated to include that tag value, the tags will "re-appear".

It's worth mentioning that this "re-appearing" is handled by ObjectTag.resync, which is called when individual tags are added or updated via the tagging API. But I'm not seeing where it's called during bulk tag import? E.g. actions.CreateTag does not use the api to create Tags, and so it looks like this step got missed.

yusuf-musleh commented 8 months ago

@bradenmacdonald Scenario 1 and 2 look fine to me.

For scenario 3, in the case where some (or all) of the Tags with the same values exist, but the export_id does not match, do we want to map those Tags that have a matching value to the ones in the existing taxonomy and treat them as if they belong to it? Or are we treating the export_id as the source of truth for all importing/exporting regardless of tag value? If it's the later, that might cause issues with conflicts if we try to create new invisible tags, since the tag value should be unique in the DB across all taxonomies

For scenarios 4, building off Jill's point, if we allow those tags to be imported and shown in the UI, maybe we can add some sort of special distinction or indication on those tags (like a different color + tool tip) that informs the user that these tags were applied, however you do not have permissions to edit/remove tags for this taxonomy. They could then handle it internally with their Taxonomy Admins.

bradenmacdonald commented 8 months ago

@pomegranited

Scenario 3... This one is problematic: the new invisible ObjectTags don't have anywhere to store the taxonomy Export ID, so they won't know to resync if a new Taxonomy gets imported.

Oh, I mentioned that above. When we import the tags, we won't actually have a name for the taxonomy, just the Export ID. So when we create the ObjectTag, we can put the Export ID as the _name of the ObjectTag. Or perhaps we can just rename the _name field of ObjectTag to be _export_id, since this matching of taxonomies is exactly what the _name field is for anyways. Right?

If we ignore the org issue and link them to a Taxonomy/Tag anyway, they will show up in Studio, but the user may not be allowed to delete them or add more tags from that Taxonomy. I don't think that is a bad thing -- content authors can then work with their Taxonomy Admins to add org owners to the Taxonomy (or make it available to all orgs).

That makes sense to me :)

It's worth mentioning that this "re-appearing" is handled by ObjectTag.resync, which is called when individual tags are added or updated via the tagging API. But I'm not seeing where it's called during bulk tag import? E.g. actions.CreateTag does not use the api to create Tags, and so it looks like this step got missed.

Yeah it's not something we currently do. But we can add that in as part of this ticket.

bradenmacdonald commented 8 months ago

@yusuf-musleh

For scenario 3, in the case where some (or all) of the Tags with the same values exist, but the export_id does not match, do we want to map those Tags that have a matching value to the ones in the existing taxonomy and treat them as if they belong to it?

The values could match zero, one, or many taxonomies but there's no way to know if they're related or not. I want to treat Export ID as the source of truth and not try to "guess" which taxonomy it should be.

since the tag value should be unique in the DB across all taxonomies

Not sure why you're saying that. Values are only unique within a taxonomy.

For scenarios 4, building off Jill's point, if we allow those tags to be imported and shown in the UI, maybe we can add some sort of special distinction or indication on those tags (like a different color + tool tip) that informs the user that these tags were applied, however you do not have permissions to edit/remove tags for this taxonomy. They could then handle it internally with their Taxonomy Admins.

You can still remove the tags from the object, if you want. You just can't edit the taxonomy. Which is fine, since you can't edit taxonomies from the tagging drawer anyways. In fact, we currently don't let you edit taxonomies at all besides via import :p.

What I'm trying to say is that as far as the tagging drawer goes, these tags will look like any others, except you won't be able to add additional tags from the same taxonomy to your object. But everything else will look and behave the same.

yusuf-musleh commented 8 months ago

@bradenmacdonald

I want to treat Export ID as the source of truth and not try to "guess" which taxonomy it should be.

Got it, that makes sense.

Not sure why you're saying that. Values are only unique within a taxonomy.

Oh right! I forgot they're unique within the taxonomy, sorry for the confusion there.

these tags will look like any others, except you won't be able to add additional tags from the same taxonomy to your object

Got it, that sounds good to me 👍

ChrisChV commented 7 months ago

@bradenmacdonald I will stop the work here because of this discussion. I think the scope will change.

bradenmacdonald commented 7 months ago

@ChrisChV Thanks. I do think that the scope will change, yep. Just need a bit more time to figure it out.

bradenmacdonald commented 6 months ago

@yusuf-musleh Could you edit this story to replace the Acceptance criteria with these new ones? I can't edit it since you created it.

NEW Acceptance Criteria

  1. When a course with tags is exported to a .tar.gz file, it includes a new tags.csv file with the course tags.
    • This should be exactly the same format as already implemented in https://github.com/openedx/edx-platform/pull/34091
    • If neither the course itself nor any items in the course have tags, the tags.csv file is not included. (Probably use get_object_tag_counts() to determine this very efficiently.)
    • The format of the the tags.csv file will have to be changed slightly:
      • the "Type" column will have to be the block type (e.g. vertical not "Unit" etc.)
      • the "ID" column should just include the block_id e.g. 1238971 or unit7 not the whole usage key (block-v1:...). This is because the course export format is independent of the course ID.
      • These changes should be made to all versions of the "export tags CSV" code (don't just make these changes for course export).
  2. When a course is imported, if the import contains a tags.csv file, then after the content is imported, all the tags will be applied using the CSV. Invalid block IDs are ignored.
  3. Taxonomy Matching: (same as existing plan)
yusuf-musleh commented 6 months ago

@bradenmacdonald Updated the acceptance criteria.

ChrisChV commented 6 months ago

@yusuf-musleh

  1. Taxonomy Matching: (same as existing plan)

This part has been deleted in the description of the PR, the Taxonomy Matching scenarios. Is there a way to recover it?

bradenmacdonald commented 6 months ago

@ChrisChV Yes, to recover it do this:

Screenshot 2024-03-12 at 10 30 03 AM

Screenshot 2024-03-12 at 10 31 24 AM (ignore the part in red)

bradenmacdonald commented 6 months ago

@feanil I used to be able to edit this issue (as you can see in the edit history), but now I can't. I still have the ability to close it though, even though I didn't create it.

I'd also like to rename milestones like https://github.com/openedx/modular-learning/milestone/5 which I used to be able to do but now I can't.

Was this an intentional change to my permissions or is something configured wrong?

yusuf-musleh commented 6 months ago

@ChrisChV Sorry about that! I did it in a hurry and didn't realize I was supposed to keep that part as is. I've updated the description again.

Thanks @bradenmacdonald for stepping in!