Import/Export Mapping Logic and a Dedicated CLI Client?

openedx-unsupported / blockstore

Open edX Learning Object Repository

GNU Affero General Public License v3.0

Import/Export Mapping Logic and a Dedicated CLI Client? #19

Closed ormsbee closed 2 years ago

ormsbee commented 5 years ago

@pdpinch, @JAAkana, @bradenmacdonald, @symbolist, @pomegranited:

Extracted out of issue #16: What is the ideal course import/export format, keeping in mind that we will have the ability to associate assets with individual leaf XBlocks.

We know that a number of technically advanced course teams generate OLX from existing banks of problems or do all their authoring in XML and use import/export as their primary publishing workflow. However, there are some other use cases we also need to keep in mind...

Other Known Use Cases

A list of known import/export use cases compiled by @JAAkana:

Rerunning (v1)

A course team created an empty Studio ‘shell’ for their rerun months ago, and they’re ready to finally load the content of the current run into that shell today (they’d copy over their current course automatically if they could).

Rerunning (v2)

Course team may have originally created a rerun via auto-copying their current run, but some other version of their course turned out to be better.

Backups

I’m going to do something risky to my course and I need a backup copy - the import needs to be exactly the same to preserve location-ids / access to student data

Course division / course chimera (rare)

I’ve run one giant 18-week course, and I’d like to split it into 3 small courses without copy/pasting everything. Or, I’ve run 3 small courses and I want to repackage them as 2 new courses.

Moving content between multiple instances of the platform

Most commonly, this involves moving courses from test runs on Edge to MOOC runs on edX.org. Some course teams also move content from their own instances to edX controlled ones and vice versa.

Libraries

Teams want to use their MOOC problem banks on campus or vice-versa

XML editing outside of Studio

Conditional modules, changing a course’s wiki slug, adding user-readable unit URLs, etc. These are sometimes edits that either cannot be supported by the Studio UX at all (because no appropriate handler was written for them), or else are really cumbersome to do with existing editors (e.g. course-wide search and replace).

Retrieving files (rare)

I uploaded a bunch of assets to Studio months ago and now I want copies of them - it is easier to get them via export than by clicking on each one.

Retrieving data (rare)

I want a bunch of info from my course that’s hard to find click-wise, so I’ll export it and use the XML, e.g. "What are all the YouTube IDs?" or "Where did I say ‘week’ instead of ‘lesson’ as I update to convert to self-paced?"

Seeding a new course (rare)

I have an introductory sequence that I’d like to appear in a lot of my courses - I’ll import this content and then build the rest of my course.

Translating from Another File Format

Converting content from another format (LaTeX, Markdown) into a format edX can consume (could use a machine-readable or human-readable format). Open edX lives in an ecosystem, and it's not unusual for folks to want to convert between its format and others.

ormsbee commented 5 years ago

@pdpinch: I would love to get your thoughts to kick us off on this one, since you folks work so closely with this.

pdpinch commented 5 years ago

@ormsbee I'm a bit confused about the relationship of bundles, collections and courses. Is there a sketch of this somewhere?

To kick off the discussion, I was thinking that exporting a course would yield a group of bundles -- each a directory containing an OLX file and a sub-directory of static assets. However, with ownership metadata in the collection, I'm not sure if this is what you had in mind?

ormsbee commented 5 years ago

@pdpinch: Collection:Bundle is 1:M, and let's assume for now that there will be one Collection that holds all the Bundles in a typical course, but that borrowed content might come from Bundles in other Collections that represent Content Libraries.

There's a lot of conversation in #16 in terms of what that breakdown of Bundles would look like. But for the purposes of this thread, I'd rather not start with the Bundle mapping. Let's assume that there's going to be some amount of data transformation between the import/export format and the representation of the pieces in Blockstore.

I think this represents a bit of a shift in mindset. Previously, Blockstore data at-rest was envisioned to be the author-intent editing bundles of files. But the direction of #16 looks like we're talking more about Blockstore data at-rest being in a format that is more about facilitating re-use (so more granular, more regular). Which pushes more burden to a conversion/mapping import-export layer between the two. But I think that once we're committing to such a layer existing, we get a lot more freedom in defining exactly what the export format looks like irrespective of the storage details.

Does that make sense?

ormsbee commented 5 years ago

So I think the semantics we have to worry about are:

How do we associate static assets with individual leaf XBlocks (e.g. capa problem, HTML block) -- probably a similar mechanism for precursor files as well?
How do we call out things that need to be shared within the course itself (e.g. Python code libraries).
How do we distinguish in the export the content that you can edit vs. the content that you're borrowing from elsewhere and may have a read-only view of.
What does the roundtrip look like (e.g. do we have a canonical export format and more freeform import format)?
How do we cleanly separate policy-related items from content in a way that's simple, extensible, and doesn't look terrible?

Keep in mind that it doesn't have to be strictly OLX as we know it today.

pdpinch commented 5 years ago

There are some things that have been good about OLX -- HTML concepts are familiar to course authors; XML isn't a big leap; for the most part the block names are self-describing. The problems with OLX have mostly been structural, and with naming. The course is tree-like, but the export is not. It's hard to find things. It's nearly impossible to move or copy anything but the leaf nodes. Copying static assets is similarly difficult.

• How do we associate static assets with individual leaf XBlocks (e.g. capa problem, HTML block) -- probably a similar mechanism for precursor files as well?

I'd suggest a static directory in association with each leaf XBlock. It would be nice, for example, if opening an HTML block in a browser would just render, although that may be too much to ask. Precursor files need some association with their representation. For LaTeX, there’s a convention for referencing the original file as an XML attribute on the node. (need example)

• How do we call out things that need to be shared within the course itself (e.g. Python code libraries).

This is an interesting question. The fact that python libraries are shared across the course is a convenience, but they are referenced more like static files, at the leaf node. So one response is that they should be handled just like other static assets — although the current references (from mitxgraders import FormulaGrader) are going to be hard to parse.
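To make the parsing concern concrete, here is a minimal sketch that lists the top-level modules a grader script imports, using Python's standard `ast` module. The function name `imported_modules` is hypothetical, and the sketch assumes the embedded script parses under the Python version running the tool, which may not hold for older course code.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by a Python script."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" depends on the top-level package "a".
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from a.b import c"; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return sorted(modules)


script = "from mitxgraders import FormulaGrader\nimport math\n"
print(imported_modules(script))
```

An export tool could compare this list against the names of the course's shared libraries to work out which problems depend on which library file.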

I think an even more interesting question is: how does the export reference reused assets (static, or otherwise). I’d like something that gives me all the data in an export, but otherwise avoids duplication. This should be the goal in the export, as well as in the blockstore after an import.

• How do we distinguish in the export the content that you can edit vs. the content that you're borrowing from elsewhere and may have a read-only view of.

Metadata of some kind. Are there any existing conventions?

• What does the roundtrip look like (e.g. do we have a canonical export format and more freeform import format)?

What’s feasible here? The canonical export, freeform import of the current OLX is useful, but I don’t know how much of that was deliberate and it strikes me as difficult to maintain.

• How do we cleanly separate policy-related items from content in a way that's simple, extensible, and doesn't look terrible?

We probably need to start by identifying what are policy related items. I presume though that you are talking about dates, grading policy, etc. Much as I’d like to institute a strict separation of these, there are use cases that we wouldn’t want to break — like exporting the course, increasing the dates by a year, and then importing it.

This is a rambling response, but you asked a lot of questions. Let me know if this is helpful, or if you'd prefer something more carefully thought out.

ormsbee commented 5 years ago

This is an interesting question. The fact that python libraries are shared across the course is a convenience, but they are referenced more like static files, at the leaf node. So one response is that they should be handled just like other static assets — although the current references (from mitxgraders import FormulaGrader) are going to be hard to parse.

If that's going to be the case, then we'll need something else to distinguish between static assets that users can download and ones that are for internal usage. Though maybe that doesn't have to happen at this layer...?

How do we distinguish in the export the content that you can edit vs. the content that you're borrowing from elsewhere and may have a read-only view of.

Metadata of some kind. Are there any existing conventions?

We're making this up as we go along, but I think it might be better to have a place where they're cleanly separated in the file system, especially if there's hand-editing going on. It's one thing to check a metadata JSON file, it's another to be editing a path called /readonly_borrowed_content/{usage_key}.xml (I realize that is an absolutely horrible name -- I merely include it here for the obviousness).

I think an even more interesting question is: how does the export reference reused assets (static, or otherwise). I’d like something that gives me all the data in an export, but otherwise avoids duplication. This should be the goal in the export, as well as in the blockstore after an import.

Yeah, I think we're going to have to give reused content its own explicit, top level space. But not sure beyond that.

What does the roundtrip look like (e.g. do we have a canonical export format and more freeform import format)?

What’s feasible here? The canonical export, freeform import of the current OLX is useful, but I don’t know how much of that was deliberate and it strikes me as difficult to maintain.

I think some level of freeform import is feasible as long as the path to a canonical format is simple. For instance, say the import mechanism always expected a giant XML course file as the contents of the entire course. For hand authoring, that could look like this:

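<course xmlns:xi="http://www.w3.org/2001/XInclude">    <!-- the xi prefix must be declared for the includes below to parse -->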
    <chapter id="blockstore">    <!-- id means "usage_key" here -->
        <!-- 
            The following could be inline or XIncludes of sequence files.
            Since usage_key is derived from the XML within the file, there's
            no restriction that the filename has to be the usage key, though
            it might be a nice convention.
         -->
        <xi:include href="blockstore/early_modulestore_history.xml" />
        <xi:include href="blockstore/requirements.xml" />
        <xi:include href="blockstore/storage_granularity.xml" />
        <xi:include href="blockstore/data_transforms.xml" />
        <xi:include href="blockstore/import_export.xml" />
    </chapter>
    <!-- etc... -->
</course>

The xi:include elements are an XML convenience, not part of OLX parsing. The import code would only see course.xml after all the includes have been processed, and it can then do its validation and separation into discrete components based on that. That would give hand authors a fair amount of structural flexibility without significantly complicating the import code.
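As a rough sketch of that "expand first, then import" idea, the snippet below resolves XInclude elements with Python's standard `xml.etree.ElementInclude` before anything OLX-specific runs. The file contents, the in-memory `FILES` mapping, and the helper names are all made up for illustration; a real importer would read the included files from the export directory.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-ins for the included sequence files.
FILES = {
    "blockstore/requirements.xml":
        '<sequential id="requirements"><html id="requirements_intro" /></sequential>',
    "blockstore/import_export.xml":
        '<sequential id="import_export" />',
}

COURSE_XML = """
<course xmlns:xi="http://www.w3.org/2001/XInclude">
    <chapter id="blockstore">
        <xi:include href="blockstore/requirements.xml" />
        <xi:include href="blockstore/import_export.xml" />
    </chapter>
</course>
"""


def load_from_memory(href, parse, encoding=None):
    """Loader callback for ElementInclude; this sketch supports XML includes only."""
    if parse != "xml":
        raise ValueError("only parse='xml' includes are supported in this sketch")
    return ET.fromstring(FILES[href])


def expand_course(xml_text):
    """Parse the course file and replace every xi:include with the included tree."""
    root = ET.fromstring(xml_text.strip())
    ElementInclude.include(root, loader=load_from_memory)
    return root


def usage_keys(root):
    """Collect the id attributes in document order, whichever file they came from."""
    return [el.get("id") for el in root.iter() if el.get("id") is not None]


print(usage_keys(expand_course(COURSE_XML)))
```

After expansion the importer sees one tree, so validation and splitting into components do not need to know how the author arranged the files.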

I'm still really fuzzy on how to do static assets in a good way though.

I'd suggest a static directory in association with each leaf XBlock. It would be nice, for example, if opening an HTML block in a browser would just render, although that may be too much to ask.

That would be pretty cool, though I'm not sure how that interacts with the desire to allow flexibility of placement on the authoring side. Most blocks aren't going to have any associated static assets at all. Though maybe we can keep a bit of both by allowing XML placement to be freeform but have conventions that all static assets are at the root of the export and sub-divided by something derived from the usage key:

course.xml
static/
    html/
        compositor_overview/
            diagram.png

Since every leaf has to have some kind of usage key identifier (in some way or another), it's straightforward to know where the assets are going to be, regardless of how the XML is structured. Opening an HTML file wouldn't render the images correctly without a little post-processing, but I think that's still an acceptable tradeoff.
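A minimal sketch of that convention, assuming the usage key can be reduced to a block type and a block id (the function name and the exact key shape are assumptions, not something settled in this thread):

```python
from pathlib import PurePosixPath


def static_asset_path(block_type, block_id, filename):
    """Build the export-relative path for an asset owned by one leaf block."""
    for part in (block_type, block_id, filename):
        # Reject anything that could escape the per-block directory.
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    return PurePosixPath("static") / block_type / block_id / filename


print(static_asset_path("html", "compositor_overview", "diagram.png"))
```

Because the path depends only on the block's identity, the importer can find a block's assets without caring where the block's XML lives in the export.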

We probably need to start by identifying what are policy related items. I presume though that you are talking about dates, grading policy, etc. Much as I’d like to institute a strict separation of these, there are use cases that we wouldn’t want to break — like exporting the course, increasing the dates by a year, and then importing it.

Yup, those are exactly what I had in mind. Braden created a Compositor Architecture Proposal a while back, to map out where those values get applied. I think that supporting them as part of the import/export flow is feasible, so long as they're:

separated out cleanly in files that are separate from the content XML definition.
in the long term, treated as inputs to policy services, and not the same way as content.

The second of those is a long held rant of mine, but I basically think that there needs to be a separate system for Scheduling that has the course staff set dates as inputs, but also has other inputs like individual due date extensions, and an ability to query in a cross-course manner. But that's a bit of a tangent. I think it's enough to say that it's different enough with our current known use cases (re-runs, CCX, content libraries) that it deserves clearer separation in the data model than exists today.
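To illustrate why keeping policy in its own file helps the "export, bump the dates by a year, import" workflow, here is a small sketch that shifts date fields in a policy dictionary without touching any content XML. The field names in `DATE_KEYS` and the shape of the policy data are hypothetical.

```python
from datetime import datetime

DATE_KEYS = {"start", "end", "due"}  # hypothetical policy field names


def shift_years(iso_timestamp, years):
    """Move an ISO 8601 timestamp forward or back by whole years."""
    moment = datetime.fromisoformat(iso_timestamp)
    try:
        shifted = moment.replace(year=moment.year + years)
    except ValueError:
        # February 29 has no counterpart in a non-leap year; fall back to February 28.
        shifted = moment.replace(year=moment.year + years, day=28)
    return shifted.isoformat()


def shift_policy_dates(policy, years=1):
    """Return a copy of a nested policy dict with every date field moved by `years`."""
    shifted = {}
    for key, value in policy.items():
        if isinstance(value, dict):
            shifted[key] = shift_policy_dates(value, years)
        elif key in DATE_KEYS and isinstance(value, str):
            shifted[key] = shift_years(value, years)
        else:
            shifted[key] = value
    return shifted
```

For example, `shift_policy_dates({"course": {"start": "2018-09-04T12:00:00+00:00"}})` would return a copy whose start date falls in 2019, leaving the input untouched.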

This is a rambling response, but you asked a lot of questions. Let me know if this is helpful, or if you'd prefer something more carefully though out.

I think at this point in our discussion, low latency rambling is more valuable than high latency proposals. :)

symbolist commented 5 years ago

I was thinking about these use cases and can see them falling into 3 categories:

1. Use cases which are eliminated by the versioning and sharing capabilities of Blockstore.
2. Use cases for transferring content. The format for this doesn't need to be human-friendly since only machines will read/write it. Let's call it the MF (machine-friendly) format.
3. Use cases for out-of-Studio content authoring. Let's call it the HF (human-friendly) format.

Which pushes more burden to a conversion/mapping import-export layer between the two. But I think that once we're committing to such a layer existing, we get a lot more freedom in defining exactly what the export format looks like irrespective of the storage details.

The way I have been thinking about 3 is how programming languages work. For example, there is a Python runtime which expects the data to be in a certain format in memory. And then there is a parser/compiler layer which gives humans a lot of freedom to organize the code the way it makes sense for them. So one question I have is (especially for @pdpinch), what if Blockstore had APIs and there was a command line tool that interacted with them? It could, let's say, pull out the OLX and static files it needed and, depending on the internal structure, write them out into the local file system in the way that made sense for that type of content. For example, the filenames could become <unit_display_name>_<block_display_name>.id.olx. Or the OLX files could be organized into a separate directory for each chapter (this could even be configurable). Similarly, the tool could re-read this data, run validations and push the changed parts to Blockstore. Heck, it could even have a watch option, which would on every edit push the content to the devstack Blockstore and let you preview things in Studio/LMS automatically. In other words, everything that is possible with Webpack.

(Of course API calls would be rate-limited by user)
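One way the "push the changed parts" step could work is by comparing content hashes of local files against hashes the server already knows. The sketch below assumes the server can report a path-to-hash manifest; the manifest format and the function names are assumptions for illustration only.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(data).hexdigest()


def plan_push(local_files: Dict[str, bytes],
              remote_hashes: Dict[str, str]) -> Dict[str, List[str]]:
    """Decide which paths to upload and which to delete on the server."""
    upload = sorted(
        path for path, data in local_files.items()
        if remote_hashes.get(path) != content_hash(data)
    )
    delete = sorted(path for path in remote_hashes if path not in local_files)
    return {"upload": upload, "delete": delete}
```

A watch mode could rerun `plan_push` on every file-system change and send only the resulting uploads and deletions, which also keeps the number of API calls low.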

Blockstore data at-rest was envisioned to be the author-intent editing bundles of files.

I do think this is still mostly true for the idea above. The main difference is that instead of the author running tar -zxvf export_file to see the directory of content, they run blockstore pull collection_id --export-format-configuration=<config>. And since the code for the transformer is going to be outside Blockstore it can evolve much faster.
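For a sense of what the command-line surface might look like, here is a small `argparse` sketch of the `pull` command quoted above. No such client exists in this thread; the program name, the subcommand, and the option are taken from the example command, and everything else is an assumption.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical `blockstore pull` command."""
    parser = argparse.ArgumentParser(prog="blockstore")
    commands = parser.add_subparsers(dest="command", required=True)

    pull = commands.add_parser("pull", help="download a collection as local files")
    pull.add_argument("collection_id", help="identifier of the collection to pull")
    pull.add_argument(
        "--export-format-configuration",
        metavar="CONFIG",
        default=None,
        help="path to a file describing the local (human-friendly) layout",
    )
    return parser


args = build_parser().parse_args(
    ["pull", "my-collection", "--export-format-configuration=hf.json"]
)
print(args.command, args.collection_id, args.export_format_configuration)
```

Keeping the layout configuration as an option is what would let the human-friendly format change between client releases without touching the server.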

Is this too radical an idea? What are the downsides? We will of course still need MF format for the category 2 of use cases.

Use Cases

Here is which categories I think these use cases fall in:

Rerunning (v1) A course team created an empty Studio ‘shell’ for their rerun months ago, and they’re ready to finally load the content of the current run into that shell today (they’d copy over their current course automatically if they could).

MF format should be sufficient though they should just be able to point to a course.xml version of a fork in Blockstore.

Rerunning (v2) Course team may have originally created a rerun via auto-copying their current run, but some other version of their course turned out to be better.

MF format should be sufficient though they should just be able to point to a course.xml version from a fork.

Backups I’m going to do something risky to my course and I need a backup copy - the import needs to be exactly the same to preserve location-ids / access to student data

No need, since they can always go back to an older version of the course directly.

Course division / course chimera (rare) I’ve run one giant 18-week course, and I’d like to split it into 3 small courses without copy/pasting everything. Or, I’ve run 3 small courses and I want to repackage them as 2 new courses.

No need, since they can just combine the chapters and sequences into a new course.

Moving content between multiple instances of the platform Most commonly, this involves moving courses from test runs on Edge to MOOC runs on edX.org. Some course teams also move content from their own instances to edX controlled ones and vice versa.

MF format. Though I think the ability to link an edX app instance to other Blockstores may be a better solution for this.

Libraries Teams want to use their MOOC problem banks on campus or vice-versa

Same as previous point.

XML editing outside of Studio Conditional modules, changing a course’s wiki slug, adding user-readable unit URLs, etc.

Would be simpler to have a file editor in Studio.

Retrieving files (rare) I uploaded a bunch of assets to Studio months ago and now I want copies of them - it is easier to get them via export than by clicking on each one.

MF format should be sufficient?

Retrieving data (rare) I want a bunch of info from my course that’s hard to find click-wise, so I’ll export it and use the XML, e.g. "What are all the YouTube IDs?" or "Where did I say ‘week’ instead of ‘lesson’ as I update to convert to self-paced?"

The HF format may be useful here but this is a rare use case.

Seeding a new course (rare) I have an introductory sequence that I’d like to appear in a lot of my courses - I’ll import this content and then build the rest of my course.

No need, since they can just link to those sequences in multiple courses.

pdpinch commented 5 years ago

@symbolist distinguishing between human-friendly and machine-friendly formats is useful and, I think, consistent with what @ormsbee was suggesting. I am also in favor of having APIs for importing and exporting elements -- we've wanted that for some time and I know of 3 (no, 4!) different ways folks have hacked that together.

I think you're misunderstanding one of the use cases, and the list is missing another.

"XML editing outside of Studio" isn't typically about editing a single XML file. It's about doing some kind of manipulation that isn't possible in Studio. I suppose if you added a file editor that would probably cover some of the use cases, but certainly not all. Editing outside of studio could use a MF format, but I think a HF format would be better.

The use case that is missing from this list is converting content from another format (LaTeX, Markdown) into a format edX can consume (could use MF or HF format). edX lives in an ecosystem, and it's not unusual for folks to want to convert between its format and others.

pdpinch commented 5 years ago

Sorry, I'm replying out of order.

@ormsbee:

regarding course-wide static assets like python libraries

we'll need something else to distinguish between static assets that users can download and ones that are for internal usage.

I don't understand this. I think the Python grading library should be downloaded just like other static assets. I wouldn't expect to be able to run it, except after uploading it to an instance of edx-platform, but otherwise I see it as just another static asset.

regarding the export of shared files

it might be better to have a place where they're cleanly separated in the file system

I think we could live with that. I'd still like HTML to render locally if possible, but there are ways to make that work even with "/readonly_borrowed_content/{usage_key}.xml"

regarding the placement of static assets

maybe we can keep a bit of both by allowing XML placement to be freeform but have conventions that all static assets are at the root of the export and sub-divided by something derived from the usage key:

That's certainly clear for finding static assets. How would (manual) reuse work though? I copy the OLX I want out and then grab a copy of the corresponding static folder?

regarding policy related items

I'll go read up on the Compositor Architecture but I think separating policy from content will be fine. Folks are already accustomed to the separation between the OLX and the policy.json.

ormsbee commented 5 years ago

@symbolist:

I do think this is still mostly true for the idea above. The main difference is that instead of the author running tar -zxvf export_file to see the directory of content, they run blockstore pull collection_id --export-format-configuration=<config>. And since the code for the transformer is going to be outside Blockstore it can evolve much faster.

Is this too radical an idea? What are the downsides? We will of course still need MF format for the category 2 of use cases.

That's really interesting. I had a tiny blurb in the original design doc that having a CLI Blockstore utility to manage downloads might be necessary in order to tie all the links together and present it in a usable way. But I envisioned that as a generic Blockstore utility without deep awareness of what course content is. If I'm understanding correctly, this goes a step beyond that and proposes that the mapping logic of how to translate local files to Blockstore moves to the client, translating user intent into relatively low level Blockstore operations. I think this has some strong arguments in both directions, and I'd really like to pursue this line of thinking.

Things I love about it:

It shifts knowledge of this out of Blockstore, simplifying the core.
It offers a lot of opportunity for a rich, high powered tool that really assists you in the offline authoring experience at various levels of granularity. This is something that actually sounds like a lot of fun to write.
Said tool would likely be a good place for firming up the definition of OLX.
It lets us tweak what things are downloaded and what isn't -- very important for skipping videos, but also useful for smaller edits.
Keeping the orchestration layer on the client simplifies our async task requirements (since pulling a giant tar.gz file and operating on it over a span of minutes has a bunch of places where things can go wrong).
There's a certain symmetry about it, because it's analogous to Studio as a client.

Things that concern me:

The mapping of a Course to Blockstore data constructs needs to be elevated to an API contract, making it more difficult to change our minds about data modeling at a later point. Maybe that was inevitable anyway, if we wanted to treat Blockstore as a first class interface for manipulating this data rather than an implementation detail for Studio to do so.
I worry that the client code won't be maintained, or that we'll see drift between it and what's run by Studio. I imagine we would want Studio to use this library to some extent to help prevent that, but I still see drift as a concern.
Maintaining a program that has to work locally on everyone's machine is going to be a maintenance issue. There's always someone running it on a random distro with weird restrictions, and of course it should work on my Mac but which of the eight Pythons is it really hooked into, and "hey, lxml isn't compiling because it can't find my libxml2 lib", etc.

Smaller details:

It probably means we have to expose more low level snapshot manipulation in the API. For instance, if you're changing eight Bundles, you probably want to make incrementing BundleVersions of those an atomic operation from the point of view of the CLI. Which we have a lever to do -- we can create Snapshots one at a time but then make all the BundleVersion pointer updates to said Snapshots atomically in one transaction. But it just means more of that will have to be exposed in the API.
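The two-phase idea described here (create Snapshots one at a time, then move all the BundleVersion pointers together) can be sketched with a toy in-memory store. This is not Blockstore's API; the class, method names, and id scheme are invented, and a real service would rely on a database transaction rather than a validate-then-apply loop.

```python
import itertools


class InMemoryStore:
    """Toy model: snapshots are created freely, version pointers move all-or-nothing."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.snapshots = {}        # snapshot_id -> (bundle_id, files)
        self.bundle_versions = {}  # bundle_id -> ordered list of snapshot_ids

    def create_snapshot(self, bundle_id, files):
        """Phase 1: store content without making it the current version."""
        snapshot_id = f"snapshot-{next(self._ids)}"
        self.snapshots[snapshot_id] = (bundle_id, dict(files))
        return snapshot_id

    def publish(self, updates):
        """Phase 2: point each bundle at its new snapshot, or change nothing at all."""
        for bundle_id, snapshot_id in updates.items():
            entry = self.snapshots.get(snapshot_id)
            if entry is None or entry[0] != bundle_id:
                raise ValueError(f"{snapshot_id!r} is not a snapshot of {bundle_id!r}")
        # Every update was validated, so applying them cannot fail halfway.
        for bundle_id, snapshot_id in updates.items():
            self.bundle_versions.setdefault(bundle_id, []).append(snapshot_id)
```

A CLI changing eight Bundles would call `create_snapshot` eight times and `publish` once, so readers never observe a course with only some of its Bundles updated.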

@pdpinch:

I don't understand this. I think the Python grading library should be downloaded just like other static assets. I wouldn't expect to be able to run it, except after uploading it to an instance of edx-platform, but otherwise I see it as just another static asset.

Yeah, I'm conflating shared Python libs with custom response Python code that might leak answers. But the latter doesn't need to be treated as a static asset, so please disregard.

That's certainly clear for finding static assets. How would (manual) reuse work though? I copy the OLX I want out and then grab a copy of the corresponding static folder?

Can you describe in more detail the specific re-use scenario here? I just want to make sure I understand the question.

pdpinch commented 5 years ago

@ormsbee I opened a new issue for "re-use of exported content"

I'm tempted to do the same for "blockstore import/export client" but maybe you want to let that take over this thread.

symbolist commented 5 years ago

@pdpinch

I am also in favor of having APIs for importing and exporting elements -- we've wanted that for some time and I know of 3 (no, 4!) different ways folks have hacked that together.

Ah, interesting!

I think you're misunderstanding one of the use cases, and the list is missing another.

"XML editing outside of Studio" isn't typically about editing a single XML file. It's about doing some kind of manipulation that isn't possible in Studio. I suppose if you added a file editor that would probably cover some of the use cases, but certainly not all. Editing outside of studio could use a MF format, but I think a HF format would be better.

The use case that is missing from this list is converting content from another format (LaTeX, Markdown) into a format edX can consume (could use MF or HF format). edX lives in an ecosystem, and it's not unusual for folks to want to convert between its format and others.

Oh, I am aware of these. The list above is from another document which was compiled some time ago and I thought the point in the list was only about a more restricted version of the idea. But I may have read it incorrectly so thanks for detailing these out!

@ormsbee

If I'm understanding correctly, this goes a step beyond that and proposes that the mapping logic of how to translate local files to Blockstore moves to the client, translating user intent into relatively low level Blockstore operations.

Yup! As you said it would be kind of like a command line version of Studio!

I do want to add one point to your very nice list of positives, and that is evolvability. If the transform logic for the HF format lived in Blockstore, that format would also have to double as the MF format needed for transport and archival purposes, and so it would have to be stable. That would mean the HF format would stay (close to) whatever we decide in the next few months for a long time. If, like Studio, the HF format is allowed to evolve freely as feedback comes in with usage and as new types of content get developed, without any concern for backwards compatibility ("just update to the new version of the client and pull the collection again"), it would end up being a lot more author-friendly.

Things that concern me:

The mapping of a Course to Blockstore data constructs needs to be elevated to an API contract, making it more difficult to change our minds about data modeling at a later point. Maybe that was inevitable anyway, if we wanted to treat Blockstore as a first class interface for manipulating this data rather than an implementation detail for Studio to do so.

I worry that the client code won't be maintained, or that we'll see drift between it and what's run by Studio. I imagine we would want Studio to use this library to some extent to help prevent that, but I still see drift as a concern.

Hmm. These two will definitely need thinking.

Maintaining a program that has to work locally on everyone's machine is going to be a maintenance issue. There's always someone running it on a random distro with weird restrictions, and of course it should work on my Mac but which of the eight Pythons is it really hooked into, and "hey, lxml isn't compiling because it can't find my libxml2 lib", etc.

There are tools for converting packages to single file executables with the interpreter + dependencies included. Would something like that help?

ormsbee commented 5 years ago

@pdpinch:

I'm tempted to do the same for "blockstore import/export client" but maybe you want to let that take over this thread.

Yeah, I think it's okay to let it take over the thread. If we land on consensus that we want a rich client like that, then a new issue makes sense to hammer out some more of the specifics.

@symbolist:

There are tools for converting packages to single file executables with the interpreter + dependencies included. Would something like that help?

Probably? I don't know the state of the art on that these days for the Python world. The one place I worked at that needed to tackle these issues eventually gave up on Python and rewrote their agent in Go so that they could compile a static binary and be done with it. Which is not something I'm advocating for, fwiw.

ormsbee commented 5 years ago

Okay, I think that after having the Thanksgiving weekend to stew on this, I'm +1 on the dedicated CLI client, and pushing the mapping logic to Bundle primitives out of Blockstore.

symbolist commented 5 years ago

@bradenmacdonald In case you haven't been following this, can we get your opinion too?

@ormsbee I see you had productive holidays! 😄

bradenmacdonald commented 5 years ago

@pdpinch @ormsbee @symbolist

I really like the idea of having a CLI tool for working with a local HF format and syncing it with Blockstore, keeping the Blockstore format relatively efficient for developers+reuse+writing (and eventually yet a third read-optimized format when pushing from Studio/Blockstore to the LMS...?)

I was going to suggest that the CLI tool be written in Go or Rust so that it's (a) even more fun to write, and (b) much more portable, but that would make sharing code with Studio much more difficult. I do personally like the approach of writing a simple library in C or Rust with nice bindings for Python (à la libgit2), to keep a consistent approach. However, I see you've mentioned those ideas already. It probably makes sense to stick to Python here throughout the stack, but it's not what I personally would gravitate toward.

Another approach that I was thinking of which can help less technical users is to use an online IDE for viewing+editing+syncing the HF format. If we found some existing one like GitLab's which could be adapted really easily (so you just use it as-is with a little layer to sync to/from Blockstore), that would be a huge win. Then you get the same effect, but people don't need to install any software on their computer. I don't know if that's a medium-sized project or a mammoth one though.

The course is tree-like, but the export is not. It's hard to find things.

Yeah I definitely want to see the HF format having a hierarchy that matches the course. My suggestion would be something like:

/course.xml
/policies.json
/chapter1.xml
/chapter1/unit1-1.xml
/chapter1/unit1-1/intro-to-the-course.xml
/chapter1/unit1-1/first-problem.xml
/chapter1/unit1-1/first-problem/some-image.svg
/chapter1/unit1-1/first-problem/some-other-image.svg
/static/some-image-used-throughout-the-course.svg
/static/python-shared.zip

This structure avoids the things I hate about ansible: it doesn't require creating subdirectories unless they're needed (i.e. blocks without static assets don't need their own directory), and it avoids lots of similarly named files throughout the tree (an editor with a dozen "unit.xml" or "index.olx" tabs open is annoying).
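To make the layout rule concrete, here is a minimal Python sketch. The `Block` class and `layout_paths` function are hypothetical names made up for illustration, not anything that exists in Blockstore. It assumes each block is written as `<slug>.xml` next to its siblings, and that a `<slug>/` directory only shows up when the block has children or static assets.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Block:
    """Hypothetical in-memory block: a slug, child blocks, and asset filenames."""
    slug: str
    children: list[Block] = field(default_factory=list)
    assets: list[str] = field(default_factory=list)


def layout_paths(block: Block, parent: PurePosixPath = PurePosixPath("/")) -> list[str]:
    """Return the file paths a block and its descendants would occupy."""
    # Every block gets exactly one XML file, named after its slug.
    paths = [str(parent / f"{block.slug}.xml")]
    # The subdirectory is only implied by the assets/children placed inside it,
    # so a leaf block without assets never produces a directory.
    subdir = parent / block.slug
    for asset in block.assets:
        paths.append(str(subdir / asset))
    for child in block.children:
        paths.extend(layout_paths(child, subdir))
    return paths


chapter = Block(
    "chapter1",
    children=[
        Block(
            "unit1-1",
            children=[
                Block("intro-to-the-course"),
                Block("first-problem", assets=["some-image.svg", "some-other-image.svg"]),
            ],
        )
    ],
)

for path in layout_paths(chapter):
    print(path)
```

With this input the printed paths match the `chapter1` portion of the listing above, and `intro-to-the-course` gets no directory of its own because it has no assets.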

For the shared static assets, it might be reasonable to actually symlink them into the folder where they are used, in order to track that better. After all, all operating systems today support symlinks, including Windows. For people who don't know how to create symlinks manually, they can just copy the files and the import/sync process will dedup them.
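As a rough illustration of the dedup step, an import/sync process could group files by a hash of their contents. This is only a sketch: `find_duplicate_assets` is a hypothetical helper, and using SHA-256 over whole-file contents is my assumption, not a description of how Blockstore identifies files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_assets(root):
    """Map content digest -> relative paths, for files whose bytes are identical."""
    by_digest = defaultdict(list)
    # Sort so that the reported paths come out in a stable order.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.relative_to(root).as_posix())
    # Only digests seen more than once are duplicates worth collapsing.
    return {digest: paths for digest, paths in by_digest.items() if len(paths) > 1}
```

A sync command could then upload each duplicated file once and record every path in the group as a reference to the same stored asset.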

Any item that includes a read-only link to a child from another bundle should probably not pull that child down in most cases, so e.g. if chapter1.xml contains a link to a unit2 from another bundle, then chapter1.xml could just contain a reference to that external bundle, but /chapter1/unit2/ would not even appear as a directory. Alternately, since all OSs support creating read-only files, the CLI tool could pull them down like this:

/chapter1.xml
/chapter1/unit2.xml -> /external/{bundle_uuid or alias}/unit.xml
/external/{bundle_uuid or alias}/unit.xml (read-only)
/external/{bundle_uuid or alias}/unit/image.svg (read-only)
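A minimal sketch of the read-only part, assuming the CLI tool has already downloaded the linked bundle into a local directory. `make_tree_read_only` is a hypothetical helper, and on Windows `chmod` can only toggle the read-only flag, so treat this as an approximation rather than a portable guarantee.

```python
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_tree_read_only(root):
    """Clear the write permission bits on every regular file under root."""
    for path in Path(root).rglob("*"):
        # Skip symlinks so that only the real files under root are touched.
        if path.is_file() and not path.is_symlink():
            path.chmod(path.stat().st_mode & ~WRITE_BITS)
```

This only discourages accidental edits. The sync step would still need to refuse to push changes made under the external directory.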

ormsbee commented 5 years ago

I was going to suggest that the CLI tool be written in Go or Rust so that it's (a) even more fun to write, and (b) much more portable, but that would make sharing code with Studio much more difficult. I do personally like the approach of writing a simple library in C or Rust with nice bindings for Python (à la libgit2), to keep a consistent approach. However, I see you've mentioned those ideas already. It probably makes sense to stick to Python here throughout the stack, but it's not what I personally would gravitate toward.

Yeah, don't get me wrong, I really want to write it in Rust because Rust is fun and it's what my hobby command line utilities are written in. But I'm much better at Python than Rust, and it doesn't give such a huge advantage over Python that it justifies the added burden of a new language that will be unfamiliar to most Open edX developers.

Another approach that I was thinking of which can help less technical users is to use an online IDE for viewing+editing+syncing the HF format. If we found some existing one like GitLab's which could be adapted really easily (so you just use it as-is with a little layer to sync to/from Blockstore), that would be a huge win. Then you get the same effect, but people don't need to install any software on their computer. I don't know if that's a medium-sized project or a mammoth one though.

That sounds cool, but yeah, potentially a lot of work. Sounds like a great hackathon project though.

/chapter1.xml
/chapter1/unit1-1.xml
/chapter1/unit1-1/intro-to-the-course.xml
/chapter1/unit1-1/first-problem.xml
/chapter1/unit1-1/first-problem/some-image.svg
/chapter1/unit1-1/first-problem/some-other-image.svg

I'm not sure if I completely understand... In this scenario, can you please give an example of what the contents of chapter1.xml and unit1-1.xml might look like? And in particular, what they do with small blocks that don't necessarily have static assets, like html, discussion, or conditional?

For the shared static assets, it might be reasonable to actually symlink them into the folder where they are used, in order to track that better. After all, all operating systems today support symlinks, including Windows. For people who don't know how to create symlinks manually, they can just copy the files and the import/sync process will dedup them.

I think I made some blurb in the original design docs about potentially using symlinks to stitch together linked Bundles on the client side, so I get where you're coming from. But at the same time, I'm skittish on symlinks in general. Even if they're supported on Windows 10, there are a lot of places where people can get tripped up with compatibility issues. For instance, I spent a while spinning my wheels on the path to getting edx-platform to work on Windows because I didn't realize that the git client for Windows by default doesn't create symlinks (it's disabled in configuration). File watching utilities sometimes do or don't follow symlinks. Folks who use Cygwin on Windows are either creating real, native Windows symlinks or its own specially formatted files that emulate symlink behavior (created in the dark days before Windows really supported it). Run the wrong Python and you have a really annoying-to-debug issue.

Also, a lot of folks really don't get what it means and are confused that changes in one directory affect the other. Or some people get really symlink happy and you end up with symlink spaghetti going back and forth from your static dir to the leaves and back again. So if we do use them in the human-friendly format, I'd like to be fairly strict and limited in how they're allowed to be used.
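If a client did rely on symlinks, one way to guard against the compatibility problems above would be to probe for symlink support up front and fall back to plain copies when the probe fails. A minimal sketch, where `symlinks_supported` is a hypothetical helper and the fallback policy is only an assumption:

```python
import os
import tempfile


def symlinks_supported(directory=None):
    """Return True if a symlink can actually be created inside `directory`."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as handle:
            handle.write("probe")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # For example, Windows without the privilege needed to create symlinks.
            return False
        # Confirm the filesystem really stored a link rather than a copy.
        return os.path.islink(link)


if symlinks_supported():
    print("symlinks available: shared assets could be linked")
else:
    print("symlinks unavailable: fall back to copying shared assets")
```

The probe runs against the directory where the course would be checked out, since support can vary by filesystem as well as by OS.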

bradenmacdonald commented 2 years ago

This is old and most likely no longer relevant to the future platform direction, so I'm going to close it. The discussion will be preserved on GitHub for future reference if useful.