pipalacademy / riyaz

A light-weight, self-hostable learning platform.
MIT License
2 stars 0 forks source link

Management of course assets #26

Closed nikochiko closed 2 years ago

nikochiko commented 2 years ago

Course will have assets like instructor photo, and other custom assets that are loaded into markdown by user.

We need to save that in a separate directory that can be served by Flask or another file server.

nikochiko commented 2 years ago

Approach:

Like we have a db.py, we should have an assets.py that will take configuration variables from config.py and use that to save assets.

Config variables needed:

To be implemented in assets.py:

  1. class Asset:
    - content: bytes  # could be a stream? but bytes generalizes better and we don't need to optimise too much
    - name: pathlib.Path  # path to the asset file inside `assets_path`, this is also the path after `assets_root`
    - save()  # write `content` to `{assets_path}/{name}` 
    - get_url()  # URL of asset, `{assets_root}/{name}`
  2. get_asset(name: str|Path) -> Optional[Asset]

We will need to implement abstractions to save assets at the correct location in other modules (disk.py).

save_author_photo(from_: Path) -> Path save_course_asset(from_: Path) -> Path

anandology commented 2 years ago

Thoughts

We should plan to support having an option to store assets in s3 in the future.

Right now we are having assets for courses, but in future we may have assets for a project as well. We may have to be ready to support that.

Assets path

Assets can be stored in a directory assets and that could be served as a static path. In production, that can be served by nginx or s3.

We have two options for the path of the assets.

For example, the file "circle.png" of course "joy" (course-id=1) could be stored at one of the following paths.

assets/1/circle.png
assets/joy/circle.png
assets/courses/1/circle.png
assets/courses/joy/circle.png

I like the last one better. It allows us to support other kinds of asset collections like projects. Using the key as part of the path might create trouble with renaming, so I'm okay to compromize on the readability of the path and go with option 3.

The authors photos are not really part of the course. They are only referenced from the course. So author photos could be saved as assets/authors/1/photo.png.

Assets in the DB

We could probably store assets with the following fields.

asset:
  - id 
  - collection ('authors', 'courses', etc.)
  - collection_id (author_id, course_id etc.)
  - filename
  - filesize
  - md5 (?)
  - created
  - last_modified

The Asset class can have get_url() method to return the url to serve the asset.

Configuration

Have assets_path, but leave out the assets_root for now. We'll see that when we get to s3 support.

Does this look alright? Did I miss anything?

anandology commented 2 years ago

It may be good idea to have a field deleted to mark an asset as deleted to make it faster to delete assets and purge them later. It may not be a good idea to delete the resources as part of the flow.

anandology commented 2 years ago

We may do something to make the assets play well with the cache. Name the files on disk as {md5}_{filename} and set the cache-control headers to cache them forever. That will reduce the number of resources the browser have to fetch for every page. What do you think?

nikochiko commented 2 years ago

Thanks.

For URL of course assets, I think it makes sense to use course ID like you suggested. It doesn't seem right to have to touch files on filesystem due to a simple rename.

I have a few questions:

We would need to decide how users specify their static assets.

The second option (assets/ directory) seems good to me. It is simple enough that user can paste something there and start using it without needing much explanation.

What do you think?

anandology commented 2 years ago
  • Why should we have assets in the db also? We get the same information from directory structure too.

Because you want to find all the assets that are attached to a course and finding that with disk access is not a good idea for every use.

  • If we name the files as {md5}_{filename}, would we set that filename while converting markdown to html?

I haven't thought about that one.

How do they let riyaz know which files need to be stored for static serving?

* An `- assets:` key in course.yml

* An `assets/` directory in course root

* We do magic and store filesystem paths as assets

* `riyaz add-asset /path/to/static/asset.png`

The second one - assets/ directory.

How do they use static assets in course text?

* `[hello world](/assets/asset.png)` (would be converted to `/assets/courses/1/asset.png` once it gets to a server)]

I completely overlooked this one. I don't think we'll be able to use assets that way. With mkdocs, the assets and the markdown urls are in the same paths even after the build. I'm not sure if we'll be able to do that.

In monschool-website, we use it as {{ Image("foo.png") }} to include an image. Let me think what is a good option for us.

anandology commented 2 years ago

mkdocs seems to be rewriting the images urls.

In an example that I tried:

![Python](assets/python.png)

generated:

<img alt="Python" src="../assets/python.png" />

So some rewrite is happening. I wonder what happens if I include a javascript or a css.

anandology commented 2 years ago

Here are some options that we can consider.

![](assets/foo.png)
![](${ASSETS}/foo.png)
![](${ASSETS_URL("foo.png")})
{{ Image(src="foo.png", alt="Foo") }}

The second option seems more flexible to be. If someone want to include a stylesheet or a script, they could use the same approach.

<script type="text/javascript" src="${ASSETS}/hello.js"></script>
<style type="text/css" href="${ASSETS}/hello.css"></style>
<img  src="${ASSETS}/hello.png" style="width: 100px">

The third option is even more flexible and allows use to rewrite the whole path with things like {md5}_{filename} discussed before, but that seems to be making it too ugly to use. So many quotes and parenthesis.

![](${ASSETS_URL("foo.png")})
<script type="text/javascript" src="${ASSET_URL('hello.js')}"></script>

We could even do the first option and automatically rewrite the URLs, but that approach won't work for scripts and other things. Even mkdocs fails in that case, so I'm not too much worried about it.

May be we can go with the first option and have javascript utility to get the asset urls. Something like:

riyaz.get_asset_url("courses", "joy", "style.css")

or

var course = riyaz.get_course("joy")
var url = course.get_asset_url("style.css")

What if someone wants to include just a HTML tag like:

<img src="assets/foo.png" style="max-height: 50px" />

There is no way to get the URL.

Which one did you like @nikochiko? Do you have any other suggestions?

nikochiko commented 2 years ago

I like the second or third option but that would make it not viewable on simple markdown readers (like on github). (Or would it be possible to do that with frontmatter or something?)

I think we should go with first option, and then replace the url in course loading step.

What if someone wants to include just a HTML tag like:

We can search for links in the rendered html instead. That should take care of this case.

anandology commented 2 years ago

Looking at mdbook extensions.

https://rust-lang.github.io/mdBook/format/mdbook.html

They seem to support this format:

{{ #include file.rs }}
{{#playground file.rs}}
{{#playground file.rs editable}}

I think that may be something might work well for us. It is better than {{ Image("foo.png")}}.

Here is my suggestion:

{{ #image foo.png }}
{{ #style style.css }}
{{ #script hello.js }}

We may even be able to support:

{{ #image foo.png style="max-height:50px" class="foo" }}
nikochiko commented 2 years ago

I don't think it's a good idea to give users alternate ways to do one thing. Adding an image or stylesheet is trivial in markdown and html.

{{ #playground }} and {{# include }} did something non trivial.

anandology commented 2 years ago

I don't think it's a good idea to give users alternate ways to do one thing. Adding an image or stylesheet is trivial in markdown and html.

Out of all the options that I've originally suggested, except the first one, the remaining three are alternate ways to do adding an image. Only thing that has changed is the format.

{{ #playground }} and {{# include }} did something non trivial.

That is correct, but I felt {{ #foo ...}} as an extension syntax seemed like a better idea than ${ASSETS_URL("foo.png")} or {{ Image(src="foo.png", alt="Foo") }}. Simple reason is that it has no quotes and no parathesis, making it easy to write and read.

I would prefer to have a single syntax for extensions irrespetive of the complexity of what it is doing.

nikochiko commented 2 years ago

Okay, I think for extensions, this syntax seems nicer than jinja or ${function()}.

I don't like that this won't allow us to read it as markdown without running a riyaz server, but we may need to use extensions later on too for exercises or embeds.

anandology commented 2 years ago

Yes, we could use the same syntax for examples and exercises.

{{ #example hello.py }}
{{ #example examples.py:square }}

{{ #exercise digit-count }}

And riyaz plugins can provide more extensions like this.

anandology commented 2 years ago

I don't like that this won't allow us to read it as markdown without running a riyaz server, but we may need to use extensions later on too for exercises or embeds.

We only changing the syntax for images using assets. That would be a tiny part of the whole document and I don't think we are are compromizing on experience of markdown.

nikochiko commented 2 years ago

We only changing the syntax for images using assets. That would be a tiny part of the whole document and I don't think we are are compromizing on experience of markdown.

True, yeah.

Yes, we could use the same syntax for examples and exercises.

Yup! That makes sense

anandology commented 2 years ago

Let's keep how markdown will handle/rewrite the asset URLs out of the scope of this issue.

Please implement the saving of assets part. Here is the plan from what we discussed above:

The optimization of using md5 in the filename for better cache-control can be taken up as a separate issue.

@nikochiko does this sounds like a plan?

nikochiko commented 2 years ago

Does this look like a good flow?

asset = db.Asset.new(collection=course, name="circle.png")
disk.save_asset(on_disk_path, asset.path)
anandology commented 2 years ago

How about:

asset = course.new_asset("circle.png")
asset.save_file(path)

On Fri, Aug 26, 2022, 11:41 AM Kaustubh Maske Patil < @.***> wrote:

Does this look like a good flow?

asset = db.Asset.new(collection=course, name="circle.png")disk.save_asset(on_disk_path, asset.path)

— Reply to this email directly, view it on GitHub https://github.com/pipalacademy/riyaz/issues/26#issuecomment-1228094255, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAB3EPZOVZ5CPS2MWBRAJ3V3BNYTANCNFSM57N7T2VA . You are receiving this because you commented.Message ID: @.***>

nikochiko commented 2 years ago

Better!

I think we might need something lower level than file paths also, maybe streams or bytes, to handle for assets received over api. But we can take that up at a later time when we come to it.

anandology commented 2 years ago

It is always better to model the methods using relations rather than importing the class.

Always prefer course.get_module(name) than Module.find(course=course, name=name).

The first approach doesn't require any new imports, but the second one requires you to import Module.

nikochiko commented 1 year ago

Yes! Sounds like a good plan.

On Fri, Aug 26, 2022, 10:05 AM Anand Chitipothu @.***> wrote:

Let's keep how markdown will handle/rewrite the asset URLs out of the scope of this issue.

Please implement the saving of assets part. Here is the plan from what we discussed above:

  • assets will be saved as assets/courses/1/circle.png
  • there will be an entry for each asset in the db and the table structure for Asset is discussed above
  • there will be a new config setting assets_path
  • the app will serve the assets at https://localhost:5000/assets/courses/1/circle.png

The optimization of using md5 in the filename for better cache-control can be taken up as a separate issue.

@nikochiko https://github.com/nikochiko does this sounds like a plan?

— Reply to this email directly, view it on GitHub https://github.com/pipalacademy/riyaz/issues/26#issuecomment-1228043223, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI7MKYNQUG3QP4MTWMQAUCTV3BCSBANCNFSM57N7T2VA . You are receiving this because you were mentioned.Message ID: @.***>

nikochiko commented 1 year ago

Better!

I think we should add something lower level than file paths also, maybe streams or bytes, to handle for assets received over api. But we can take that up at a later time when we come to it.

On Fri, Aug 26, 2022, 1:19 PM Anand Chitipothu @.***> wrote:

How about:

asset = course.new_asset("circle.png")
asset.save_file(path)

On Fri, Aug 26, 2022, 11:41 AM Kaustubh Maske Patil < @.***> wrote:

Does this look like a good flow?

asset = db.Asset.new(collection=course, name="circle.png")disk.save_asset(on_disk_path, asset.path)

— Reply to this email directly, view it on GitHub <https://github.com/pipalacademy/riyaz/issues/26#issuecomment-1228094255 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/AAAB3EPZOVZ5CPS2MWBRAJ3V3BNYTANCNFSM57N7T2VA

. You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/pipalacademy/riyaz/issues/26#issuecomment-1228170868, or unsubscribe https://github.com/notifications/unsubscribe-auth/AI7MKYMEYYWBLOZAM6Q2WQTV3BZJHANCNFSM57N7T2VA . You are receiving this because you were mentioned.Message ID: @.***>