Import arbitrary mp4, mp3, and pdf content into a content hierarchy

dylanjbarth commented 10 years ago

As a first pass for the WCH Connect.Teaching project, the content import mechanism will be outside of the context of the app itself. Our partners at WCH and UNICEF will provide a repository of content, organized hierarchically by folders. E.g.

top level directory
---- sub-topic
-------- mp4, mp3, pdf content
---- other sub-topic
-------- sub-sub-topic
------------ mp4, mp3, or pdf content
-------- other-sub-sub-topic
------------ mp4, mp3, or pdf content

and so forth.

A script will iterate over the levels of this content hierarchy and create a json file with the mapping of the content that the app will use to order the content.
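A minimal sketch of such a script, assuming the content sits in a local folder and that only .mp4, .mp3 and .pdf files count as content. The function names, the node fields (`title`, `kind`, `children`, `path`) and the output file name are hypothetical, not the format the app actually uses:

```python
import json
import os

# Assumption: only the three formats named in this issue are imported.
CONTENT_EXTENSIONS = {".mp4", ".mp3", ".pdf"}


def build_topic_tree(root):
    """Return a nested dict that mirrors the folder hierarchy under `root`."""
    node = {
        "title": os.path.basename(os.path.normpath(root)),
        "kind": "topic",
        "children": [],
    }
    # sorted() gives a stable alphabetical order for folders and files alike.
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # Each sub-folder becomes a nested topic.
            node["children"].append(build_topic_tree(path))
        elif os.path.splitext(name)[1].lower() in CONTENT_EXTENSIONS:
            node["children"].append({"title": name, "kind": "content", "path": path})
    return node


def write_topic_tree(root, out_path):
    """Serialize the tree to a JSON file (hypothetical output location)."""
    with open(out_path, "w") as f:
        json.dump(build_topic_tree(root), f, indent=2)
```

Calling `write_topic_tree("content", "topics.json")` would produce one JSON document whose nesting matches the folders; files with other extensions are skipped.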

Assigning @rtibbles for now, with lots of support from @dylanjbarth

Depends somewhat on https://github.com/learningequality/ka-lite/issues/2395

dylanjbarth commented 10 years ago

First major design decision: how will our content partners add metadata to the content types? In a separate spreadsheet? In a text file included in the content repository? I'm most in favor of creating a special view on the central server that allows them to view imported content and add metadata... at first blush it may seem like a lot of work, but I think it would save a lot of headaches on both sides of the ball. Thoughts? @rtibbles @jamalex

aronasorman commented 10 years ago

Can lead this as well, once I finish on the PDF loading stuff.

mjptak commented 10 years ago

Vimeo videos? "Arbitrary" YouTube videos? Other sources? It's gonna be like herding cats, but very, very powerful and empowering to many voices in many tongues.

jamalex commented 10 years ago

@dylanjbarth it might be good to see what format they have their metadata in already, since manually re-entering through a web interface might be more painful than just converting. And without some metadata to start with (at least a title), it would be hard to navigate the content in the web interface to add metadata.

One other thing that's not automatically captured by the folder structure is the order of children within a topic. We might need to have folder/file names prefixed with 00, 01, etc.

@aronasorman I believe @rtibbles has offered to lead this, and has already started coding parts of it up.

rtibbles commented 10 years ago

It is true that I have offered to lead it. Not quite so true that I have started coding it up, but it is part and parcel with the updating of khanload that I have already been doing.

noemigerber commented 10 years ago

Hi all! My name is Noemi, and I'm based at War Child in Amsterdam, writing the teacher professional development (TPD) modules and putting together the content for the tablets. I'm excited to be in touch with you, and look forward to working with you on Connect.Teaching! I have a number of questions that I've been running into regularly during my work, for your team:

• How will the resources on the tablet be organised? Will the same organisational system as the current one be used? (e.g. Teaching materials, by grade and subject?) Or will this / can this be changed?
• How will the files be sorted/listed (within each category)? Alphabetically? (For example, how are the files in the “all” categories listed?)
• Will there be a search function? (This would be extremely useful!)
• How do we add metadata to resources?
• In Phase 1, a lot of documents were uploaded to the tablet as Word documents. I would lean towards uploading mostly PDFs (to minimise the chance of edits being made to the documents). Or is there a pertinent reason why documents should be uploaded as Word docs instead?
• We are currently writing teacher professional development (TPD) modules that will also go on the tablet. These TPD modules will refer to a number of resources on the tablet. Would it be possible to add hyperlinks to these references in the modules, that will redirect people directly to the named resource on the tablet? Or is our best solution for this to add the file pathway for each resource in brackets after it is referenced?

Please let me know if any of the questions are unclear, or if I can be of assistance in any way. I'd also like to attach the beginnings of our Content Register (in Excel), but I'll send it to Jamie as an email attachment as I can't seem to attach documents here.

rtibbles commented 10 years ago

Hi Noemi,

I am answering your questions inline below - looking forward to getting something in place that works for both of us!

How will the resources on the tablet be organised? Will the same organisational system as the current one be used? (e.g. Teaching materials, by grade and subject?) Or will this / can this be changed?

As suggested by Dylan's description above, the organization of the content, and the hierarchy of content will be determined by the directory structure of the content folder that you provide us with. This will mean that, if necessary, you can have multiple levels of subdivision of content (Subject > Grade > Subtopic) or whatever is most helpful to you.

How will the files be sorted/listed (within each category)? Alphabetically? (For example, how are the files in the “all” categories listed?)

My suggestion here would be that the file names include a number as the first part of the file name (e.g. "1 - Teaching Basic Addition.pdf") which would determine the order. Folders would also have numbers in their names to determine their order.

Will there be a search function? (This would be extremely useful!)

Yes! KA Lite currently has a search function for its content, and we will make the same feature available for this project.

How do we add metadata to resources?

A great question, and one that we have been discussing internally as well. There are a couple of alternatives, depending on how rich you want the metadata to be.

The first, and least complex, would be to add any relevant metadata to the filename in a particular order. This would, naturally, limit the amount of metadata that you could include.

A second alternative is to have a separate .txt or .json file for each file or folder, and include any relevant metadata about it in these files. This would allow for maximum flexibility, least opportunity for data getting messed up on import, but would require creating a separate file for each folder and file to describe the metadata.

The final alternative is to have a master spreadsheet or json file that describes each of the resources (files and folders). While this would be easier to create, as only one file would need to be created, one mistake in any of the file descriptions could potentially cause data issues for every single file (as one poorly described resource could potentially throw off the import for every other resource as well).

I would prefer alternative 2, as it is the most robust, but also realize this may place the heaviest workload on you and your team.

In Phase 1, a lot of documents were uploaded to the tablet as Word documents. I would lean towards uploading mostly PDFs (to minimise the chance of edits being made to the documents). Or is there a pertinent reason why documents should be uploaded as Word docs instead?

I can't see any good reason to use Word documents.

We are currently writing teacher professional development (TPD) modules that will also go on the tablet. These TPD modules will refer to a number of resources on the tablet. Would it be possible to add hyperlinks to these references in the modules, that will redirect people directly to the named resource on the tablet? Or is our best solution for this to add the file pathway for each resource in brackets after it is referenced?

What format will these TPD modules be in, and how will they be accessed within the tablet? Are they intended to be internal or external to the software that we are creating? Is their arrangement of content orthogonal to the arrangement of content that you will be giving us for the topic hierarchy? If the modules themselves could be part of the topic hierarchy (with the modules themselves being metadata for a certain level of the topic hierarchy) this might be manageable.

noemigerber commented 10 years ago

Hi Richard,

Many thanks for your quick responses!

- Great news about the search function! :)
- Also nice news about the flexibility of the content's organisation and hierarchy - I didn't expect so much flexibility, so I'll have to work out asap what would be most appropriate for this Phase of the project.
- Thanks for explaining the metadata options. We also agree that Option 2 seems the best of the three. I must admit though that my technical understanding is limited, so I did want to ask: It's not possible to add the metadata to the 'properties' function in Word and PDF? (though I suppose doing this would be about the same amount of work as creating a separate .txt or .json file for each file or folder)
- I don't have a team - it's just me! So I'll be the one preparing all the metadata.
- By TPD module, I meant a unit or chapter. (It's mostly text, interspersed with activities - not a lot of images.) There will be 5 in total, and currently they are in Word form: they're individual/independent documents that simply refer to other resources that will be on the tablet. We weren't sure what the format options were for putting them on the tablet, so we initially thought of Word or PDF format. I'm not sure yet where in the content hierarchy they would go (I suppose a separate folder would make most sense). I could send you one of the completed modules, though I'm not sure how to add files here? Instead, I've added a shot of (part of) a page of Module 2. The bold, italicised, red-coloured text (in this case, "Shortlisted Micro-Innovations Uganda 2014" in the box) indicates the title of a resource on the tablet. This is an example of how resources are referred to in our modules/units/chapters. We plan to add file pathways in brackets, as teachers might print the modules and use them as booklets, and for teachers to know how to find the resources in case we can't hyperlink these titles.

[Image: connectteaching_mod2_samplepage]

rtibbles commented 10 years ago

Putting the data in a text file would make it easier to read - reading any kind of data programmatically from a pdf can be a pain, so if it is the same amount of effort for you to create a text file, that would be preferable. That also means that we have a standard format for adding metadata, regardless of the content - so the same format works for pdfs and mp3 files.

Looking at the format of the modules, we could embed them as pdfs at some point in the hierarchy (there could be a separate top-level 'modules' Topic), the only tricky thing would be embedding the paths into them for the individual resources. What we could do is add 'related content' links at the bottom of the display of the document so that users could click through from there.

If they have printed out the document, then the search function will actually be the best option here. Rather than demanding that the teachers type in a very long file path - they can simply type in the name of the resource into the search bar to find it more quickly.

dylanjbarth commented 10 years ago

@noemigerber great to connect and thanks for the information! I have a couple of quick comments, questions and suggestions:

I've created a Google Drive folder in our shared project folder called Shared with FLE from WCH Team where you can add docs and link to those docs from here. Our team has access to that folder as well and it will be easier to work with than sharing via email.
Excellent that option #2 will work for you -- I've uploaded the content register you shared to our Google Drive and the next step will be determining exactly what metadata is important for the application to have access to -- we will discuss this tomorrow in our daily meeting as @rtibbles has already gotten a head start on that thinking process with his first pass implementation of the audio player and we will post updates and questions here!
We will also discuss hyperlinking between resources. I agree with @rtibbles that with printed resources as guides, the search functionality would be easiest, but if printed resources are not possible (or part of the program), the "related content" strategy that Richard suggests is probably our best bet. Let us know what will be possible so we can constrain our planning.

mjptak commented 10 years ago

If you all go the route of keeping track of the order of an item in the directory by number, I suggest beginning with 01, 02, etc., because once the number of items exceeds 9 the whole alphanumeric order goes awry. Unfortunately, any ordering like this makes later insertions of content into the order difficult. I'm excited to see how this all progresses with Noemi's curated content.
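To illustrate the point about numeric prefixes, here is a small sketch with made-up file names. It shows plain alphabetical sorting putting "10" before "2", zero-padded prefixes avoiding the problem, and a "natural" sort key as one possible alternative if the importer were to handle unpadded numbers itself (that last part is an assumption, not something the import script is known to do):

```python
import re

# Hypothetical file names using unpadded numeric prefixes.
names = ["1 - Intro.pdf", "2 - Counting.pdf", "10 - Review.pdf"]

# Plain string sorting compares character by character, so "10" lands before "2".
plain = sorted(names)


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


# Sorting with the key compares the numeric chunks as numbers.
natural = sorted(names, key=natural_key)

# Zero-padded prefixes sort correctly even with plain string sorting.
padded = sorted(["02 - Counting.pdf", "10 - Review.pdf", "01 - Intro.pdf"])
```

Zero-padding keeps the fix on the content side and needs no special handling in code; a natural sort moves the fix into the importer instead.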


noemigerber commented 10 years ago

Hi all, Thanks for your quick responses and action!

@dylanjbarth, thanks for creating the Google Drive folder - I've added the draft of Module 2, Topic 1 (as a Word doc, and as a Google Doc) to the Shared with FLE from WCH Team folder.
@mjptak, good point that later insertions of content into the order would be difficult if all content is numbered. Would it be possible to simply have the content ordered alphabetically? (can this be done automatically?) For most of the content, I believe it's not really necessary to have it ordered in a specific way, so I'd suggest staying away from the numbering, since it likely won't add much value and may just create complications. The only things I can think of that should be numbered would be the modules/handbooks for teachers that we're writing (which there will only be 5 of, so no major re-numbering would ever be needed there).
Yes, option #2 for metadata will be lots of work, but will have to be possible ;) and it sounds like it will definitely be the best option! Looking forward to hearing updates on your discussions about what metadata will be important for the application to have access to!
Regarding hyperlinking of resources - good to hear from @rtibbles and @dylanjbarth that this will be too complicated. Printing the modules/handbooks that we're writing is not part of the project - the idea is to have them on the tablets. Jonathan (War Child) is quite certain though that people will go ahead and print them. So, @rtibbles' "related content" strategy suggestion does sound nice, but we think you can save yourself that work. Instead, we'd suggest for me to add the file pathways to the manuals/handbooks, and for the search function to be a back-up way. While the search function is very useful, we still want to add the file pathways in this case, for two reasons:
1) Even for well-educated South Sudanese people we're working with, spelling things correctly in English is sometimes a challenge. So, there is a big risk that the teachers who will be using the tablets (some of whose English is very poor) will run into problems using the search function since they may type key words / titles incorrectly. If they have the file pathway, at least they have another way of finding the resource that doesn't involve spelling correctly. (Since I'm assuming they can simply click through the various folders/hierarchy levels that there will be, to arrive at the correct level and find the resource?)
2) Adding the file pathway hopefully encourages teachers to understand the organisation of the content on the tablets, and therefore hopefully also encourages them to look for other similar resources on the tablet. If they only use the search function, they won't become very familiar with the content organisation.
So, I'll have to make sure that, whatever organisation/hierarchy of content we choose, it's as simple as possible and keeps file pathways short (they are currently relatively short - no more than 4 layers, I believe). (And I'll have to add instructions/explanations on how to follow a file pathway, and how to use the search function!)

rtibbles commented 10 years ago

Content metadata that we will need to display it in the topic hierarchy:
Title
Description

Ones that could be included:
Language
Author
License
Related content (filenames of other content that can be linked to from the content display page)

Additional metadata, such as an id, path, etc. will be generated automatically from the directory structure and file name.

If there is any other metadata, such as keywords or the like that you would like included, let us know.

Note: as well as having metadata about files, it will be helpful to have metadata about the folders themselves (as these will correspond to 'topics' in the content hierarchy - a title and description will definitely suffice for these).

Thanks for the delineation of the path - I was thinking of the URL path that would be in the browser, but I see now you mean the path that they will have to navigate. You can generate that from the names of the folders that the content is embedded in, and they can just follow the titles of those topics.

noemigerber commented 10 years ago

Thanks @rtibbles for the update! Apologies that I seem to use terms incorrectly/with a different meaning in mind! I'll try to clarify what I mean better in future. :)

In addition to title and description, we would like to include the following metadata:

Language
Author/organisation
Subject
Primary level
Key words

I have a few more questions for clarification about the metadata:

By 'description', do you mean "video", "audio" etc, or something more substantial (e.g. the descriptive phrases I've included under the heading 'description' in our Connect.Teaching content register so far)?
What do you mean by license?
'Related content' might be useful to include as well. Could you explain to me though how this would work/what it would look like?

rtibbles commented 10 years ago

Hi @noemigerber those all sound fine - we've deliberately tried to make our handling of the metadata a bit more flexible to be able to accommodate this kind of thing.

By 'description' I mean something very much like what you have in your content register so far, so I think we're on the same page. The 'kind' ('audio', 'video', 'pdf', etc.) will be inferred by the software based on the file (mp3 = "audio", etc.).
License is the terms under which the content is licensed - standard copyright would be All rights reserved, but there are also other kinds of license that allow for more liberal sharing in offline contexts outside the scope of the 'authorized' users of the project. More info here http://creativecommons.org/licenses/#licenses and here https://creativecommons.org/choose/ (our work more broadly has been made possible by the licensing of content in these more open ways that allow for free sharing offline).
When viewing a piece of content in the software, 'related content' would be linked to directly from that page, giving another way for users to find relevant and related content. The related content metadata therefore, would just be a list of filenames of the related content, and then the software would turn these into links that users could click on and go directly to.
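As a rough illustration of inferring the 'kind' from the file, the sketch below maps extensions to kinds. Only the mp3-to-"audio" pairing is stated above; the other two pairings, the table name and the function name are assumptions:

```python
import os

# Hypothetical extension-to-kind table covering the three formats in this issue.
KIND_BY_EXTENSION = {".mp3": "audio", ".mp4": "video", ".pdf": "pdf"}


def infer_kind(filename):
    """Return the kind for a file name, or None if the extension is not recognised."""
    ext = os.path.splitext(filename)[1].lower()
    return KIND_BY_EXTENSION.get(ext)
```

With this approach the kind never has to be typed into the metadata by hand, which removes one possible source of mismatch between a file and its description.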

jamalex commented 10 years ago

The related content metadata therefore, would just be a list of filenames of the related content

Might be good to have these be the "ids" (do we have those? or I guess we're just using the hash? In which case filename, as long as they're unique, might be easiest here.)

jayarakesh commented 10 years ago

I don't know anything about Python, Java or any other languages. Please help me with updating the topic tree. When I click on khanload.py, it just pops up and vanishes.

rtibbles commented 10 years ago

@jamalex Yeah - we would substitute these in generating our topic tree, based on the generated ids we create.
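One way this substitution could look, as a sketch only: it assumes each imported node is a dict with hypothetical `id` and `filename` keys plus an optional `related-resources` list, and that filenames are unique across the content. The output key `related_ids` is also made up for illustration:

```python
def resolve_related(nodes):
    """Add a 'related_ids' list to every node, built from its 'related-resources' filenames."""
    id_by_filename = {}
    for node in nodes:
        # Filenames must be unique for the lookup to be unambiguous.
        if node["filename"] in id_by_filename:
            raise ValueError("duplicate filename: %s" % node["filename"])
        id_by_filename[node["filename"]] = node["id"]

    for node in nodes:
        related = node.get("related-resources", [])
        # Tolerate a single filename given as a bare string instead of a list.
        if isinstance(related, str):
            related = [related]
        # Names that match no imported file are skipped rather than raising.
        node["related_ids"] = [id_by_filename[name]
                               for name in related if name in id_by_filename]
    return nodes
```

Skipping unknown names silently is a design choice made here for brevity; an importer might prefer to log them so that typos in the metadata are noticed.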

@jayarakesh khanload is a Django management command. Currently, we do not support it for end user use, and it is provided as is. We will be doing an update of the topic tree in version 0.13, which should be coming out at some point in November; please look out for the software update then!

noemigerber commented 10 years ago

Hi all, apologies for my long silence. Here’s an update and some questions related to the content.

I’ve worked out a content hierarchy (see the “tablet content hierarchy” Google doc). There is a slight chance that this may change a bit, if over the next 1-2 weeks I realise that there is a more logical way of organising something, but this should be a fairly final hierarchy. In any case, if I understand correctly, I will be delivering the content in appropriate folders and sub-folders to you, all in one go?

Next, I have a question related to the content’s metadata. I’m reviewing and preparing content for the tablets, and would like to prepare the metadata for each file as I go along. We discussed creating a separate .txt or .json file for each resource (and folder). Which would you prefer, and what suggestions do you have for me on how exactly I should do this? As we discussed, the metadata will definitely include:
Title
Description (is there a max. length/number of characters for this?)
Language
Organisation (as it is often unclear who the exact author is, I'd suggest using 'organisation' instead)
Subject
Primary level
Key words

A few more questions about this:

Teachers could search for resources according to any of the above metadata, correct?
@rtibbles, thanks for your explanation of licenses! I’ve added a column for licenses in the content register. Thinking shorter-term for just Connect.Teaching Phase 2, I think it would not be imperative to include metadata on licenses. However, thinking of longer-term uses of the platform and content, would you find it advisable to include metadata on the license as well? If you do, then I could add it as well.
Thanks for also explaining the related resources. Practically speaking, this would mean that I would have to add the file names of all related resources in the metadata file?
Should the title of the resource (as included in the metadata and displayed on the tablet) and the file name be identical, or is it alright if they are different? (And more generally: Is there a standard way of naming the files that would be most useful, and that I should be paying attention to?)

Please don’t hesitate to contact me anytime if you need my input!

rtibbles commented 10 years ago

Thanks for the feedback here, @noemigerber.

If you were creating a metadata file for the file "Using Manipulatives to Teach Addition.mp3" you would name it "Using Manipulatives to Teach Addition.mp3.json" and it would have something like the following content:

{
    "title": "Using Manipulatives to Teach Addition",
    "description": "How to use physical objects to teach children the basic concept of addition",
    "language": "en",
    "author": "John Wittard",
    "organization": "Manipulative Mathematics",
    "subject": "Mathematics",
    "primary_level": 1,
    "keywords": ["mathematics", "addition", "hands-on"],
    "related-resources": ["Teaching Addition.pdf", "Addition Activities.pdf"]
}

Title is the main determinant of search at the moment, @aronasorman can you comment on how search currently works and how easy it would be to search through all metadata?
License information can be added as wanted, one advantage of adding it now is that if we want to use this content more widely, then it would require less additional editing to make it available for broader release.
An example of related content is shown above. You can add the file names in this way, assuming that you maintain unique filenames across the content.
It is alright if the file name and title are different - the title will be the one displayed to the end user, the file name will be used to generate the 'slug' - which will form part of the URL for the resource.
Files can be named in pretty much any way you want - as per previous discussions, the files will be imported in alphabetical order - so in a particular folder, the files will be listed within their topic alphabetically.
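The exact slug rules are not spelled out in this thread. As a sketch under the assumption that a slug is the file name without its extension, lowercased, with every run of other characters collapsed to a hyphen, it could be generated like this (the function name is hypothetical):

```python
import os
import re


def slugify_filename(filename):
    """Turn a file name into a lowercase, hyphen-separated slug (assumed scheme)."""
    # Drop the extension, e.g. ".mp3".
    stem = os.path.splitext(filename)[0]
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", stem.lower())
    # Remove hyphens left over at either end.
    return slug.strip("-")
```

Under this scheme "Using Manipulatives to Teach Addition.mp3" would give `using-manipulatives-to-teach-addition`. Two files that differ only in extension would collide, which is one more reason to keep file names unique.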

dylanjbarth commented 10 years ago

One thing that might be helpful for you @noemigerber is using a free online tool like http://www.jsoneditoronline.org/. It allows you to type out the JSON in the left-hand panel, appears to correct any formatting errors for you, and allows you to interact with it in the right-hand panel.

aronasorman commented 10 years ago

re extending search to include all metadata:

Here are the relevant snippets in search_autocomplete.js:

function flattenNodes() {
    // now take that structured object, and reduce.
    var flattened_nodes = {};
    for (node_type in _nodes) {
        $.extend(flattened_nodes, _nodes[node_type]);
    }
    _nodes = flattened_nodes;
    for (title in _nodes) {
        if($.inArray(title, _titles) == -1){
            _titles.push(title);
        }
    }
}

var titles_filtered = $.ui.autocomplete.filter(_titles, request.term);

// sort the titles again, since ordering was lost when we did autocomplete.filter
var node_type_ordering = ["video", "exercise", "topic"]; // custom ordering, with the last in the array appearing first
titles_filtered.sort(function(title1, title2) {
    var node1 = _nodes[title1];
    var node2 = _nodes[title2];
    // we use the ordering of types found in node_types
    var compvalue = node_type_ordering.indexOf(node2.type) - node_type_ordering.indexOf(node1.type);
    return compvalue;
});

Right now we request the categorized topic tree (video, exercise, topic) so we should extend that endpoint to make it work with the new content loading scheme. For the search frontend as a first pass we can just add all relevant metadata to a node's entry in the _titles variable, and then extend node_type_ordering to take audio and pdf into account. So as long as the topic tree json has the same structure this should still work.
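The idea of matching against all relevant metadata and then ordering results by node type can be sketched outside the frontend as well. The Python below is an illustration only, not the search_autocomplete.js code: the field list, the `kind` key and the priority table are assumptions, and matching is a plain case-insensitive substring test:

```python
# Hypothetical list of metadata fields that search should look at.
SEARCH_FIELDS = ["title", "description", "subject", "organization", "keywords"]

# Lower number = shown earlier; this particular order is illustrative only.
KIND_PRIORITY = {"topic": 0, "video": 1, "audio": 2, "pdf": 3}


def searchable_text(node):
    """Join all searchable metadata of a node into one lowercase string."""
    parts = []
    for field in SEARCH_FIELDS:
        value = node.get(field)
        if isinstance(value, list):
            # List-valued fields such as keywords contribute each item.
            parts.extend(str(v) for v in value)
        elif value is not None:
            parts.append(str(value))
    return " ".join(parts).lower()


def search(nodes, term):
    """Return nodes whose metadata contains `term`, ordered by kind priority."""
    term = term.lower()
    hits = [n for n in nodes if term in searchable_text(n)]
    # Unknown kinds sort after all known ones; sorted() is stable within a kind.
    return sorted(hits, key=lambda n: KIND_PRIORITY.get(n.get("kind"), len(KIND_PRIORITY)))
```

A substring match like this does not tolerate misspellings, so it would not by itself address the spelling concern raised earlier in the thread.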

aronasorman commented 10 years ago

The PR adding this feature has been merged. So I'm closing this now. Doesn't mean the excellent discussion on this thread has to stop though.

noemigerber commented 10 years ago

Thanks all! @dylanjbarth the tool is very useful, thanks! Thanks also for the example and answers, @rtibbles. Here's my first attempt. ;) (though I can't figure out how to include it with the grey background)

{
    "title": "Planning and preparing your lessons",
    "description": "Why lesson planning is important, what a lesson objective is, and parts of a lesson",
    "language": "en",
    "date": "None",
    "organization": "TESSA",
    "subject": "Planning",
    "primary_level": "None",
    "keywords": ["planning", "lesson plan"],
    "licence": "CC BY-SA 3.0",
    "related-resources": "WCHLessonPlanParts.pdf"
}

I've taken out author, as we don't know the authors of most of the resources we have (or would it be preferable for me to keep it in and leave it blank or fill it in as 'unknown'?). I've included licence, in case the content will be used more widely in future, and date, for the same reason. Please let me know if you have any suggestions for improvements to the above (e.g. is the licence format ok?) - this is all very new to me, and I'm learning as I go! (please also excuse any ignorant questions I pose for the same reason) A few questions related to the above:

Thanks for the example of the related resources. I assume the title of the related resource that is displayed will be according to the title in its metadata, and not according to file name?
I've been using the category 'subject' as broader than only subjects taught to students (e.g. see above - subject is 'planning'). Is this alright, or will this pose any problem?
Is there a max. length/number of characters for the description? I'm trying to keep them as concise as possible, but wanted to check.
Thanks @aronasorman for the explanation. Do I understand correctly that it will be possible to search according to any of the metadata? I would expect the categories used most for searches to be: title and key words, and perhaps also subject, primary level or organisation. At some point in future, I'll need to write short instructions for teachers on how to use the search function (and more generally, on how to navigate and use the app/platform), so there I can explain to them what's possible and what isn't.

Also, if I should be asking these questions etc. in another thread, do let me know!

noemigerber commented 10 years ago

One more question, regarding the file name of the JSON file: Should this be identical to the PDF (or other) file name? E.g.: PDF file name: PlanningAndPreparingYourLessons.pdf JSON file name: PlanningAndPreparingYourLessons.pdf.json Or can the 'pdf' in the JSON file name be left out? (i.e.: JSON file name: PlanningAndPreparingYourLessons.json).

rtibbles commented 10 years ago

@noemigerber this looks great!

Most of the fields are optional, so if you have nothing to go in them, you can just leave them out entirely. If title is missing it will get filled in from the filename.

License format looks great, and in future we should probably include some information in the software about different CC licenses.

I had been planning to use the filename, as you had indicated here - so it is fine to continue with that.
We don't actually currently use 'subject' anywhere, as the topic navigation is defined by the file hierarchy - I assume we will integrate it into the search information, but it will probably be more for tracking the content than anything, so shouldn't be a problem.
The descriptions can be arbitrarily long - but concision probably helps the end user to read them!
I think the plan is to allow the search function to search across as much of the metadata as possible, but we will also need to do some testing to see how that works out and figure out what is best for the end user.
You have the file name right for the JSON PlanningAndPreparingYourLessons.pdf.json - just adding it to the end of the overall filename helps to make sure it is completely disambiguated.
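Putting the naming convention and the optional fields together, a loader could look roughly like the sketch below. It assumes the metadata sits next to the content file under the same name plus ".json", and that a missing title falls back to the file name without its extension; the function name is hypothetical and the real import code may differ:

```python
import json
import os


def load_metadata(content_path):
    """Read '<content file>.json' if present and make sure a title is set."""
    metadata = {}
    # e.g. PlanningAndPreparingYourLessons.pdf -> PlanningAndPreparingYourLessons.pdf.json
    sidecar = content_path + ".json"
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            metadata = json.load(f)
    # Every other field is optional; only the title gets a fallback.
    if not metadata.get("title"):
        metadata["title"] = os.path.splitext(os.path.basename(content_path))[0]
    return metadata
```

Because the full file name (including ".pdf") is part of the metadata file name, a PDF and an MP3 with the same base name can each have their own metadata without clashing.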

And yes, this is still the best place! @aronasorman just 'closed' the issue because we have the first pass implementation ready.

noemigerber commented 10 years ago

Great, thanks for your feedback @rtibbles! Then I'm all set to create lots of JSON files. :)

noemigerber commented 10 years ago

Hi @rtibbles (and all!), I have a slightly different question: I've come across some videos on the Teaching Channel (e.g: Video on peer teaching), and it seems the material is free to be used (License CC BY-NC-ND 3.0). However, practically speaking, what would be the best way to get these video files? There doesn't seem to be an option to download them. Does this mean I'd have to write to the Teaching Channel to obtain the files, or do you have any suggestions for how to get the files more easily?

jamalex commented 10 years ago

They're using a flash player on that page, so downloading would be tricky (and not in the right format, i.e. MP4). Looks like the same video is available on YouTube (https://www.youtube.com/watch?v=k3GRXWfudeI), and there are tools for downloading videos from YouTube (e.g. http://www.clipconverter.cc/). But make sure that particular video is indeed CC-licensed, as it's marked on the YT video as Standard YouTube License (but that's probably an oversight, if they've indicated elsewhere that it's CC). Make sure they're listed as the authors in the metadata, too, to comply with the BY clause.

mjptak commented 10 years ago

mozilla has a great plugin for downloading from youtube or vimeo or ...

noemigerber commented 10 years ago

Thanks @jamalex and @mjptak for the tips about the video! (I didn't realise the videos were also on YouTube!) I've had a look at more Teaching Channel videos on YouTube, but they seem to all be labelled with 'Standard YouTube License'. That's a bit confusing, since on the Teaching Channel it's quite clearly stated that this video's license is CC BY-NC-ND 3.0. I'll therefore go ahead and assume that it was an oversight on YouTube (unless you think that's too risky). Thanks also for the tip, I'll make sure to add the authors in the metadata!

I have a question about the Clip Converter you sent, @jamalex: I'm not sure which video quality option to choose:

YouTube Video High Definition (1080p), size: 26 MB
YouTube Video High Definition (720p), size: 17 MB
YouTube Video High Quality (480p), size: 7 MB
YouTube Video Standard Quality (360p), size: 5 MB
YouTube Video Mobile Version (3GP), size: 3 MB

(This is using the example of the 1:27 min long video on Peer Teaching that we discussed above.) Would the Standard Quality suffice (to save some space), or would another quality be preferable?

noemigerber commented 10 years ago

Hi @rtibbles, I have a quick update on licenses! I've been able to work out what the licenses for War Child-produced documents are, and I've been adding info on licenses where I know them (mostly for CC licenses). However, there are quite a number of resources for which I do not have license information, so I won't be able to include this info in the metadata. This isn't ideal (thinking of the longer-term use of the app and possibly the content), but I'm afraid it's the most realistic solution for now. Regarding the 'related resources' metadata, I will try to add this where I can, but will likely not do so for every resource - I'll do it mostly in the instances where the link between a few resources is quite strong. I hope this is alright - or will this affect your plans negatively somehow?

noemigerber commented 10 years ago

I have a slightly different question about the content hierarchy / topic tree (I'm not sure where best to ask it): Will it be possible to have an 'all' option as well, at certain levels of the topic tree? For example, so that teachers can choose 'all' primary levels and then choose a specific subject (e.g. science)? This is an option in the current version of the software. I suppose it is not absolutely imperative to have (as I am not sure how useful it is), but I thought I'd ask about the possibility, and whether you have any experience with the usefulness of such 'all' options/folders.

rtibbles commented 10 years ago

Hi @noemigerber,

Feel free to add related content only where it is relevant - that will make it a more useful and salient feature for end users when it does appear.

I guess my main concern with the user experience of an 'all' folder would be that there would be too much content/too many options displayed to the end user at once. That said, the same content can be repeated in the topic tree if you do need to put it in multiple places.

I think as part of the navigation discussion going on, it is definitely worth thinking about how the sidebar navigation and search (and possibly other features) can contribute to this more discovery oriented mode.

jamalex commented 10 years ago

@rtibbles, what would be the best way for @noemigerber to put the same content in multiple places in the topic tree? Copying the files themselves into the other folders should work, I guess -- it makes the source folders bigger, but since we de-duplicate upon import, it won't increase the space requirements for the content when distributed.

rtibbles commented 10 years ago

Yes, just copying the files would be best. I will double check to make sure that the metadata parsing is robust in case the metadata is not copied as well.
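To illustrate why copied files need not cost extra space once imported, here is a small sketch of de-duplication by content hash. This is an assumption about the mechanism: the thread only says that de-duplication happens upon import, not how. The names `file_digest` and `group_duplicates` are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Map each content hash to the list of paths under `root` that share it."""
    by_hash = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".json"):
                continue  # metadata sidecars are not content files
            path = os.path.join(dirpath, name)
            by_hash.setdefault(file_digest(path), []).append(path)
    return by_hash
```

Under this scheme, an importer would store one copy per hash and let every topic-tree location point at it, so the same PDF in two folders is kept only once.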

jamalex commented 10 years ago

Re: Teaching Channel, that's great: if they explicitly list CC license on their site, we should be fine, but would be good to shoot them a quick note asking them to update the licenses on YouTube (and to confirm). 720p is probably a good resolution.

noemigerber commented 10 years ago

Thanks @rtibbles, it's good to know that I can limit the use of the 'related content' to only the instances where it's very relevant. Thanks also for your thoughts about the 'all' folder. I was thinking along the same lines, but was wondering if perhaps there is some added value to it after all. But if we can't think of any within our sidebar navigation discussion, we can certainly leave it out. It's definitely good to know though that I could repeat content (and that this wouldn't increase size requirements) - though so far I haven't come across (m)any instances where this would be needed. If I do do it, I'll make sure I copy the metadata as well.

Thanks @jamalex for the tip about the videos, I've sent them a message to double-check about the licenses on YouTube. Unfortunately though, they have only very few of their videos on YouTube, so there are a few that I would have liked to include that I won't be able to. They also have a note on their 'contact us' page specifically mentioning that: "Tch is not currently able to provide downloads of individual videos. We apologize for the inconvenience," so unfortunately it's not even worth asking for video files. Oh well! Good to keep an eye on their site for the future! ;)

noemigerber commented 10 years ago

Hi @rtibbles, I have one more question about metadata: Would it be useful to include the length/duration of a video or audio file? And if length/duration would be included in the metadata, could this also be displayed when teachers are searching through the content on the tablet? I suppose this links to some questions/comments I'm about to share on discussion #2486 about the sidebar and content display.

rtibbles commented 10 years ago

The weird thing about the Teaching Channel is that their videos are openly licensed, but with a non-commercial clause, so they are worried about people using the videos and charging for them - which is why they turned off the (previously enabled) download feature.

The length/duration can be included in the metadata. I will also look into how easy it would be to read that information directly from the video and audio files (as they will have that data internally).

noemigerber commented 10 years ago

Aha, that makes sense! It's a shame though, that download button would have been perfect...

Thanks for looking into how easy it would be to read the info directly from the files, @rtibbles! That would mean I wouldn't have to include it in the metadata, correct? That would save me a little bit of work. ;)

rtibbles commented 10 years ago

Correct. I'll look into it and report back.

noemigerber commented 10 years ago

Hi all! I've updated the content hierarchy, and have uploaded the new (and final) version to Google Drive. (Note: the title of the doc is nearly the same, except for the date - which is hard to see as it's 17 instead of 07) What has changed is: Instead of grouping primary levels (i.e. P1-P5 and P6-P8), we have reverted to the 'original', which is having all primary levels separate (P1, P2, ...). In addition, the subfolder 'national languages' is now called 'mother tongue' to be in line with the terms used in the syllabus. Let me know if you have any questions about it!

rtibbles commented 10 years ago

@noemigerber Seems like reading the duration from the files themselves is definitely possible. Will implement today and let you know the results.

rtibbles commented 10 years ago

@noemigerber I now have the content import reading the duration from the files, so no need to add this in the metadata.
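For the curious, here is one way a duration can be read straight from an MP4 file using only the standard library, by locating the `mvhd` box inside `moov`. This is a sketch under assumptions and not necessarily how the import script does it; MP3 files store timing differently and are not covered. The function names are hypothetical.

```python
import io
import struct


def iter_boxes(stream, end):
    """Yield (box_type, payload_start, box_end) for each MP4 box before `end`."""
    while stream.tell() + 8 <= end:
        start = stream.tell()
        size, box_type = struct.unpack(">I4s", stream.read(8))
        header_len = 8
        if size == 1:  # a 64-bit size follows the type field
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        elif size == 0:  # box runs to the end of its container
            size = end - start
        if size < header_len:
            return  # malformed box; stop rather than loop forever
        yield box_type, start + header_len, start + size
        stream.seek(start + size)


def mp4_duration_seconds(stream, length):
    """Return the movie duration in seconds, or None if no mvhd box is found."""
    for box_type, body_start, body_end in iter_boxes(stream, length):
        if box_type != b"moov":
            continue
        stream.seek(body_start)
        for inner_type, inner_start, _ in iter_boxes(stream, body_end):
            if inner_type != b"mvhd":
                continue
            stream.seek(inner_start)
            version = stream.read(1)[0]
            stream.read(3)  # flags, unused here
            if version == 1:
                stream.read(16)  # 64-bit creation and modification times
                timescale, duration = struct.unpack(">IQ", stream.read(12))
            else:
                stream.read(8)  # 32-bit creation and modification times
                timescale, duration = struct.unpack(">II", stream.read(8))
            return duration / timescale if timescale else None
    return None
```

For a file on disk, open it in binary mode and pass `os.path.getsize(path)` as `length`. In practice a dedicated media library would be more robust than hand-parsing boxes.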

noemigerber commented 10 years ago

Thanks for checking that out, @rtibbles! I'm happy to hear that it's possible, and I don't need to add it!

noemigerber commented 10 years ago

Hi @rtibbles and @jamalex, I'm very glad we caught the issue with the 'related content' metadata on the call today! (didn't notice it earlier today while going through the demo). I'll have to go back and edit the JSON files I've created so far. Just to make sure that I get it right: It should be as follows, correct?

"related_content": "nameofrelatedcontent.pdf"

Also, once I've corrected the JSON files, can I simply replace the incorrect JSON files on Google Drive, or shall I specify which JSON files have been changed?

rtibbles commented 10 years ago

Yep, "related_content" is the correct key.

I think we can just recopy the whole structure over, so just let us know when it is updated and we can reimport.
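Before re-uploading, a quick check like the following could flag "related_content" values that do not match any content file in the folder tree. This is a hypothetical helper, and it assumes the value is a single file name string as in the example above; other shapes are ignored.

```python
import json
import os


def missing_related_content(root):
    """Return (sidecar_path, related_name) pairs whose target is not under `root`."""
    content_names = set()
    sidecars = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".json"):
                sidecars.append(os.path.join(dirpath, name))
            else:
                content_names.add(name)
    problems = []
    for sidecar in sorted(sidecars):
        with open(sidecar, encoding="utf-8") as handle:
            metadata = json.load(handle)
        related = metadata.get("related_content") if isinstance(metadata, dict) else None
        # Assumption: the value is one file name, e.g. "nameofrelatedcontent.pdf".
        if isinstance(related, str) and related not in content_names:
            problems.append((sidecar, related))
    return problems
```

An empty result means every referenced file name exists somewhere in the tree.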

noemigerber commented 10 years ago

Perfect, thanks, will do! I won't be able to do so today anymore, but will do it tomorrow!

noemigerber commented 10 years ago

Hi @jamalex, sorry I didn't know where else to put this info. I've had a look at the tablet we have, and a PDF can be opened with one of the following programmes:

Adobe reader
KingReader HD
OfficeSuite

I tend to open things with Adobe reader, and they open reasonably fast (though they're only 2-page docs). Does this answer your question?

learningequality / ka-lite

Import arbitrary mp4, mp3, and pdf content into a content hierarchy #2397