metacademy / metacademy-application

Metacademy.org's application code
http://www.metacademy.org
GNU General Public License v3.0

How to handle unique resources, e.g. PhD thesis #3

Closed cjrd closed 11 years ago

cjrd commented 11 years ago

How should we handle unique resources? Currently, we store general resources in the resources.txt file and reference them via the source tag in the node content. This avoids rewriting the same resource, e.g. Bishop's PRML, for each node. But what if we want to cite a specific web page or publication? Should we add the web page to resources.txt and reference it via the source tag?

rgrosse commented 11 years ago

In the convolutional_nets node, I have an entry for the original LeCun paper, which lists the title, author, and so on. Right now, it gets ignored, because it lists

source: paper

and there's currently no entry for "paper" in the database.

Here's what I'm roughly envisioning: currently, we have a field in resources.txt called "resource_type." For each resource type, we would have associated templates which determine how it's rendered in HTML or plaintext. We would then have a dummy entry in resources.txt which is something like:

key: paper
resource_type: paper

which just says to look for the template for "paper." The templates would be stored in the content repository. Any thoughts on this proposal?
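A minimal sketch of how per-type templates could work, assuming each template is a plain string with named placeholders and that Python's standard `string.Template` handles the interpolation. The `TEMPLATES` dict, the placeholder names, and `render_resource` are hypothetical; in the proposal the templates would be files in the content repository rather than literals in code.

```python
import html
from string import Template

# Hypothetical templates keyed by resource_type. In the proposal these
# would be stored as files in the content repository.
TEMPLATES = {
    "paper": Template("$authors. <i>$title</i>."),
    "book": Template("<i>$title</i>, by $authors."),
}

def render_resource(entry, templates=TEMPLATES):
    """Render one resource entry (a dict of field -> value) as HTML."""
    template = templates.get(entry.get("resource_type"))
    if template is None:
        # No template registered for this type: fall back to the bare title.
        return html.escape(entry.get("title", ""))
    # Escape field values so they cannot inject markup into the page.
    escaped = {field: html.escape(str(value)) for field, value in entry.items()}
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return template.safe_substitute(escaped)

paper = {
    "resource_type": "paper",
    "title": "Gradient-based learning applied to document recognition",
    "authors": "Yann LeCun and Leon Bottou and Yoshua Bengio and Patrick Haffner",
}
print(render_resource(paper))
```

Adding a new resource type would then only mean adding a new template, with no change to the rendering code.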

cjrd commented 11 years ago

Templates sound like a good idea, but logically they should probably be stored with the frontend (since they'll be HTML templates that are independent of the content itself).

One concern with this overall approach, though, is that we could have the same paper/thesis/resource repeated in a number of nodes, all with

source: paper
link: paper.com

But then if paper.com changes to paper.edu we would have to update each node instead of just updating global resources.txt.

Also we would have to specify the "free" tag in the node/resources.txt file (not all papers are freely available, especially in non-CS fields).

Here's an alternative idea: Every unique resource has a unique entry in the global resources.txt file, meaning the LeCun paper would have:

key: lecunpaper
title: Gradient-based learning applied to document recognition
location: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
resource_type: paper
free: 1

and the convolutional_nets/resources.txt would have entry

source: lecunpaper
location: section II.A
authors: Yann LeCun and Leon Bottou and Yoshua Bengio and Patrick Haffner
mark: star

This way, each entry in a node's resources.txt is a pointer to a global resource. The global entry holds the information that multiple nodes can share (title of the resource, URL, free tag, etc.), so it is not repeated in every node entry.
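A rough sketch of the pointer lookup, assuming each file holds blank-line-separated blocks of `field: value` lines (the exact file syntax is an assumption here, and `parse_entries` is a hypothetical helper, not existing project code):

```python
import re

def parse_entries(text):
    """Parse blank-line-separated blocks of 'field: value' lines into dicts."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            # Split on the first colon only, so URLs in values stay intact.
            field, sep, value = line.partition(":")
            if sep:
                entry[field.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries

GLOBAL_RESOURCES = """\
key: lecunpaper
title: Gradient-based learning applied to document recognition
location: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
resource_type: paper
free: 1
"""

NODE_RESOURCES = """\
source: lecunpaper
location: section II.A
authors: Yann LeCun and Leon Bottou and Yoshua Bengio and Patrick Haffner
mark: star
"""

# Index the global file by key, then follow the node entry's source pointer.
global_by_key = {entry["key"]: entry for entry in parse_entries(GLOBAL_RESOURCES)}
node_entry = parse_entries(NODE_RESOURCES)[0]
target = global_by_key[node_entry["source"]]
print(target["title"], target["location"])
```

If paper.com moved to paper.edu, only the one global entry would change; every node entry that points at `lecunpaper` would pick up the new URL.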

[FYI: I just described a simple text-based relational database]

This could be automated in a web form, where a new global resource is created if no previous resource matched the location tag, which should be a unique identifier for resources. We could also suggest that the resource already exists if the title matched a previously entered title.
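The form logic described above could look roughly like this. It is a sketch under assumptions: resources are plain dicts, `location` is compared exactly, titles are compared case-insensitively, and the function name and status strings are made up for illustration.

```python
def submit_resource(new, resources):
    """Add `new` to `resources` unless it already seems to be there.

    Returns a (status, entry) pair where status is one of:
      "exists"             -- an entry with the same location was found
      "possible_duplicate" -- an entry with the same title was found
      "created"            -- nothing matched, so `new` was appended
    """
    # The location tag acts as the unique identifier.
    for existing in resources:
        if existing.get("location") == new.get("location"):
            return "exists", existing

    # A matching title is only a hint, so the caller can ask the user.
    new_title = new.get("title", "").strip().lower()
    for existing in resources:
        if new_title and existing.get("title", "").strip().lower() == new_title:
            return "possible_duplicate", existing

    resources.append(new)
    return "created", new


resources = [{
    "key": "lecunpaper",
    "title": "Gradient-based learning applied to document recognition",
    "location": "http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf",
}]
status, match = submit_resource(
    {"title": "LeCun 1998", "location": resources[0]["location"]}, resources
)
print(status, match["key"])
```

Keeping the title match as a suggestion rather than a hard rejection leaves room for two different resources that happen to share a title.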

Additional note: Manually entering resources into resources.txt may work for the moment, but it will quickly become unmanageable. I would have to scan through hundreds (thousands?) of resources to make sure I'm not repeating a book or thesis entry, search for the correct source key whenever I want to use a resource (did I call it lecunpaper or lecun_paper?), and hunt for old entries of mine in order to update them. Sure, after entering a resource I'll probably remember where and how I entered it for a few weeks, but I'd like a good system for finding the same entry next year, or for finding an entry that someone else made.

Hmm, I can't seem to shake the feeling that a flat file database will be very difficult to maintain -- I'll open a new discussion for this, though.

rgrosse commented 11 years ago

I know it feels like templates should generally be part of the view, but I'd still argue for keeping them with the content for purposes of modularity. In particular:

  1. Let's say we have a regular contributor who works a lot with the content repository. They should be able to add a new resource type without understanding/modifying the code repository as well.
  2. If someone wants to create an entirely separate content database for another field (e.g. biology), it may have its own set of resource types, such as review papers. They should be able to create their own content repository while using the same server code. (In principle, we should be able to have one content repository which includes every field, but let's still not write off the possibility of separate content repositories.)
  3. Keeping the templates in the view introduces a logical dependency between the content and server repositories.

As far as whether to add individual papers to the global resources.txt, it's really a matter of how unique they are. My guess is that most papers will correspond only to a single concept in the graph, so it would be easier to keep all the information in the individual content nodes. But then, there are a number of review papers (e.g. the Wainwright & Jordan tutorial) which cover a lot of topics, so it would be worth giving them their own entry in the global resources.txt.

To make it easier to edit the text files, another option would be just to write tools that help with that. This would preserve the benefits of flat files, especially the ability to collaborate through Github. The tools could take the form of standalone programs for editing the text which provide autocomplete. Or it might take the form of emacs/vim plugins.

cjrd commented 11 years ago

Let's say we have a regular contributor who works a lot with the content repository. They should be able to add a new resource type without understanding/modifying the code repository as well.

True, currently the code depends on the content but not vice-versa, so a content developer can work without understanding the code but a coder has to understand the content.

But the templates will probably also incorporate CSS/javascript. Should we place template specific CSS/javascript with the content as well or make a list of valid CSS/javascript that can be used in a template? What about general CSS/javascript that's used throughout the display? I agree with your other points.

As far as whether to add individual papers to the global resources.txt, it's really a matter of how unique they are.

Yes, but I think we should also aim for consistency, e.g. the resource links and free-tag should always be in the global resources.txt file.

The tools could take the form of standalone programs for editing the text which provide autocomplete. Or it might take the form of emacs/vim plugins.

Any reason to develop standalone programs or vim/emacs plugins instead of incorporating these features into the browser?

rgrosse commented 11 years ago

On Sun, Apr 14, 2013 at 11:12 AM, Colorado Reed notifications@github.com wrote:

Let's say we have a regular contributor who works a lot with the content repository. They should be able to add a new resource type without understanding/modifying the code repository as well.

True, currently the code depends on the content but not vice-versa, so a content developer can work without understanding the code but a coder has to understand the content.

But the templates will probably also incorporate CSS/javascript. Should we place template specific CSS/javascript with the content as well or make a list of valid CSS/javascript that can be used in a template? What about general CSS/javascript that's used throughout the display? I agree with your other points.

Why would templates involve Javascript? I wouldn't expect them to involve anything more than simple HTML tags such as or . We'd need to define some way to interpolate the field values, but that shouldn't have to be too complicated.

As far as whether to add individual papers to the global resources.txt, it's really a matter of how unique they are.

Yes, but I think we should also aim for consistency, e.g. the resource links and free-tag should always be in the global resources.txt file.

Another way to handle this, which might be more consistent, would be to simply have the global resources.txt give a set of default attributes associated with each resource. Then, the "source: gpml" field in a node's resource entry would simply tell it to substitute in the corresponding attributes from the global entry. That way, the two attribute sets will be concatenated, and the view won't have to worry about whether the individual fields came from the global resources file or the node-specific one.
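As a sketch, the substitution could be a plain dictionary merge. The conflict rule (a node-specific field overrides the global default of the same name) is an assumption, since the proposal only says the two attribute sets are combined; the entry values below are placeholders, not real content.

```python
def resolve(node_entry, global_resources):
    """Combine a node's resource entry with the defaults of its global source."""
    defaults = global_resources.get(node_entry.get("source"), {})
    merged = dict(defaults)    # start from the global defaults
    merged.update(node_entry)  # assumed rule: node-specific fields win
    return merged


# Placeholder data for illustration only.
global_resources = {
    "gpml": {"key": "gpml", "title": "Example textbook title", "free": "1"},
}
node_entry = {"source": "gpml", "location": "Section 2.2"}

merged = resolve(node_entry, global_resources)
print(merged)
```

The view then receives one flat dict and never needs to know which file a field came from.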

The tools could take the form of standalone programs for editing the text which provide autocomplete. Or it might take the form of emacs/vim plugins.

Any reason to develop standalone programs or vim/emacs plugins instead of incorporating these features into the browser?

To avoid reinventing the wheel. The text editors already have lots of features that people like, so there's no sense in forcing everyone to use a common web form interface.

cjrd commented 11 years ago

Why would templates involve Javascript? I wouldn't expect them to involve anything more than simple HTML tags such as or . We'd need to define some way to interpolate the field values, but that shouldn't have to be too complicated.

I guess that depends on where and how you want to use the template. The current resources display for the knowledge map uses quite a bit of CSS and a little bit of JavaScript in the [additional info] link. But feel free to rewrite this so that it only uses basic HTML tags.

Another way to handle this, which might be more consistent, would be to simply have the global resources.txt give a set of default attributes associated with each resource. Then, the "source: gpml" field in a node's resource entry would simply tell it to substitute in the corresponding attributes from the global entry. That way, the two attribute sets will be concatenated, and the view won't have to worry about whether the individual fields came from the global resources file or the node-specific one.

Sounds good to me

To avoid reinventing the wheel. The text editors already have lots of features that people like, so there's no sense in forcing everyone to use a common web form interface.

I don't think using the browser is "reinventing the wheel." Data entry via a browser is not a new concept, and there are lots of highly developed libraries, e.g. jQuery, that provide a host of robust features we could use to craft a nice IDE. But yes, my favorite text editor is more comfortable than a browser, and it would be nice to improve it for editing kmaps.

That being said, the bottleneck for agfk will be getting users to contribute content. Telling someone to clone our content git repository, create and edit six text files for each node, and then send us a pull request is a pretty big overhead (and adding that they should use our emacs plugin doesn't help much in this regard). Quite frankly, I doubt more than a handful of individuals in CS fields would contribute. So I feel we should spend a lot of time developing the browser interface to the content. This way, users can simply click a button ("add node"), fill in the data, and have it immediately available. We can even incorporate realtime visualization of the graph they're creating. The point is that we want to entice users into contributing, not make them jump through hoops.

rgrosse commented 11 years ago

It's not a matter of browser vs. text editor, and it could be that libraries like jquery turn out to be the best way to construct a GUI for editing the text files. There's a whole ecosystem already built up around text files, including text editors, UNIX command line tools, Git, Github, etc. In order to replace text files, we'd have to reimplement a lot of functionality associated with each of these in order to make it as usable and intuitive.

Assuming we go with the two-tiered system, it's already possible for people to make relatively self-contained contributions (adding stuff for individual nodes) through the web interface. It shouldn't be too hard to set it up so you can add new nodes this way either. The only obstacle is for making complex changes like splitting nodes, and this is going to be tricky no matter how we handle it. I'm sure we can make a graphical interface that's easier to use than text files, but I doubt our version 1.0 will be.

Now, if people in other fields get excited about this and are willing to put in a lot of time to build up whole maps, I agree we should do everything we can to make things easy for them. This is certainly a problem we'd like to have. But I think we can worry about this when the time comes. Let's just make it clear that we're happy to talk to them to figure out what would be easiest for them. Then we'd be able to iterate with actual users. Hopefully the combination of their feedback and the experience of contributors under the current format will let us design an interface that's simpler and more intuitive than whatever we'd come up with now.

cjrd commented 11 years ago

I agree that we shouldn't focus on allowing complex changes through the web interface at this time. My argument is that, given the choice between developing a backend tool for a task and developing frontend capabilities for the same task, we should focus on the latter if they require the same amount of effort.

For instance, entering new resources into resources.txt is currently not set up that well. We have to manually check that a source is unique, both by name and content. We could either (i) write a Python script or emacs extension that parses the resources.txt file and checks that the resource is unique, or (ii) use a frontend form to submit resources that performs the same verification. Both tools require roughly the same amount of effort to build, but option (ii) should be easier for the end user: simply fill out the required fields and press "send." This tool can be used without forking our repository, downloading an emacs extension, or running a Python script.

I believe that in the long run, content editing should take place mostly from the web interface, simply because most users won't want to download and understand our entire project in order to contribute. So it makes sense to start developing these frontend tools now. This way we can debug the interfaces, begin seeing what type of functionality works well, and provide the eventual users with a well-polished interface. I agree that we should work with outsiders to improve this interface (when the time comes, that is), but presenting a user with a good prototype and asking "how can we improve this?" is better IMO than asking "what do you imagine to be a good interface?"
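For comparison, option (i) could be as small as the sketch below. It assumes the entries have already been parsed into dicts, and that "unique" means no two entries share a key, a location, or a title when titles are compared case-insensitively. The function name and the report shape are hypothetical.

```python
from collections import defaultdict

def find_duplicates(entries, fields=("key", "title", "location")):
    """Report, per field, the values shared by more than one entry.

    Returns {field: {value: [entry indices]}}, listing only repeated values.
    """
    report = {}
    for field in fields:
        seen = defaultdict(list)
        for index, entry in enumerate(entries):
            value = entry.get(field)
            if not value:
                continue
            # Assumption: titles match case-insensitively; keys and
            # locations must match exactly (after trimming whitespace).
            normalized = value.strip().lower() if field == "title" else value.strip()
            seen[normalized].append(index)
        repeated = {value: idxs for value, idxs in seen.items() if len(idxs) > 1}
        if repeated:
            report[field] = repeated
    return report


entries = [
    {"key": "lecunpaper",
     "title": "Gradient-based learning applied to document recognition",
     "location": "http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf"},
    {"key": "lecun_paper",
     "title": "Gradient-Based Learning Applied to Document Recognition"},
]
print(find_duplicates(entries))
```

The same function could back a frontend form as well, so the choice between (i) and (ii) is mostly about where the check is exposed, not how it is written.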

rgrosse commented 11 years ago

I think we both agree that we eventually want all the content editing to be done in a way that's more convenient than just editing text files. But it's going to be a while before the GUI is robust enough to replace the text version completely. We'll tackle a lot of the same questions anyway as we work on the user content editing forms, and at some point it'll become clear that it's time to get rid of the text and replace everything with that interface. Before that happens, I think the two-tiered system is a way to get something up and running quickly which has 90% of the desired functionality. Whenever there are bugs or missing features in the submission form (and there will be), we have the text DB to fall back on.

In terms of your example, editing resources.txt requires pressing ctl-S to search for the key and the name of the textbook. We'd still have to do something analogous through the online form, e.g. search for the resource name to see if it exists already. (And the search would have to be flexible enough to account for variations on the title. E.g., "Coursera: Neural Networks" would have to turn up "Coursera course on neural networks.") This could be slightly more convenient if done right, but probably not a huge time saver.
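One simple way to get that flexibility is to compare sets of words instead of raw strings. This is only a sketch of one possible matching rule, not a claim about what the form should use: a stored title matches when it contains a given fraction of the query's words, ignoring case and punctuation. The function names and the threshold are assumptions.

```python
import re

def title_tokens(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def search_titles(query, titles, threshold=1.0):
    """Return titles containing at least `threshold` of the query's words."""
    query_words = title_tokens(query)
    if not query_words:
        return []
    scored = []
    for title in titles:
        overlap = len(query_words & title_tokens(title)) / len(query_words)
        if overlap >= threshold:
            scored.append((overlap, title))
    # Best matches first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


titles = [
    "Coursera course on neural networks",
    "Pattern Recognition and Machine Learning",
]
print(search_titles("Coursera: Neural Networks", titles))
```

Lowering `threshold` trades precision for recall, which is the kind of tuning that would have to be done by hand in a text editor search.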

cjrd commented 11 years ago

Yes, you're right. I realize we're nowhere near orienting kmaps towards "mass appeal" and we probably shouldn't focus too much on mass usability at this time; it's premature. We can reevaluate this issue as agfk evolves. I would certainly like to eventually do some iterative "focus group" type studies to build the content editing frontend.