wikiblocks / wikiblocks-chrome

DEPRECATED. A Chrome Extension that helps you find blocks that are relevant to a Wikipedia article.

bl.ocks.org content script #3

Open bmershon opened 8 years ago

bmershon commented 8 years ago

The extension can load a content script when a user navigates to an example on bl.ocks.org:

http://bl.ocks.org/bmershon/de2b4be6063c8c6cb525

The following pieces of information seem relevant:

Here's a straw-man convention for meta tags placed in the head of index.html:

<meta name="robots" content="noindex,nofollow">
<meta name="wikiblocks" content="index,nocache">
<meta name="tags" content="rips-complex,height function,betti numbers">
<meta name="categories" content="topology">

It's important that any new convention for blocks does not require a change in the bl.ocks.org viewer itself. Meta tags are an example of easily added information that could make blocks illustrating a particular concept much easier to find.
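A rough sketch of how a content script might read that straw-man convention. The helper names `splitMetaList` and `readBlockMeta` are made up for illustration, and `doc` is assumed to be whichever document actually holds the block's index.html head (how the script gets hold of it is not covered here).

```javascript
// Hypothetical helper: split a comma-separated meta "content" value
// into trimmed, non-empty entries.
function splitMetaList(content) {
  return (content || "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Hypothetical helper: read the proposed meta tags from a document.
// Missing tags simply yield empty arrays.
function readBlockMeta(doc) {
  const read = (name) => {
    const el = doc.querySelector(`meta[name="${name}"]`);
    return splitMetaList(el ? el.getAttribute("content") : "");
  };
  return {
    directives: read("wikiblocks"), // e.g. "index,nocache"
    tags: read("tags"),
    categories: read("categories"),
  };
}
```

Because a block without these tags just produces empty arrays, the convention stays optional for block authors.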

See #1. Some of the information we need from a bl.ocks page only exists once the Markdown, and possibly the syntax-highlighted code, has been rendered. Mike Bostock uses a MutationObserver in his bl.ocks.org Chrome extension; that might be a way to ensure the DOM has stabilized before we attempt to find links and other rendered content.

bmershon commented 8 years ago

I find myself wanting to know whether a README link points to another block, a Wikipedia page, or some other site such as Wolfram or a professor's web page. Some subtle highlighting for the terms we are already selecting and storing in the database might be nice.

bmershon commented 8 years ago

A first attempt at extracting useful information from the rendered README.md:

[Screenshot, 2015-12-17, 3:08:59 pm]

[Screenshot, 2015-12-17, 3:16:38 pm]

Yellow: Wikipedia link
Green: link to another block
Pink: link to Wikibooks (algorithm pages, typically)

Tags and categories can be extracted from the href of the various anchors one might find in the rendered README.md.
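One possible way to turn an href into candidate terms, as a sketch only. `termsFromHref` is a hypothetical name, and the rule of dropping a leading `wiki` segment and replacing underscores with spaces is an assumption about wiki-style URLs, not necessarily what blocks.js does.

```javascript
// Hypothetical helper: derive lowercase candidate terms from the path
// segments of an absolute URL.
function termsFromHref(href) {
  let url;
  try {
    url = new URL(href);
  } catch (err) {
    return []; // relative or malformed href: nothing to extract
  }
  return url.pathname
    .split("/")
    .filter((segment) => segment.length > 0 && segment !== "wiki")
    .map((segment) => {
      // Keep the raw segment if it contains a malformed percent escape.
      try {
        return decodeURIComponent(segment);
      } catch (err) {
        return segment;
      }
    })
    .map((segment) => segment.replace(/_/g, " ").toLowerCase());
}
```

For a Wikipedia-style link such as `https://en.wikipedia.org/wiki/Convex_hull` this yields a single term, `convex hull`. Deciding which terms count as tags and which as categories is left to the caller.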

Using mutation observers, the content script sends information to the server piecemeal, as each part of the bl.ocks page is rendered after the DOM has initially loaded. Remember, the content script is injected after the DOM loads, but parts of the page are rendered later, so we cannot immediately select elements that may not have been created yet.

For the Convex Hull example, there was a mutation when the README.md file was rendered. Below is the object that was sent to the server in order to record a potentially new gist, along with categories and tags that have been found by some means besides parsing the description:

{
    "gistid": "6f14f7b7f267a85f7cdc",
    "description": "Convex Hull",
    "username": "mbostock",
    "tags": [
        "monotone",
        "chain"
    ],
    "categories": [
        "Algorithm_Implementation,Geometry,Convex_hull"
    ]
}

Another mutation occurred when the index.html code block was rendered by the syntax-highlighting library. Currently, CSS classes mark items, such as the description and anchors, that we don't want to record in the database more than once. This allows an update() function to be called on every mutation caused by some part of the bl.ocks page being rendered, without sending redundant information to the server. When an element has the CSS class .recorded, we know not to send anything more to the server for that element. So for mutations that occur after the README has been rendered and its links have been parsed and recorded, we avoid redundant attempts to insert or update information in the database.
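The pattern described above could look roughly like the following. This is a minimal sketch, not the actual blocks.js: the `send` callback, the `watch` wrapper, the payload shape, and the choice to handle only anchors are all assumptions made for illustration.

```javascript
// Sketch of an idempotent update(): collect anchors that have not been
// recorded yet, mark them with the "recorded" class, and send only those.
function update(root, send) {
  const fresh = [];
  for (const anchor of root.querySelectorAll("a[href]")) {
    if (anchor.classList.contains("recorded")) continue; // already sent
    anchor.classList.add("recorded");
    fresh.push(anchor.href);
  }
  if (fresh.length > 0) {
    send({ links: fresh }); // hypothetical payload shape
  }
  return fresh.length;
}

// Hypothetical wiring: run update() once, then again on every mutation.
// Only child-list changes are observed, so adding the "recorded" class
// (an attribute change) does not trigger the observer again.
function watch(root, send) {
  update(root, send);
  const observer = new MutationObserver(() => update(root, send));
  observer.observe(root, { childList: true, subtree: true });
  return observer;
}
```

Calling update() repeatedly is then safe: a mutation that adds no new anchors sends nothing.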

Is the piecemeal approach to updating/discovering a block a wise decision? I'm not sure. I'm open to other suggestions.

TODO

The content script blocks.js is getting a little big; refactoring some of the origin-parsing code is the next step. It's already obvious that we need a modular, plug-in style architecture for handling new origins (e.g. Wikibooks, Wolfram Alpha) as we go.
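One shape such a plug-in architecture might take is a small registry of origin handlers, sketched here. `registerOrigin`, `identifyOrigin`, `hostIs`, and the handler fields are hypothetical names, and matching on hostname alone is an assumption.

```javascript
// Registry of origin handlers; each handler has a name and a
// matches(url) predicate that receives a parsed URL object.
const origins = [];

function registerOrigin(handler) {
  origins.push(handler);
}

// Return the name of the first registered origin that claims the link,
// or "other" when none does or the href is not an absolute URL.
function identifyOrigin(href) {
  let url;
  try {
    url = new URL(href);
  } catch (err) {
    return "other";
  }
  const handler = origins.find((candidate) => candidate.matches(url));
  return handler ? handler.name : "other";
}

// Predicate factory: match a domain and any of its subdomains.
const hostIs = (domain) => (url) =>
  url.hostname === domain || url.hostname.endsWith("." + domain);

registerOrigin({ name: "wikipedia", matches: hostIs("wikipedia.org") });
registerOrigin({ name: "block", matches: hostIs("bl.ocks.org") });
registerOrigin({ name: "wikibooks", matches: hostIs("wikibooks.org") });
```

Supporting a new origin then means one more registerOrigin call rather than another branch inside blocks.js; each handler could later grow its own tag and category extraction.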