neocotic / europa

Library for converting HTML into valid Markdown
MIT License
159 stars 24 forks source link

(#27) Updates test fixture to contain properly formatted html #28

Closed tjchaplin closed 11 years ago

tjchaplin commented 11 years ago

This patch is needed so that tests pass properly and work with valid html. The malformed html was specific to void elements. void elements shouldn't contain any contents. See W3 standard for further reference.

The patch has updated the test fixture so that it doesn't contain the malformed html. Fixes #27

neocotic commented 11 years ago

Your change seems to remove the test that void elements have their contents hidden. Why?

tjchaplin commented 11 years ago

The previous fixture looked something like this for void elements:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title></title>
</head>
<body>
    <command>Should be hidden</command>
</body>
</html>

Putting contents into a void command is invalid. Also when I type:

md -p test.html

for the above html it generates:

Should be hidden

I have also put together an example using jsdom to show what happens when parsing the dom on a void element. The void element is not being parsed as a child node of a void element.

Here is the output:

Level:0-HTML
    Level:1-BODY
        Level:2-COMMAND
        Level:2-Should be hidden
        Level:2-DIV
            Level:3-Some contents

You can see that the Should be hidden is being parsed on the same level as the command(void element) tag. It is not being parsed as a child node, If it was then you would be able to hide it.

here is the code:

var jsdom = require("jsdom");

jsdom.env(
  '<command>Should be hidden</command><div>Some contents</div>',
  [],
  function (errors, window) {
    displayNodeElements(window.document.childNodes[0],0)
  }
);

var displayNodeElements = function(node,level){
    var leadingTabs = '';
    for(var i = 0;i<level;i++)
        leadingTabs += '\t';

    if(node.nodeType === node.TEXT_NODE)
        console.log(leadingTabs+"Level:"+level+"-"+node.nodeValue);
    else
        console.log(leadingTabs+"Level:"+level+"-"+node.tagName);

    if(!node.childNodes)
        return;

    level++;
    for (var i = 0; i < node.childNodes.length; i++) {
        displayNodeElements(node.childNodes[i],level);
    };
};

I changed the test fixture so that the void element tags don't include contents, as it doesn't appear to be parsing the contents as child nodes. Also, as you said your tests are passing on your machine here are the details for mine:

Mac OSX - version 10.7.5

uname -v
Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64

node -v
v0.10.15

Please let me know if you need any other details.

neocotic commented 11 years ago

This is really insightful, thank you. I'm not sure why the unit tests aren't failing on this already as the output must match that which is expected (i.e. content of void elements is indeed hidden). Regardless, could you fix the full problem by also removing these tag names from the R_IGNORE_CHILDREN regular expression in md.coffee?

Also, please don't commit changes for the generated files (_.js) as I try to do this only once per release. I've created a new _wip-3.0.0* branch. Would you mind opening another PR against this branch?

Sorry for the inconvenience. I thought I had already created this branch.

Finally, thanks for your help and contributions.

neocotic commented 11 years ago

Sorry, ignore my comment about committing changes to the generated JS files. Of course you can!

I've updated Gruntfile.coffee on the new branch to no longer update the dist file and docs when running the standard build.

neocotic commented 11 years ago

I've submitted PR #31 which includes your fix as well as covering the changes for the library itself. You contribution will still be included in AUTHORS.md. Thanks for you help!