kartikprabhu / mf2py

mf2 parser in python (this is an experimental fork)

include img alt and src in p-* and e-* parsing #61

Closed kartikprabhu closed 6 years ago

kartikprabhu commented 6 years ago

According to http://microformats.org/wiki/microformats2-parsing, the last step of p-* and e-* parsing should replace each nested <img> with its alt attribute, or else its src attribute, when computing textContent.

Alt Example

HTML

<html>
<body class="h-entry">
<p class="p-name">text text 
<img alt="image alt"/>
</p>
</body>
</html>

Current output

    "items": [
        {
            "type": [
                "h-entry"
            ], 
            "properties": {
                "name": [
                    "text text"
                ]
            }
        }
    ]

Expected output

    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "name": [
                    "text text \r\nimage alt"
                ]
            }
        }
    ]

Src Example

HTML

<html>
<body class="h-entry">
<p class="p-name">text text 
<img src="image src"/>
</p>
</body>
</html>

Current output

    "items": [
        {
            "type": [
                "h-entry"
            ], 
            "properties": {
                "photo": [
                    "image src"
                ], 
                "name": [
                    "text text"
                ]
            }
        }
    ]

Expected output

    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "name": [
                    "text text \r\n image src"
                ],
                "photo": [
                    "image src"
                ]
            }
        }
    ]

Note the space inserted before the <img> src, as required by the spec; phpmf2 does not produce it.

Also, \r\n is the newline as output by phpmf2.
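
For reference, here is a rough sketch of the substitution described above, using only the standard library's html.parser. This is not mf2py's actual code: the names ImgAwareText and text_with_images are made up for illustration, relative src URLs are not resolved, and an <img> with a valueless alt is treated as having no alt. It only shows the intended behaviour: alt is inserted as-is, src is padded with a space on each side, and the result is trimmed.

```python
from html.parser import HTMLParser


class ImgAwareText(HTMLParser):
    """Collect text content, replacing each <img> with its alt or src.

    Hypothetical helper for illustration only; not part of mf2py.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Plain text between tags is kept unchanged.
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> also lands here via handle_startendtag.
        if tag != "img":
            return
        attrs = dict(attrs)
        if attrs.get("alt") is not None:
            # alt wins when present and is inserted without padding.
            self.parts.append(attrs["alt"])
        elif attrs.get("src") is not None:
            # src is padded with one space on each side.
            # Assumption: no relative-URL resolution in this sketch.
            self.parts.append(" " + attrs["src"] + " ")


def text_with_images(html):
    """Return the trimmed text of an HTML fragment with <img> substituted."""
    parser = ImgAwareText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Run on the two fragments from this issue (with \n line endings rather than phpmf2's \r\n), this should give "text text \nimage alt" for the alt case and "text text \n image src" for the src case.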