kartikprabhu / mf2py

mf2 parser in python (this is an experimental fork)

in-reply-to url also incorrectly sets url property #20

Closed: snarfed closed this issue 10 years ago

snarfed commented 10 years ago

it looks like if there's a u-in-reply-to, that url is also populated in the url property for the object itself. (it shouldn't be.) e.g. this markup:

<article class="h-entry p-comment">
  <div class="e-content">my comment</div>
  <a class="u-in-reply-to" href="https://twitter.com/schnarfed/status/436362042660753409"></a>
</article>

returns this parsed object:

{"rels" : {},
 "items" : [{
       "type" : ["h-entry" ],
       "properties" : {
          "in-reply-to" : [
             "https://twitter.com/schnarfed/status/436362042660753409"
          ],
          "content" : [{
                "html" : "my comment",
                "value" : "my comment"
             }],
          "url" : [
             "https://twitter.com/schnarfed/status/436362042660753409"
          ],
          "name" : ["my comment"]
       }
    }
 ]}

kartikprabhu commented 10 years ago

@snarfed that u-url is coming from the implied properties parsing rule http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties

As it stands it says nothing about ignoring u-in-reply-to or other u-* properties. Do you think that needs to be changed?

the php-mf2 parser also does the same thing.

aaronpk commented 10 years ago

@tantek is the one to ask here. It seems that if a link has u-in-reply-to then the implied rules should be ignored?

snarfed commented 10 years ago

i'd guess something stronger than that. i'm pretty mf2 ignorant, but to me it seems like the very last line in the parsing for implied properties alg is wrong. it says, as a final fallback, grab the first non (h-*) link inside the element and use that as the url for the element itself...but links like that will usually point somewhere else, regardless of their mf2 class, right?

thanks in advance @tantek! not urgent. @barnabywalters is also familiar with this stuff, so i'm inflicting this on him too. discussion in IRC.

kartikprabhu commented 10 years ago

@snarfed the last line actually says grab the non (h-*) link inside the element, if it is the only link which is a direct child inside the element. So if you link to a bunch of things it won't pick any as the implied url.
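
A minimal sketch of that final fallback as described in this thread, not the actual mf2py code. It assumes well-formed markup so that the standard library's `xml.etree.ElementTree` can parse it, and it covers only the last step of the implied-url rules. The helper names `is_microformat_root` and `implied_url` and the `example.com` URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_microformat_root(element):
    """True if any class token on the element starts with 'h-'."""
    return any(token.startswith("h-") for token in element.get("class", "").split())


def implied_url(h_element):
    """Simplified final fallback: the sole direct-child <a href>, unless it is an h-* root."""
    # Only direct children count; links nested deeper (e.g. inside e-content) are ignored.
    anchors = [child for child in h_element if child.tag == "a"]
    if len(anchors) != 1:
        return None  # zero or several direct-child links: nothing is implied
    link = anchors[0]
    if link.get("href") is None or is_microformat_root(link):
        return None
    return link.get("href")


# Markup from the original report: one direct-child link.
ONE_LINK = """
<article class="h-entry p-comment">
  <div class="e-content">my comment</div>
  <a class="u-in-reply-to" href="https://twitter.com/schnarfed/status/436362042660753409"></a>
</article>
"""

# Hypothetical variant with two direct-child links.
TWO_LINKS = """
<article class="h-entry">
  <a class="u-in-reply-to" href="https://example.com/original"></a>
  <a class="u-like-of" href="https://example.com/other"></a>
</article>
"""

print(implied_url(ET.fromstring(ONE_LINK)))   # the in-reply-to URL is picked up as url
print(implied_url(ET.fromstring(TWO_LINKS)))  # None
```

With a single direct-child link the in-reply-to target becomes the implied url, which is the behaviour reported above; with two links nothing is implied.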

snarfed commented 10 years ago

@kartikprabhu interesting, ok. still seems wrong, but who knows.

snarfed commented 10 years ago

aha, i might understand better now. i guess it's expected that you have elements like e-content inside the h-*, and user-visible links are inside there or other similar elements. top-level links directly inside the h-* are expected to be 'meta' elements, e.g. the permalink.

ok, makes sense. that's what i get for never taking the time to really understand mf2. i retract my complaints. :P

tantek commented 10 years ago

No you should never come up with special cases like that to ignore the implied rules - because that makes them less predictable and thus harder to use. There is a much simpler solution. Turn the u-in-reply-to into a citation (since it is!). Instead of just:

<a class="u-in-reply-to" href="https://twitter.com/schnarfed/status/436362042660753409"></a>

use:

<a class="u-in-reply-to h-cite" href="https://twitter.com/schnarfed/status/436362042660753409"></a>

This also gives you automatic structure for if/when you decide to put in an actual name for the citation as well, rather than just an empty link.
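
A small sketch of why the extra class helps, assuming (as stated earlier in the thread) that the implied-url fallback only considers links that are not themselves h-* microformats. The helper `eligible_for_implied_url` is hypothetical and checks only the class tokens on the link; it is not part of mf2py.

```python
import xml.etree.ElementTree as ET

# The two variants of the link discussed in this thread.
BEFORE = '<a class="u-in-reply-to" href="https://twitter.com/schnarfed/status/436362042660753409"></a>'
AFTER = '<a class="u-in-reply-to h-cite" href="https://twitter.com/schnarfed/status/436362042660753409"></a>'


def class_tokens(markup):
    """Return the whitespace-separated class names on the root element."""
    return ET.fromstring(markup).get("class", "").split()


def eligible_for_implied_url(markup):
    """Hypothetical check: a link that carries an h-* class is skipped by the fallback."""
    return not any(token.startswith("h-") for token in class_tokens(markup))


print(eligible_for_implied_url(BEFORE))  # True: plain u-in-reply-to link can become the implied url
print(eligible_for_implied_url(AFTER))   # False: h-cite makes it a nested microformat
```

The link keeps its u-in-reply-to property either way; only its eligibility as the entry's implied url changes.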