clarifying URLEncoded form meaning

bblfish commented 8 years ago

In "Lack of context", the problem of what WebMention's URLEncoded forms mean as parameter/value pairs is considered.

This may not be that difficult to do. We could define a new Link: relation, say urlencoded, that would point to a transformer from urlencoding to RDF. This would allow a client, on making a request to a webmention endpoint

GET /webmention HTTP/1.0

to retrieve a result such as this (see the Web Linking RFC), where of course the "urlencoded" relation needs to be described and registered correctly.

HTTP/1.0 200 OK
Link: <http://w3c.org/social/WebMention>; rel="urlencoded"

The document at <http://w3c.org/social/WebMention> would have both an HTML representation and a machine readable representation.

- The human readable representation would just explain how webmention works, the header, and some explanations of the mapping.
- The machine readable form would have to give a simple method to transform the attribute/values into a graph with well understood, extensible semantics.

What one really wants is the ability to also retrieve a machine readable document from http://w3c.org/social/WebMention that would describe the url encoded form. It would have some yet to be determined mime type (that is not html), and would return something like this:

PREFIX ping: <http://purl.org/net/pingback/>
CONSTRUCT { 
  [] ping:source ?source;
     ping:target ?target . 
} WITH ?source ?target

where ?source and ?target are the attribute names of the form. This would allow WebMention enabled clients to continue sending the attribute value pairs as they do now,

source=http://joe.name/card
target=http://jane.name/other

and would allow a robot to interpret that to be equivalent to the rdf graph written out in Turtle as

@prefix ping: <http://purl.org/net/pingback/> .

[] ping:source <http://joe.name/card>;
   ping:target <http://jane.name/other> .

(Clearly there is a piece of syntax still missing in the sketched language to turn the ?source and ?target strings into URLs.) This is not that complicated and would allow us to de-siloify all forms on the web.
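
As a rough illustration of what such a transformer would do, here is a minimal Python sketch. It assumes the mapping document has already been fetched and reduced to a table from form attribute names to predicate URIs; the names `MAPPING` and `form_to_turtle` are hypothetical, and treating every mapped value as a URL stands in for the piece of syntax that the sketched language is still missing.

```python
from urllib.parse import parse_qs

PING = "http://purl.org/net/pingback/"

# Hypothetical, hand-written stand-in for the mapping published at the
# "urlencoded" link target: form attribute name -> predicate URI.
MAPPING = {"source": PING + "source", "target": PING + "target"}


def form_to_turtle(body: str) -> str:
    """Turn a urlencoded body into a one-node Turtle graph (sketch only)."""
    fields = parse_qs(body, strict_parsing=True)
    lines = []
    for name, predicate in MAPPING.items():
        if name not in fields:
            raise ValueError(f"missing form attribute: {name}")
        # Assumption: every mapped value is a URL and becomes an IRI.
        # No validation or escaping of the value is attempted here.
        lines.append(f"<{predicate}> <{fields[name][0]}>")
    return "[] " + " ;\n   ".join(lines) + " ."


body = "source=http%3A%2F%2Fjoe.name%2Fcard&target=http%3A%2F%2Fjane.name%2Fother"
print(form_to_turtle(body))
```

The client keeps sending the same short attribute names; only the party interpreting the form needs the mapping.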

This would allow the IndieWeb folk to increase the security of their protocol while retaining their principle of remaining accessible. It would also allow this to be integrated generically into the SoLiD platform, so as to reduce configuration mistakes and make it easier to create such resources automatically. On the LDPnext side, this would require working out how to extend the accepted mime types to include such a urlencoded form.

bblfish commented 8 years ago

In response to @gobengo's remark in issue 9, here is a reason why just using URLs as attribute names does not get us very far. Attribute values in a string are not good enough to reconstruct the data structure.

Here is an example: imagine you have a form asking you for your name, your age, and a number of address fields, where the list of attribute values would end up being, for example:

street=19 rue Saint Honore
city=Fontainebleau
country=France
zip=77300
name=Henry
age=47

to make it simple. (If you want pingback to be extensible you could imagine pinging such a piece of information.) Having the attributes use the foaf or card ontology URLs would not help the server reconstruct the data structure behind it, which would be something like

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix contact: <http://www.w3.org/2000/10/swap/pim/contact#> .

[]  foaf:name "Henry";
    foaf:age 47;
    contact:home 
            [ a contact:ContactLocation;
                contact:address [ contact:city "Fontainebleau";
                                  contact:country "France";
                                  contact:postalCode "77300";
                                  contact:street "19 rue Saint Honore" 
                                ]
            ] .

There is no way to know, without extra context information, how to go from such attribute value pairs to the above more complex structure. What is needed is exactly for a form with the attribute value pairs to give the client a way to work out what graph of information those pairs would result in. We could use the same mechanism as the one described above. The client could make a request to the /address resource

GET /address HTTP/1.0

but this time instead of a WebMention link "urlencoded" it could return the header

HTTP/1.0 200 OK
Link: <http://postal.org/Address>; rel="urlencoded"

where someone would have published at http://postal.org/Address a document that would allow the full meaning to be deduced

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX contact: <http://www.w3.org/2000/10/swap/pim/contact#>

CONSTRUCT { 
[]  foaf:name ?name;
    foaf:age ?age;
    contact:home 
            [ a contact:ContactLocation;
                contact:address [ contact:city ?city;
                                  contact:country ?country;
                                  contact:postalCode ?zip;
                                  contact:street ?street 
                                ]
            ] .
} WITH ?name ?age ?city ?country ?zip ?street

We also need a way to specify that the age is an integer, so an extra piece of datatype conversion syntax is required.
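
To make the nesting and the datatype point concrete, here is a small Python sketch of the same idea. It is only an illustration under stated assumptions: `TEMPLATE` is a hypothetical stand-in for the document published at the "urlencoded" link target, `DATATYPES` is an invented way of carrying the datatype hints, and `{name}`-style slots replace the `?name` variables of the sketched language.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical template standing in for the published mapping document.
# Each {slot} is the name of a form attribute.
TEMPLATE = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix contact: <http://www.w3.org/2000/10/swap/pim/contact#> .

[] foaf:name {name} ;
   foaf:age {age} ;
   contact:home [ a contact:ContactLocation ;
       contact:address [ contact:city {city} ;
                         contact:country {country} ;
                         contact:postalCode {zip} ;
                         contact:street {street} ] ] .
"""

# Assumed datatype hints: attributes not listed here are plain strings.
DATATYPES = {"age": int}


def literal(name: str, value: str) -> str:
    """Serialize one form value as a Turtle literal."""
    if DATATYPES.get(name) is int:
        return str(int(value))  # raises ValueError on non-integer input
    # Only backslashes and double quotes are escaped in this sketch.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def form_to_graph(body: str) -> str:
    """Fill the template from a urlencoded body; KeyError if a slot is missing."""
    fields = {k: v[0] for k, v in parse_qs(body).items()}
    return TEMPLATE.format(**{k: literal(k, v) for k, v in fields.items()})


body = urlencode({
    "street": "19 rue Saint Honore", "city": "Fontainebleau",
    "country": "France", "zip": "77300", "name": "Henry", "age": "47",
})
print(form_to_graph(body))
```

The flat attribute list says nothing about which fields belong to the address; that knowledge lives entirely in the template, which is why it has to be published by the endpoint.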

rhiaro commented 8 years ago

@melvincarvalho Replying re: subject in https://github.com/w3c-social/webmention/issues/9#issuecomment-159966620 Think of posting a webmention like writing to an LDP container - when you post the subject hasn't been created yet, but the server then creates a URI for it and returns that to you. That's already in the spec. So the subject doesn't ever need to be a blank node, but the URI is generated by the receiver, not the sender.

Re: all of above about posting things other than source and target...

- You have to fetch the source resource to verify it anyway, so you don't need the sender to send all the content in the webmention, cos you can just get what you want during the verification stage.
- If for some reason you did want to send the whole source, you probably have a pre-existing relationship with the server you're sending to, so you can have figured out what to send/receive mutually in advance?
- If your endpoint accepts arbitrary data and you don't verify it by getting the source or pre-authing with the sender, you're a) violating the spec and b) setting yourself up for some serious spam problems.

bblfish commented 8 years ago

There is nothing that specifies in the URLEncoding specification that you are meant to think of things the way you want to interpret them @rhiaro . The example I posted above with the address is just meant to show that most forms create data structures that are much more complex than the one envisaged by the initial webmention protocol, so that the issue of the subject is just one small issue among many that need to be considered. How is the server meant to know which fields go together? Which fields are properties of the address and which are properties of the person? In the above case there are blank nodes between the person and the address, because houses can have more than one address (I was at an apartment in Paris in which that was the case). Vice versa: how is the client meant to know that the server receiving the properties is going to interpret them the way the client wishes?

For that to work you need the server to provide a mapping from a form to an explicit semantics of how these property values will be interpreted. Anything else is quite literally wishful thinking.

That is what this issue is considering.

As I remember, at the Paris F2F the IndieWeb folks had developed protocols with much more complex forms than the webmention one discussed here. If we are to enable all of those, and many more that will follow, we need a general way to deal with all of them, or else we'll be inventing ad hoc interpretations for each different service.

dissolve commented 8 years ago

Trying to follow all the threads going on from email notifications.

Perhaps the spec should make it clear that NO DATA other than the addresses of locations where the data can be fetched should be provided unless some authentication is provided (leaving the method of auth up to the implementation). At that point you can include a single field for the encoding and a single field for a serialized form of the data.

Basically Source=https://example.com/somesource Target=https://example.net/target Encoding=activity+json Data={@context: ......}

Sorry for the abbreviated format. On my phone.

So the form encoding would not carry any data other than URLs, an encoding value, and then serialized data, which can handle its own definition of what values mean.
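
A minimal Python sketch of such a request body, for illustration only. The lowercase attribute names and the `payload` contents are assumptions; the actual content of the data field is not spelled out above.

```python
import json
from urllib.parse import parse_qs, urlencode

# Placeholder payload; the real content of the data field is not given above.
payload = {"note": "hypothetical example payload"}

# Sender side: two URLs, one encoding label, one serialized blob.
body = urlencode({
    "source": "https://example.com/somesource",
    "target": "https://example.net/target",
    "encoding": "activity+json",
    "data": json.dumps(payload),
})

# Receiver side: only four attribute names need to be understood; the blob
# is decoded according to the declared encoding.
fields = {k: v[0] for k, v in parse_qs(body).items()}
decoded = json.loads(fields["data"]) if fields["encoding"] == "activity+json" else None
```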

bblfish commented 8 years ago

@dissolve that does not answer the question as to how the client knows that the server will actually interpret what you send this way, or how a client that lands on such a resource could find out what to send it. Furthermore your answer seems again very ad-hoc.

Also, why not use a web browser to follow the discussions? It's a lot easier :-)

rhiaro commented 8 years ago

Re: "how the client knows that the server will actually interpret what you send this way" - the client just discovered the endpoint to post to. The server, in pointing to a webmention endpoint, is saying 'you can post source and target here according to the webmention spec'. Right..?

Re: "how a client that lands on such a resource could find out what to send it" - hmm, definitely an interesting problem, but I think perhaps out of scope. The spec is defining what to do for a client who is specifically looking to send a webmention (in which case they follow the endpoint discovery steps), not a client who is randomly crawling your site looking for places to post to. Whilst it would be cool if we could 'just' hypermedia all the things, I think this adds a level of complication that might hinder adoption for something that could otherwise be easy for people to pick up and implement on a whim.

bblfish commented 8 years ago

@rhiaro you don't take into consideration

1. that people may maliciously point to a form that uses the same attribute names as those you are using. (source and target seem to be very meaningful in a military context, just as much as in the webmention one.)
2. that the web is designed so that there is no one way to reach a resource. You have to take into account that people may bookmark resources, republish links to them, etc. You cannot guarantee that someone reached your resource through only one type of link. That's basic web architecture.

rhiaro commented 8 years ago

@bblfish Assume I'm a bit dense, and walk me through 1. a) I link to a site which unbeknownst to me has a malicious, fake webmention endpoint, and discover this, and post to it. b) I have posted to a form 2 URLs, which it accepts and processes in order to misuse. c) ..? Given that a malicious script can go and crawl for pages which link to other pages, if that's all it needs to do malicious things, I'm not sure what extra problems are being caused.

2 - Sure, but this spec deals with the instance where someone did reach my resource (webmention endpoint) through this specific route. One could additionally extend one's webmention endpoint with hypermedia controls (handwaving hypermedia terminology here, sorry if I screw that up) but that's not part of this spec.

bblfish commented 8 years ago

@rhiaro

1. You have software that comes across a site that points, perhaps mistakenly, perhaps maliciously, to a form that accepts the same attribute value pairs as the one specified by WebMention. You then post your content to that site, giving you results other than those you expected. The site containing the form may not be malicious at all. It may be a site you are already connected to and authenticated with, and the form may be one to buy something. How does your software make sure that the form it is sending is going to be interpreted the way you were hoping it would be? That's the question one is confronted with if one seriously builds a global protocol.
2. You cannot wish away the difficult cases by only considering the ones that work the way you would like them to work. Imagine if bridge builders or skyscraper architects thought that way.

In any case I have put forward a simple proposal that would allow this case to be solved and an infinite number of others too in a way that is

- compatible with the current WebMention spec - it just requires the endpoint to publish an extra Link header
- does not require web developers who are used to short attribute/values to create long complex ones
- is extensible to more complex versions of pingback protocol
- provides a seamless path to make both IndieWeb folks and LDP people to work together the way they like with minimum disturbance.

gobengo commented 8 years ago

@rhiaro

> Think of posting a webmention like writing to an LDP container - when you post the subject hasn't been created yet, but the server then creates a URI for it and returns that to you. That's already in the spec. So the subject doesn't ever need to be a blank node, but the URI is generated by the receiver, not the sender.

@bblfish

> There is nothing that specifies in the URLEncoding specification that you are meant to think of things the way you want to interpret them @rhiaro .

There's more to the request than just the request body and content-type. The HTTP POST method is what implies what the body is for.

gobengo commented 8 years ago

> In any case I have put forward a simple proposal

I can't really understand what that is, but looks like it involves a 'urlencoded' link relation that I can't find any mention of anywhere else.

> is extensible to more complex versions of pingback protocol

I sense an aversion from others to making a more complex version of the pingback protocol, and a wish to keep the webmention 'core' as simple as the current draft is. And I think I agree after the discussion in #3. More complex things (other properties, other mimetypes) can always be extensions once proven out.

> provides a seamless path to make both IndieWeb folks and LDP people to work together the way they like with minimum disturbance.

I think we can do this just by recommending semantics for source/target as described in #9

bblfish commented 8 years ago

@gobengo POST does make it explicit that you are creating or altering a resource, but that does not provide enough information to tell you what the content of the message is, when combined with application/x-www-form-urlencoded mime type. It is the mime type that usually helps make the interpretation of the content explicit, but urlencoding does not provide enough of it - in the non human readable web that is. ( In the html document web the form is produced by the same origin as the "endpoint" and the context is interpreted by very context sensitive agents called humans)

bblfish commented 8 years ago

@sandhawke asked the following in issue 10

> @bblfish how can a header possibly help, since the agent doing the POST won't see the header until after it's completed the POST.

As this is more relevant to this issue I'll post the answer here:

This does indeed require an initial request on the endpoint/container.

- The agent could do a HEAD first, and then a POST.
- For JS browser agents that want to POST to a different origin, there is in any case the cost of the initial CORS preflight request.
- A GET on the endpoint may usefully return extra information anyway, in which case the initial request would not be wasted. This may be the case if the endpoint were an LDPC container, for example.
- If the link comes from the same origin as the endpoint, it could perhaps be argued that the extra HEAD or GET is not necessary. The agent just takes a little extra risk by skipping it.
- Arguably the GET or HEAD could be amortized over a long period, as the response can be cached.

bblfish commented 8 years ago

@sandhawke nobody in the LDP group was pushing application/x-www-form-urlencoded communication, so this discussion certainly would not have come up. The IndieWeb group, on the other hand, is putting forward a number of protocols that use application/x-www-form-urlencoded. A number of them were put forward at the Paris Face to Face, all based on their WebMention experience. If there is to be interaction between the LDP group and such protocols, something like this is needed. Different forum, different problems.

csarven commented 8 years ago

IWC's work on micropub, for instance, is essentially RDF/POST. The fundamental difference is that IWC places its bet on the microformats vocabulary and on hardcoding those terms as the parameter names. That particular approach is unfortunately bound to fail on longevity. Visible lesson: mf1->mf2. Lifetime of mf1 terms: ~5 years. It promotes regrettable HTML markup like, for instance, <p class="p-name entry-title e-content entry-content article">, which some prefer to maintain: code bloat.

bblfish commented 8 years ago

The IndieWebCamp has not yet taken the namespace issue into consideration, so it's not re-inventing RDF/POST. It is unlikely that #9 or RDF/POST will be acceptable to IndieWebCamp, given that it is not easy for their developers to see the point of making forms so heavyweight. On the other hand, the answer proposed here could bridge the gap elegantly, as it makes no requirement on the developer to do anything much out of the ordinary other than adding a header to the endpoint, which I think will be needed for any machine readable service.

csarven commented 8 years ago

IWC has considered namespaces but decided against it. Baggage from microformat's namespaces considered harmful and handful of other cherry picked anti-patterns - essentially conclusion/agenda first, compiling only supporting evidence next by the vocal minority, in combination with wiki policing etc.. Having said that, if the IWC community is comfortable with that type of governance, and can solve their own problems without namespaces, all the power to them. That unfortunately makes it difficult to interop with the other approaches.

bblfish commented 8 years ago

Yes, that is why this proposal does not require IWC users to be preoccupied with namespaces. There is just the requirement for a Link header, and they can work with the Semantic Web team, where we can put what we need at the URL location to be able to automate servers and clients.

sandhawke commented 8 years ago

@csarven I think interop is still easy, as long as it's clear when one is at the boundary. For example, one might use the rel='webmention' as a boundary signpost. It's a little more code, but it's straightforward.

bblfish commented 8 years ago

Just noting that @kevinmarks is echoing some of the points made here in issue 4. For those who don't like the Army example @melvincarvalho came up with an alternative one:

Donald wishes to say good night to his wife. He uses a webmention form to post to his wife's endpoint, pointing to the message "Good night, dear".

However Donald happens to be logged in to his work account which is connected to a drone system. By mistake the webmention is routed to a form which is used to target drone strikes. By adding "target" of his wife, the AI enabled drone system is able to deduce that Donald requires a strike to be carried out against the target.

If the software did what we all do before posting something, namely reading the page that requires us to click the button, then it would have found the required Link relation and it would have understood that that endpoint will not interpret its attribute values in the intended manner.

I'll additionally note that this is well understood in web security, which is why there is such a thing as CORS preflight requests.

melvincarvalho commented 8 years ago

@bblfish what about a simpler solution. Simply add the parameter

type=Webmention

That should provide enough context for a processor to not get confused.

It would also play nicely with a linked data paradigm where you'd want to add rdf:type = Webmention.

Does it scale to the web? Possibly not. But would it work in practice? Probably yes.

However, in a JSON formulation, which I think is the consensus in this group for passing messages around, I think it might work just fine. And also you then don't need the preflight.

Would that work, or did I miss something?

bblfish commented 8 years ago

If it does not scale to the web, then it can't work in practice on the web. The reason CORS adds preflight requests on POSTs, PUTs and other unsafe methods is not far from the issues being discussed here.

aaronpk commented 8 years ago

Closing this issue due to lack of clear issue/suggestion and no further interest in the past 3 months. Please open a new issue with a specific topic or suggestion if you would like to discuss further.

w3c / webmention

clarifying URLEncoded form meaning #11