witnet / witnet-rust

Open source Rust implementation of the Witnet decentralized oracle protocol, including full node and wallet backend 👁️🦀
https://docs.witnet.io
GNU General Public License v3.0

Support HTTP/HEAD data requests #2331

Closed: guidiaz closed this issue 8 months ago

guidiaz commented 1 year ago

The goal is to fetch only the metadata of a web resource, in the form of HTTP response headers, without downloading the body.

For instance, these HTTP response headers would be quite handy for implementing content-based use cases with Witnet:

$ curl --head https://witnet.io/_nuxt/img/dragon_reading.a37f8cb.png
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 498219
etag: "632067ee-79a2b"
...
$ curl --head https://ipfs.io/ipfs/QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE
HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 38376
Etag: "QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE"
X-Ipfs-Path: /ipfs/QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE
X-Ipfs-Roots: QmQqzMTavQgT4f4T5v6PWBp7XNKtoPmC9jvn12WPT3gkSE
...

The implementation of HTTP/HEAD requests should:

Also, it would be nice to support a new String operator in Radon:

Possible new use-cases for Witnet:

JavaScript DSL usage example:

const token_image_digest = new Witnet.HttpHeadSource(
  "https://api.game.art/images/1.png",  {
    "Transfer-Encoding": "identity"
  }
)
  .parseJSONMap() // perhaps not necessary, as response to HttpHeadSource should always be a key/value map
  .mapGetString("Etag")
  .stringDecode(Witnet.HASHES.SHA256)

tmpolaczyk commented 1 year ago

It would be nice to be able to access the response headers in HTTP GET/POST requests as well. Perhaps using some flag at the source level to indicate whether we want the response body, the headers, or both. That could also be extended to support binary sources as requested in #2274, by being able to specify the body encoding.

tmpolaczyk commented 1 year ago

This is the header format we can get from the HTTP library surf:

response_headers: BTreeMap<String, Vec<String>>

The value is Vec<String> because header names can be repeated:

Cache-Control: no-cache
Cache-Control: no-store

And the header order is important. Is it possible to handle each header as an array using Radon? For example, we would need ArrayContains to check whether a header value exists, and also some operator to enforce that the array has exactly one element and get its value.

Or shall we always return only one header (the first or the last instance)? Our HTTP POST headers implementation ignores repeated headers, so should we fix that as well? In that case, though, the surf library does not allow us to maintain the order, so if order is important we may need to change the HTTP library.
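To make the two checks mentioned above concrete, here is a minimal sketch over the `BTreeMap<String, Vec<String>>` shape. The function names `header_contains` and `single_header` are hypothetical, and this is plain Rust rather than Radon; it only illustrates the semantics an `ArrayContains` operator and a "must be exactly one value" operator might have.

```rust
use std::collections::BTreeMap;

/// Headers as described above: each name maps to every value it was sent with.
type ResponseHeaders = BTreeMap<String, Vec<String>>;

/// Hypothetical equivalent of an `ArrayContains` check on one header.
fn header_contains(headers: &ResponseHeaders, name: &str, value: &str) -> bool {
    headers
        .get(name)
        .map_or(false, |values| values.iter().any(|v| v == value))
}

/// Returns the value only if the header appears exactly once.
fn single_header<'a>(headers: &'a ResponseHeaders, name: &str) -> Option<&'a str> {
    match headers.get(name).map(Vec::as_slice) {
        Some([only]) => Some(only.as_str()),
        _ => None,
    }
}

/// Builds the repeated `Cache-Control` example from this comment.
fn sample_headers() -> ResponseHeaders {
    let mut headers = ResponseHeaders::new();
    headers.insert(
        "Cache-Control".to_string(),
        vec!["no-cache".to_string(), "no-store".to_string()],
    );
    headers.insert("Content-Type".to_string(), vec!["image/png".to_string()]);
    headers
}
```

With `sample_headers()`, `single_header` yields a value for `Content-Type` but rejects `Cache-Control`, because that header is repeated.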

In the sample HTTP HEAD response:

HEAD /_nuxt/img/dragon_reading.a37f8cb.png

HTTP/1.1 200 OK
Content-Type: image/png
Content-Length: 498219
etag: "632067ee-79a2b"

With the current proposal we will be unable to do anything with the first line (HTTP/1.1 200 OK), so we cannot check the HTTP version or the status code. We also do not get any way to see the order of the headers, but I guess that won't be an important use case.

However, we could get access to all that information if we used an array instead of a map and then iterated over the values, or if we kept the headers as a string and added a new StringParseHttpHeaders operator. Then if we had a StringSplit(\n) operator, we would be able to read the headers one at a time, as well as the HTTP version and status code. We would need to change the HTTP library to implement that, but it should be doable.
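As a rough illustration of the string-based option, the sketch below splits a raw response head line by line and recovers the version, the status code, and the headers in their original order. `ResponseHead` and `parse_response_head` are hypothetical names, and the parsing is deliberately simplistic (it ignores blank lines and assumes there is no body, as in a HEAD response).

```rust
/// Parsed form of a raw HTTP response head (status line plus headers).
#[derive(Debug, PartialEq)]
struct ResponseHead {
    version: String,
    status: u16,
    /// Headers in the order they were received, repeats included.
    headers: Vec<(String, String)>,
}

/// Hypothetical `StringParseHttpHeaders`: splits the raw text line by line.
/// Returns `None` if the status line or any header line is malformed.
fn parse_response_head(raw: &str) -> Option<ResponseHead> {
    // `lines()` handles both "\n" and "\r\n"; blank lines are skipped.
    let mut lines = raw.lines().map(str::trim_end).filter(|l| !l.is_empty());

    // Status line, e.g. "HTTP/1.1 200 OK".
    let mut status_line = lines.next()?.split_whitespace();
    let version = status_line.next()?.to_string();
    let status = status_line.next()?.parse::<u16>().ok()?;

    // Remaining lines are "Name: value" pairs, split at the first colon.
    let mut headers = Vec::new();
    for line in lines {
        let (name, value) = line.split_once(':')?;
        headers.push((name.trim().to_string(), value.trim().to_string()));
    }
    Some(ResponseHead { version, status, headers })
}
```

Keeping the headers as an ordered list of pairs preserves both repetition and order, at the cost of a linear scan for lookups.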

So, in summary, the input of the Radon script can be one of:

* Map<String, Vec<String>>
* Map<String, String>
* Array<Array<String>>, [[k1, v1], [k2, v2]]
* String, and manually use radon operators to turn it into a Map
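To compare these shapes, the sketch below starts from the ordered-pairs form and derives the two map forms from it. The type alias and function names are hypothetical; the point is only to show what each conversion keeps and what it loses.

```rust
use std::collections::BTreeMap;

/// Array<Array<String>> shape: ordered `[name, value]` pairs.
type HeaderPairs = Vec<(String, String)>;

/// Map<String, Vec<String>> shape: repeated names keep all their values
/// in arrival order, but the order across different names is lost.
fn to_multi_map(pairs: &HeaderPairs) -> BTreeMap<String, Vec<String>> {
    let mut map: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, value) in pairs {
        map.entry(name.clone()).or_default().push(value.clone());
    }
    map
}

/// Map<String, String> shape: for repeated names only the last value survives.
fn to_single_map(pairs: &HeaderPairs) -> BTreeMap<String, String> {
    let mut map = BTreeMap::new();
    for (name, value) in pairs {
        // `insert` overwrites any earlier value stored under the same name.
        map.insert(name.clone(), value.clone());
    }
    map
}
```

Going the other way (from either map back to the ordered pairs) is not possible without extra information, which is why the array and string shapes are the only ones that preserve header order.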

tmpolaczyk commented 1 year ago

Also it looks like the surf client in its current configuration doesn't support HEAD requests because of a bug, see this issue: https://github.com/http-rs/surf/issues/218#issuecomment-770254047

So most probably we will need to use another HTTP client. We are already using isahc, which is a low-level wrapper around libcurl and is also used by surf, so I will create an issue to stop using surf and use isahc directly.

guidiaz commented 1 year ago

> With the current proposal we will be unable to do anything with the first line (HTTP/1.1 200 OK),

I don't think we'd need to include the first line (transport level) within the data request response (app level). Of course, the node has to interpret it in order to know whether the HTTP/HEAD request was successful or not.

guidiaz commented 1 year ago

> Then if we had a StringSplit(\n) operator, we would be able to read the headers one at a time

The Radon script needs to access the headers as a map one way or another, as it cannot assume the headers will be returned in any specific, previously known order.
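A related detail when accessing headers by name: the two curl samples at the top spell the same header as `etag` and `Etag`, and HTTP header field names are case-insensitive. A map-based lookup may therefore want to normalize names first. A minimal sketch, with hypothetical function names and assuming lowercasing is an acceptable normalization:

```rust
use std::collections::BTreeMap;

/// Builds a lookup map keyed by lowercased header names, so that scripts
/// depend neither on the order nor on the casing chosen by the server.
fn normalize_headers(pairs: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut map: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, value) in pairs {
        map.entry(name.to_ascii_lowercase())
            .or_default()
            .push(value.to_string());
    }
    map
}

/// Looks a header up by name, ignoring ASCII case, and returns its first value.
fn get_header<'a>(map: &'a BTreeMap<String, Vec<String>>, name: &str) -> Option<&'a str> {
    map.get(&name.to_ascii_lowercase())
        .and_then(|values| values.first())
        .map(String::as_str)
}
```

With this, a lookup for `Etag` finds the value regardless of whether the server sent `etag` or `Etag`.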

aesedepece commented 1 year ago

@guidiaz please ping me before you go about this to discuss some challenges.

guidiaz commented 1 year ago

By directly using http::Response, we can get the response to an HTTP/HEAD request as an http::HeaderMap<HeaderValue>, which can then be serialized as a JSON string. This way, responses to HTTP/HEAD requests can be assumed to be a RadonString value, parseable into a RadonMap via the StringParseJSONMap Radon operator.
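For illustration only, this is roughly what the serialization step could produce. The sketch uses just the standard library and assumes one string value per header name; the real node would more likely rely on a JSON library, and the exact JSON shape is not specified in this thread. `json_escape` and `headers_to_json` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Escapes a string for use inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character is written as a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serializes headers as a flat JSON object of string values
/// (assumption: one value per header name).
fn headers_to_json(headers: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = headers
        .iter()
        .map(|(name, value)| format!("\"{}\":\"{}\"", json_escape(name), json_escape(value)))
        .collect();
    format!("{{{}}}", fields.join(","))
}
```

Note that header values such as ETags already contain double quotes, so escaping is required for the resulting string to be parseable by StringParseJSONMap.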