mthom / scryer-prolog

A modern Prolog implementation written mostly in Rust.

library(sockets): Unexpected response from github #598

Closed triska closed 4 years ago

triska commented 4 years ago

To reproduce this issue, please download github.pl from:

https://www.metalevel.at/prolog/scryer/github.pl

The predicate stream/2 uses library(sockets) to establish a TLS connection to github.com and asks for the document /mthom/scryer-prolog via HTTP. The first argument is unified with a stream from which we can read the body of the HTTP response. The second argument is the list of HTTP header lines of the response.
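As a rough illustration of the header/body split described above, here is a minimal Python sketch. It is not the code in github.pl: the function name `split_response` is hypothetical, and it assumes the whole raw response is already available as bytes, whereas stream/2 hands back a stream.

```python
def split_response(raw: bytes) -> tuple[list[str], bytes]:
    """Split a raw HTTP response into its header lines and its body.

    Hypothetical helper for illustration only. The returned list includes
    the status line as its first element.
    """
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    # Header bytes are decoded as ISO-8859-1, which never fails on any byte.
    header_lines = head.decode("iso-8859-1").split("\r\n")
    return header_lines, body


headers, body = split_response(
    b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n008000\r\n\n\n"
)
```

Note that the body is returned exactly as the server sent it, including any transfer encoding applied to it.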

For instance, to read the first 20 characters of the document, we can use:

?- stream(S, _), length(Ls, 20), maplist(get_char(S), Ls).

As the answer, we get:

   S = '$stream'(...), Ls = "008000\r\n\n\n\n\n\n<!DOCT ..."

The initial "008000" part of the answer is unexpected: it does not appear in the document when I fetch this page with any other client I tried.

The remainder of the document seems to be perfectly correct and as expected. Also, pages from other web sites I tried are received correctly and exactly as expected.

What is going on? This may be an interesting issue to look into for new contributors who are interested in the networking functionality of Scryer Prolog. I would greatly appreciate your help!

ghost commented 4 years ago

That's an unusual BOM.

triska commented 4 years ago

For comparison, the 008000 part does not occur when I download the document via wget:

$ wget http://github.com/mthom/scryer-prolog -O sc

Looking at sc, it starts with \n\n\n\n\n. So the initial \r\n is in fact also unexpected: the whole initial 008000\r\n is not expected to occur in the document.

ghost commented 4 years ago

wget is too nice and hides/processes the data for the user.

Right now, Scryer Prolog is dealing with the raw data, and its output is correct (whether that is what you want depends on what the user expects).

By using ncat:

$ ncat --version
Ncat: Version 7.80 ( https://nmap.org/ncat )
$ ncat --ssl github.com 443 > output.txt  # Enter twice then ctrl-d.
GET /mthom/scryer-prolog HTTP/1.1
Host: github.com

$ head -n 32 output.txt
HTTP/1.1 200 OK
date: Wed, 17 Jun 2020 09:27:31 GMT
content-type: text/html; charset=utf-8
server: GitHub.com
status: 200 OK
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With, Accept-Encoding
etag: W/"a08468419f58f3c1d5448bc3445800ef"
cache-control: max-age=0, private, must-revalidate
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events wss://live.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/socket-worker.js gist.github.com/socket-worker.js
Set-Cookie: _gh_sess=t0B3EJLwxHhxQHp7O8EcNx4%2FDQL%2BwKuNuUNUYl87fCRgC1Z%2BMSOK4BHje0bY9LzGvZjfRRzKXtkG1B%2FMLgTioDndgBqccSt9EnBzrnzAE6gpPtOSQMiJj0pwyFMcmWyReyF0ngfAEhDNQ8A9OpP4dGzkvrwsh%2BtXUS9LxV8bV7xSy5jfTh8RLjefFfqGKPOlFyALsTLe84Yf2OmyjF%2BVM3%2FJowXPrppvP%2FF1QA3z6PrISpD%2F%2BkUDbFYy4fjdiuoBhsqyF8bbhDsyAbfgrhG1Ug%3D%3D--lh0fk71u2AVhVNIs--ofZ%2BGicUreKuGLNdppdmHA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
Set-Cookie: _octo=GH1.1.1647474019.1592386050; Path=/; Domain=github.com; Expires=Thu, 17 Jun 2021 09:27:30 GMT; Secure; SameSite=Lax
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Thu, 17 Jun 2021 09:27:30 GMT; HttpOnly; Secure; SameSite=Lax
Accept-Ranges: bytes
Transfer-Encoding: chunked
X-GitHub-Request-Id: 989A:36169:E12533:141CF36:5EE9E200

008000

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
  <link rel="dns-prefetch" href="https://github.githubassets.com">
$

ghost commented 4 years ago

My guess is that it's the chunk size. Since the response uses Transfer-Encoding: chunked, you need a special decoder to decode the result from the server. In Java, URLConnection does this for you.

See also: Transfer-Encoding: chunked
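To make the chunk-size reading concrete, here is a minimal Python sketch of a chunked-body decoder. It is an illustration only, not part of Scryer Prolog or library(sockets): the function name `decode_chunked` is hypothetical, it assumes the complete body is already in memory as bytes, and it ignores any trailer headers after the last chunk.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body into the plain payload.

    Hypothetical helper for illustration. Raises ValueError if the body
    is truncated or a size line is not valid hexadecimal.
    """
    decoded = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hexadecimal size line ending in CRLF.
        line_end = body.index(b"\r\n", pos)
        # Drop optional chunk extensions, which follow a ';'.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            # A zero-sized chunk marks the end of the body.
            break
        decoded += body[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return bytes(decoded)


payload = decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
first_chunk_size = int("008000", 16)
```

Under this reading, the line 008000 says that the first chunk carries 0x8000 = 32768 bytes, and the \r\n after it only terminates the size line. Neither belongs to the document itself.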

triska commented 4 years ago

Thank you a lot for looking into this!

In #592, I have now worked around this issue by using HTTP/1.0 for the request.
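The workaround relies on the rule that a server must not send Transfer-Encoding in response to an HTTP/1.0 request, so the body should arrive without chunk-size lines. The following Python sketch shows what such a request looks like on the wire. The function name `build_request` is hypothetical, and this is not the code from #592.

```python
def build_request(host: str, path: str, version: str = "1.0") -> bytes:
    """Build a minimal GET request for the given HTTP version.

    Hypothetical helper for illustration only.
    """
    lines = [
        f"GET {path} HTTP/{version}",
        f"Host: {host}",
        "",  # the empty line that ends the header section
        "",  # yields the final CRLF when the lines are joined
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_request("github.com", "/mthom/scryer-prolog")
```

With the version set to "1.1" this produces the same two header lines that were typed into ncat earlier in the thread.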