Closed triska closed 4 years ago
For comparison, the 008000 part does not occur when I download the document via wget:
$ wget http://github.com/mthom/scryer-prolog -O sc
Looking at sc, it starts with \n\n\n\n\n. So, in fact, the initial \r\n is also unexpected: the initial 008000\r\n is not expected to occur in the document.
wget is too nice: it processes the response and hides the raw data from the user. Right now, Scryer Prolog is dealing with the raw data, and that is correct in its own way (it depends on what the user is expecting).
The raw response can be seen by using ncat:
$ ncat --version
Ncat: Version 7.80 ( https://nmap.org/ncat )
$ ncat --ssl github.com 443 > output.txt # Enter twice then ctrl-d.
GET /mthom/scryer-prolog HTTP/1.1
Host: github.com
$ head -n 32 output.txt
HTTP/1.1 200 OK
date: Wed, 17 Jun 2020 09:27:31 GMT
content-type: text/html; charset=utf-8
server: GitHub.com
status: 200 OK
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With, Accept-Encoding
etag: W/"a08468419f58f3c1d5448bc3445800ef"
cache-control: max-age=0, private, must-revalidate
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
expect-ct: max-age=2592000, report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; connect-src 'self' uploads.github.com www.githubstatus.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com logx.optimizely.com/v1/events wss://live.github.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: github.githubassets.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; worker-src github.com/socket-worker.js gist.github.com/socket-worker.js
Set-Cookie: _gh_sess=t0B3EJLwxHhxQHp7O8EcNx4%2FDQL%2BwKuNuUNUYl87fCRgC1Z%2BMSOK4BHje0bY9LzGvZjfRRzKXtkG1B%2FMLgTioDndgBqccSt9EnBzrnzAE6gpPtOSQMiJj0pwyFMcmWyReyF0ngfAEhDNQ8A9OpP4dGzkvrwsh%2BtXUS9LxV8bV7xSy5jfTh8RLjefFfqGKPOlFyALsTLe84Yf2OmyjF%2BVM3%2FJowXPrppvP%2FF1QA3z6PrISpD%2F%2BkUDbFYy4fjdiuoBhsqyF8bbhDsyAbfgrhG1Ug%3D%3D--lh0fk71u2AVhVNIs--ofZ%2BGicUreKuGLNdppdmHA%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
Set-Cookie: _octo=GH1.1.1647474019.1592386050; Path=/; Domain=github.com; Expires=Thu, 17 Jun 2021 09:27:30 GMT; Secure; SameSite=Lax
Set-Cookie: logged_in=no; Path=/; Domain=github.com; Expires=Thu, 17 Jun 2021 09:27:30 GMT; HttpOnly; Secure; SameSite=Lax
Accept-Ranges: bytes
Transfer-Encoding: chunked
X-GitHub-Request-Id: 989A:36169:E12533:141CF36:5EE9E200
008000
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<link rel="dns-prefetch" href="https://github.githubassets.com">
$
My guess is that it's the chunk size. Since the response is Transfer-Encoding: chunked, you need a special decoder to decode the result from the server. In Java, the URLConnection does this for you.
See also: Transfer-Encoding: chunked
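To illustrate the guess above, here is a minimal sketch of such a decoder in Python. It assumes the header section has already been consumed and that the body is available as a binary file-like object. The function name decode_chunked and the sample body are made up for illustration, and trailer headers after the last chunk are ignored. Each chunk starts with its size in hexadecimal on a line of its own, which is exactly where a line like 008000 would come from.

```python
import io

def decode_chunked(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Hypothetical helper for illustration; trailers are not processed.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()      # e.g. b"008000\r\n"
        if not size_line:
            raise ValueError("stream ended before the terminating chunk")
        # Ignore optional chunk extensions after ';' and parse the hex size.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break                          # a zero-sized chunk ends the body
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        stream.readline()                  # consume the CRLF after the chunk data
    return bytes(body)

# Made-up sample body: two chunks followed by the terminating chunk.
raw = b"7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n"
print(decode_chunked(io.BytesIO(raw)))     # b'Hello, world!'

# The size line seen in the GitHub response, read as hexadecimal:
print(int("008000", 16))                   # 32768
```

Read this way, 008000 would announce a first chunk of 32768 bytes, which fits the observation that the rest of the document arrives intact.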
Thank you a lot for looking into this!
In #592, I have now worked around this issue by using HTTP/1.0 for the request.
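The reason this kind of workaround helps: chunked transfer encoding was introduced with HTTP/1.1, and a server must not send a Transfer-Encoding response to a request that announces HTTP/1.0. The sketch below only shows what such a request could look like on the wire. The helper build_request is hypothetical and is not taken from #592.

```python
def build_request(host, path, version="1.0"):
    """Return the bytes of a minimal GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/{version}",
        f"Host: {host}",
        "",   # the empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

# With version "1.0", the server should not answer with a chunked body.
print(build_request("github.com", "/mthom/scryer-prolog"))
```

With an HTTP/1.0 request, the end of the body is typically signalled by Content-Length or by the server closing the connection, so no chunk sizes appear in the stream.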
To reproduce this issue, please download github.pl from: https://www.metalevel.at/prolog/scryer/github.pl
The predicate stream/2 uses library(sockets) to establish a TLS connection to github.com and asks for the document /mthom/scryer-prolog via HTTP. The first argument is unified with a stream from which we can read the body of the HTTP response. The second argument is the list of HTTP header lines of the response.
For instance, to read the first 20 characters of the document, we can use:
As answer, we get:
The initial "008000" part of the answer is unexpected, and does not arise as part of the document from any other client I tried to access this page.
The remainder of the document seems to be perfectly correct and as expected. Also, pages from other web sites I tried are received correctly and exactly as expected.
What is going on? This may be an interesting issue to look into for new contributors who are interested in the networking functionality of Scryer Prolog. I would greatly appreciate your help!