openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Add support for chunked upload #783

Open bergos opened 6 years ago

bergos commented 6 years ago

I'm trying to upload N-Triples with curl through the Graph Store interface. This works when curl uploads a file, but fails when the data comes from stdin. I think the problem is chunked transfer encoding: curl switches to it for data from stdin because it can't know the content size in advance, and Virtuoso closes the connection in that case. If chunked transfer is not supported, I would expect an HTTP error. Even better would be adding support for chunked transfer. In my simplified example I'm using a file, but in the actual use case I'm getting the triples from a program and would like to avoid creating temporary files just for the upload. This issue may be related to the issue "Problem uploading gzipped RDF using curl".
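
For readers unfamiliar with chunked transfer encoding, here is a minimal sketch of how such a request body is framed on the wire: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. The function name `chunk_encode` is made up for illustration, one chunk per input line is an arbitrary choice (curl picks its own chunk sizes), and the byte count assumes ASCII input.

```shell
# Hypothetical illustration: frame stdin as an HTTP/1.1 chunked body,
# one chunk per input line. Assumes ASCII input so that the character
# count equals the byte count.
chunk_encode() {
  while IFS= read -r line || [ -n "$line" ]; do
    # +1 accounts for the newline that `read` strips; a final line
    # without a trailing newline gets one added.
    printf '%X\r\n%s\n\r\n' "$(( ${#line} + 1 ))" "$line"
  done
  # A zero-length chunk terminates the body.
  printf '0\r\n\r\n'
}
```

A server that does not parse this framing cannot tell where the body ends, which would be consistent with the connection handling seen in the failing example.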

Example with chunked transfer (not working):

cat energy-cube.nt | curl -v --digest --user dba:dba --request PUT --header "content-type:application/n-triples" --upload-file - --get "http://nas.example.org:8891/sparql-graph-crud-auth" --data-urlencode "graph=http://dark-horse.example.org/graph/energy"

*   Trying 192.168.1.254...
* TCP_NODELAY set
* Connected to nas.example.org (192.168.1.254) port 8891 (#0)
* Server auth using Digest with user 'dba'
> PUT /sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy HTTP/1.1
> Host: nas.example.org:8891
> User-Agent: curl/7.58.0
> Accept: */*
> content-type:application/n-triples
> Content-Length: 0
>
< HTTP/1.1 401 Unauthorized
< Server: Virtuoso/07.20.3215 (Linux) x86_64-pc-linux-gnu
< Connection: close
< Content-Type: text/html; charset=UTF-8
< Date: Sun, 16 Sep 2018 11:19:01 GMT
< Accept-Ranges: bytes
< WWW-Authenticate: Digest realm="SPARQL", domain="/sparql-graph-crud-auth", nonce="1d7593ec87590a610cae1dbc0cd4578c", opaque="5ebe2294ecd0e0f08eab7690d2a6ee69", stale="false", qop="auth", algorithm="MD5"
< Content-Length: 0
<
* Closing connection 0
* Issue another request to this URL: 'http://nas.example.org:8891/sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy'
* Hostname nas.example.org was found in DNS cache
*   Trying 192.168.1.254...
* TCP_NODELAY set
* Connected to nas.example.org (192.168.1.254) port 8891 (#1)
* Server auth using Digest with user 'dba'
> PUT /sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy HTTP/1.1
> Host: nas.example.org:8891
> Authorization: Digest username="dba", realm="SPARQL", nonce="1d7593ec87590a610cae1dbc0cd4578c", uri="/sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy", cnonce="M2EyMjI1YTg5MGZlZTRkNWYzNzJlZDM1NDExNGIwMTU=", nc=00000001, qop=auth, response="b06fc97865af70fbf80e79dbc5072c61", opaque="5ebe2294ecd0e0f08eab7690d2a6ee69", algorithm="MD5"
> User-Agent: curl/7.58.0
> Accept: */*
> Transfer-Encoding: chunked
> content-type:application/n-triples
> Expect: 100-continue
>
< HTTP/1.1 201 Created
< Server: Virtuoso/07.20.3215 (Linux) x86_64-pc-linux-gnu
< Connection: Keep-Alive
< Content-Type: text/html; charset=UTF-8
< Date: Sun, 16 Sep 2018 11:19:01 GMT
< Accept-Ranges: bytes
< Content-Length: 0
<
* Connection #1 to host nas.example.org left intact
cat: write error: Broken pipe
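
A possible workaround that avoids both chunked encoding and a temporary file is to let curl buffer stdin in memory with `--data-binary @-`, so that it can send a Content-Length header. This is only a sketch and has not been verified against Virtuoso in this thread. The function name `put_graph_from_stdin` is hypothetical, and the graph parameter has to be URL-encoded into the URL by hand, because `--get` would move the body into the query string.

```shell
# Hypothetical helper: PUT N-Triples from stdin with a known length.
# Usage: producer | put_graph_from_stdin URL
put_graph_from_stdin() {
  # --data-binary @- makes curl read all of stdin into memory first,
  # so it sends Content-Length instead of Transfer-Encoding: chunked.
  # The explicit header replaces curl's default form content type.
  curl --digest --user dba:dba \
       --request PUT \
       --header "Content-Type: application/n-triples" \
       --data-binary @- \
       "$1"
}
```

It could be called like this, reusing the encoded URL from the log above: cat energy-cube.nt | put_graph_from_stdin "http://nas.example.org:8891/sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy". The trade-off is that the whole payload is held in memory before the request is sent.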

Working example with curl file input:

curl -v --digest --user dba:dba --request PUT --header "content-type:application/n-triples" --get "http://nas.example.org:8891/sparql-graph-crud-auth" --data-urlencode "graph=http://dark-horse.example.org/graph/energy" --upload-file energy-cube.nt

*   Trying 192.168.1.254...
* TCP_NODELAY set
* Connected to nas.example.org (192.168.1.254) port 8891 (#0)
* Server auth using Digest with user 'dba'
> PUT /sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy HTTP/1.1
> Host: nas.example.org:8891
> User-Agent: curl/7.58.0
> Accept: */*
> content-type:application/n-triples
> Content-Length: 0
> 
< HTTP/1.1 401 Unauthorized
< Server: Virtuoso/07.20.3215 (Linux) x86_64-pc-linux-gnu  
< Connection: close
< Content-Type: text/html; charset=UTF-8
< Date: Sun, 16 Sep 2018 11:20:00 GMT
< Accept-Ranges: bytes
< WWW-Authenticate: Digest realm="SPARQL", domain="/sparql-graph-crud-auth", nonce="627a5be772185fa7f69249f7b44aa1ac", opaque="5ebe2294ecd0e0f08eab7690d2a6ee69", stale="false", qop="auth", algorithm="MD5"
< Content-Length: 0
< 
* Closing connection 0
* Issue another request to this URL: 'http://nas.example.org:8891/sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy'
* Hostname nas.example.org was found in DNS cache
*   Trying 192.168.1.254...
* TCP_NODELAY set
* Connected to nas.example.org (192.168.1.254) port 8891 (#1)
* Server auth using Digest with user 'dba'
> PUT /sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy HTTP/1.1
> Host: nas.example.org:8891
> Authorization: Digest username="dba", realm="SPARQL", nonce="627a5be772185fa7f69249f7b44aa1ac", uri="/sparql-graph-crud-auth?graph=http%3A%2F%2Fdark-horse.example.org%2Fgraph%2Fenergy", cnonce="OWU1OWYwY2FjY2RjMWJiNWUxZWQ4NWIzYWRlM2NkNzU=", nc=00000001, qop=auth, response="eb19b99562176e828217b89e6dbd136d", opaque="5ebe2294ecd0e0f08eab7690d2a6ee69", algorithm="MD5"
> User-Agent: curl/7.58.0
> Accept: */*
> content-type:application/n-triples
> Content-Length: 19273583
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 201 Created
< Server: Virtuoso/07.20.3215 (Linux) x86_64-pc-linux-gnu  
< Connection: Keep-Alive
< Content-Type: text/html; charset=UTF-8
< Date: Sun, 16 Sep 2018 11:20:01 GMT
< Accept-Ranges: bytes
< Content-Length: 0
< 
* Connection #1 to host nas.example.org left intact

TallTed commented 6 years ago

First thing I notice is that you're running a rather elderly Virtuoso 07.20.3215 (from before 2016-05-20).

Can you test this scenario with a build from the latest /stable/7 (7.2.5.1 a/k/a 7.20.3229) or current /develop/7 (7.2.6-dev a/k/a 7.20.3229+) branches? Updating to one of these versions is strongly recommended in any case.

roelj commented 6 years ago

The exact same problem occurs with the 7.2.5.1 release, as described in issue #764. So when Transfer-Encoding: chunked is specified, Virtuoso returns 201 Created without adding the data.
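
Since the 201 Created status alone does not show whether the data arrived, one way to check is to fetch the graph back and count its triples. The following is a rough sketch: the helper names `count_triples` and `graph_triple_count` are made up, and it assumes the endpoint answers a Graph Store GET with N-Triples when asked via the Accept header.

```shell
# Count N-Triples statements on stdin: every line that is neither
# blank nor a comment. Prints 0 for empty input.
count_triples() {
  grep -cv -e '^[[:space:]]*$' -e '^[[:space:]]*#'
}

# Hypothetical helper: fetch a named graph and count its triples.
# Usage: graph_triple_count ENDPOINT_URL GRAPH_IRI
# Assumption: the server can return the graph as application/n-triples.
graph_triple_count() {
  # --fail suppresses the body on HTTP errors, so a missing graph counts as 0.
  curl --silent --fail --digest --user dba:dba \
       --header "Accept: application/n-triples" \
       --get "$1" --data-urlencode "graph=$2" | count_triples
}
```

Comparing the output of graph_triple_count "http://nas.example.org:8891/sparql-graph-crud-auth" "http://dark-horse.example.org/graph/energy" with count_triples < energy-cube.nt after each kind of upload would show whether the chunked request stored anything. The two numbers can differ slightly if the source file contains duplicate triples.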

bergos commented 6 years ago

I tried again with the 7.2.5.1 docker image and also built my own image based on commit 120e9021074b526da434b827cfab9d57189c0a64. Same result in both cases.