qudt / qudt-public-repo

QUDT - Quantities, Units, Dimensions and dataTypes - public repository

RDF responses from server should use correct Media Type #55

Open nicholascar opened 4 years ago

nicholascar commented 4 years ago

When I resolve the URI http://qudt.org/2.1/schema/qudt/, I receive an RDF (Turtle) file, but the Media Type given in the Content-Type header is text/plain when it should be text/turtle. This makes parsing harder for automated clients (like cURL scripts), since they rely on that header to know what they have received.
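To illustrate why the header matters: an automated client typically picks its parser from the Content-Type value, so text/plain gives it nothing to go on. Below is a minimal Python sketch of that dispatch step. The media-type table and the rdflib-style format names are assumptions for illustration, not something QUDT or any particular client prescribes.

```python
from typing import Optional

# Assumed mapping from RDF media types to parser names (rdflib-style names, for illustration).
RDF_FORMATS = {
    "text/turtle": "turtle",
    "application/rdf+xml": "xml",
    "application/ld+json": "json-ld",
    "application/n-triples": "nt",
}


def parser_format_for(content_type: str) -> Optional[str]:
    """Return a parser name for a Content-Type value, or None if it is not a known RDF type."""
    # Drop parameters such as "; charset=utf-8"; media types are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return RDF_FORMATS.get(media_type)


print(parser_format_for("text/turtle; charset=utf-8"))  # turtle
print(parser_format_for("text/plain"))  # None: the client cannot tell this is RDF
```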

steveraysteveray commented 4 years ago

I agree. My problem is that I don't understand Apache well enough to get this working. Can you point me to a reference that explains what I need to do?

nicholascar commented 4 years ago

You just need to add a directive in any of the Apache config files that are being called by the server to associate a particular file ending with a Media Type.

Using .ttl for a Turtle file, as you are, add this:

AddType text/turtle .ttl

This will just tell Apache to include this in the Content-Type header for files with that extension.

Add this to either the main Apache config file (apache2.conf on Debian/Ubuntu) or any domain-specific VHost file, such as qudt.org.conf, if you've set that up, or else the default VHost, which in Apache 2.4 is 000-default.conf. On Debian/Ubuntu systems, the VHost files are in the /etc/apache2/sites-enabled/ folder.
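Apache's mod_mime keeps a table mapping file extensions to media types, and AddType adds an entry to it. Python's standard mimetypes module works on the same principle, so it makes a convenient stand-in for a quick experiment. This is an analogy only, not Apache itself. Once .ttl is registered, a file ending in .ttl maps to text/turtle, while a name with no extension maps to nothing at all.

```python
import mimetypes

# Analogous to "AddType text/turtle .ttl": register the extension -> media type mapping.
mimetypes.add_type("text/turtle", ".ttl")

print(mimetypes.guess_type("_all.ttl"))  # ('text/turtle', None)
print(mimetypes.guess_type("unit"))      # (None, None): no extension, so no type is inferred
```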

steveraysteveray commented 4 years ago

Thanks for the key information I needed! I think it works now. Please confirm.

nicholascar commented 4 years ago

Hmmm... not quite, when I do this:

curl -iI http://qudt.org/2.1/schema/qudt/

I still get this:

HTTP/1.1 200 OK
Server: openresty/1.15.8.1
Date: Tue, 25 Feb 2020 01:17:54 GMT
Content-Type: text/plain
Content-Length: 116113
Connection: keep-alive
Vary: Accept-Encoding
Last-Modified: Mon, 24 Feb 2020 19:31:14 GMT
ETag: "16c467eb-1c591-59f576a15eb90"
X-Webcom-Cache-Status: BYPASS
Accept-Ranges: bytes

so it still says just text/plain.

You might need to restart (or reload) the server for the change to take effect?
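A script can extract the media type from the curl -iI output and compare it with the expected one, instead of relying on someone reading it by eye. Below is a minimal sketch, assuming the raw header block (status line first) has already been captured as a string. The helper name content_type_from_head is hypothetical.

```python
from typing import Optional


def content_type_from_head(raw: str) -> Optional[str]:
    """Extract the media type from a raw response header block, such as `curl -iI` output."""
    for line in raw.splitlines()[1:]:  # skip the status line, e.g. "HTTP/1.1 200 OK"
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":  # header names are case-insensitive
            return value.split(";", 1)[0].strip().lower()  # drop parameters like charset
    return None  # no Content-Type header present


# Abridged from the curl output above.
HEAD_OUTPUT = """HTTP/1.1 200 OK
Server: openresty/1.15.8.1
Content-Type: text/plain
Content-Length: 116113
"""

media_type = content_type_from_head(HEAD_OUTPUT)
print(media_type)  # text/plain
print("OK" if media_type == "text/turtle" else "wrong media type")  # wrong media type
```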

steveraysteveray commented 4 years ago

OK, try it now. Pretty sure it works. But only because of an unusual situation: The http://qudt.org/2.1/schema/qudt/ resolves to a directory on our server, and in that directory, I have a .htaccess file that redirects an index request to a file named _all.ttl. (It used to redirect to _all). Once I renamed the file with a .ttl extension, I think the Apache server now recognizes it and adds the right header.

Problem is, most of the other URIs are not directory names (e.g. http://qudt.org/2.1/vocab/unit). These are just files with names like "unit" sitting in a folder named vocab. I really don't want to put each of these in their own folder with a .htaccess redirect, like I did with schema/qudt. And yet, I want the URI that people ask for to be one without a .ttl extension. Do you have a suggestion?

jhodgesatmb commented 4 years ago

Personal opinion: the .htaccess file shouldn't need a file extension to know what type a file is. I assume it doesn't work that way now, but in a world so sensitive to cyber attacks and security, you would think the web infrastructure would be content-based.

Jack Hodges, Ph.D. Arbor Studios

In reply to Nicholas Car's comment of Feb 24, 2020, 5:19 PM.

nicholascar commented 4 years ago

Yes, working now:

curl -iI http://qudt.org/2.1/schema/qudt/

gets

HTTP/1.1 200 OK
Server: openresty/1.13.6.2
Date: Tue, 25 Feb 2020 11:08:40 GMT
Content-Type: text/turtle
Content-Length: 116113
Connection: keep-alive
Last-Modified: Mon, 24 Feb 2020 19:31:14 GMT
ETag: "16c467eb-1c591-59f576a15eb90"
X-Webcom-Cache-Status: BYPASS
Accept-Ranges: bytes

So will close the Issue, thanks!

Re: "Problem is, most of the other URIs are not directory names"

One solution could be to write a set of Apache RewriteRules that 'catch' each URI like ../../unit and redirect it to where you want to go. Then, with the results coming from a .ttl file, the header should be formed correctly.

You can do almost anything with RewriteRules, including adding in various Content-Type headers to cater for cases where you're returning RDF not from a file ending .ttl (if, indeed, you have such cases).
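The rewrite idea can be prototyped outside Apache. Each rule pairs a regular expression with a substitution and the media type to send. The sketch below is only a Python model of that logic. The rule itself is hypothetical, as is the assumption that each extensionless vocabulary name has a matching .ttl file beside it. In Apache itself, a RewriteRule can also force the media type of its result with the T flag (for example [T=text/turtle]), which covers targets that do not end in .ttl.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules in the spirit of Apache RewriteRule: (pattern, substitution, media type).
# Assumes each extensionless name such as /2.1/vocab/unit has a unit.ttl file beside it.
RULES = [
    (re.compile(r"^/2\.1/(schema|vocab)/([a-z]+)/?$"), r"/2.1/\1/\2.ttl", "text/turtle"),
]


def rewrite(path: str) -> Optional[Tuple[str, str]]:
    """Return (file to serve, Content-Type) for the first matching rule, or None."""
    for pattern, substitution, media_type in RULES:
        if pattern.match(path):
            return pattern.sub(substitution, path), media_type
    return None  # no rule matched: fall through to normal file handling


print(rewrite("/2.1/vocab/unit"))  # ('/2.1/vocab/unit.ttl', 'text/turtle')
print(rewrite("/pages/HomePage.html"))  # None
```

Keeping the rules in one ordered table mirrors how Apache applies RewriteRules in sequence, so the same list can later be transcribed into the VHost file.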

Just let me know if you need any more tips on RewriteRules! I maintain hundreds of them for the linked.data.gov.au Australian gov persistent ID server!

steveraysteveray commented 3 years ago

@nicholascar, in the course of migrating our web pages to updated versions of Apache and Tomcat (9), I had a little trouble with the handoff from Apache to Tomcat. As a result, I decided to just let Tomcat deliver the static pages from its own ROOT folder. That works fine, except Tomcat doesn't pay attention to .htaccess files, so all the nice work we did in handling dereferencing and Turtle files no longer works. I have 2 options:

  1. Figure out how to get Tomcat to do the things that Apache was doing via .htaccess files
  2. Figure out how to get Apache to play nicely with Tomcat again

Can you help me with this? The reason I had trouble with option 2 is that some of the configuration files have moved around or disappeared in the new releases, such as tomcat.conf, where I had placed the AJP directives.

nicholascar commented 3 years ago

@steveraysteveray it sounds like a bit of a namespace policy for qudt.org is needed! If we can create a table of all the required patterns, I can assist you with either a Tomcat or a Tomcat + Apache solution easily enough!

Here is an attempt to capture all mentioned patterns. I machine-searched this public repo as well as the HTML pages of the website for all the qudt.org IRIs I could find:

qudt.org

Request | Request Headers | Response | Response Headers
/1.1/html/nist-constants.html | | |

/2.0/schema/([a-z]+), e.g. /2.0/schema/dtype | - | 404 | -
/2.0/schema/(.*).ttl | - | 404 | -
/2.0/vocab/(.*).ttl, e.g. /2.0/vocab/VOCAB_--v2.0.ttl | - | 404 | -

/2.1/schema/(.+), e.g. /2.1/schema/datatype | - | Turtle | text/turtle
/2.1/vocab/(.+), e.g. /2.1/vocab/prefix | - | Turtle | text/turtle

/community/(.*) (Loop3D only, so far) | | | text/html or text/turtle as appropriate

/doc/DOC_VOCAB-DISCIPLINES.html | - | www equivalent |
/doc/2017/DOC_VOCAB-QUDT-DISCIPLINES-v2.0.html | - | HTML page | Content-Type text/html
/doc/2019/10/DOC_VOCAB-DISCIPLINES-v2.1.html | - | HTML page | Content-Type text/html

/schema/ | - | HTML page | Content-Type text/html
/schema/datatype/ | - | Turtle | text/turtle
/schema/dimension/ | - | 404 | -
/schema/dtype/ | - | 404 | -
/schema/extensions/ | - | HTML page | Content-Type text/html
/schema/qudt/ | - | Turtle | text/turtle
/schema/type/ | - | 404 | -

/unicode | - | 404 | -

/vocab/constant/ | - | vocab in Turtle | text/turtle
/vocab/datatype/ | - | 404 | -
/vocab/dimensionvector | - | vocab in Turtle | text/turtle
/vocab/isq/quantity/information-science-and-technology | - | 404 | -
/vocab/isq/quantity/thermodynamics | - | 404 | -
/vocab/prefix/ | - | vocab in Turtle | text/turtle
/vocab/quantity/ | - | 404 | -
/vocab/quantitykind/ | - | vocab in Turtle | text/turtle
/vocab/type/ | - | 404 | -
/vocab/type/SIGNED-LONG-INTEGER | - | 404 | -
/vocab/type/oracle/ | - | 404 | -
/vocab/unit/ | - | vocab in Turtle | text/turtle

www.qudt.org

Request | Request Headers | Response | Response Headers
/ | - | /pages/HomePage.html | Content-Type text/html
/2.1/catalog/qudt-catalog.html | - | HTML page | Content-Type text/html
/edg/tbl | - | EDG system | -
/pages/*.html | - | HTML page | Content-Type text/html
/doc/*.html | - | HTML page | Content-Type text/html

Do you recognise all the patterns & individual IRIs listed here? Are there more you know of? Clearly some of these are deprecated!

After some consolidation of the patterns you want to support, we can work out the Tomcat / Apache split, or a Tomcat-only implementation.
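Once the patterns are agreed, the table can double as a regression check. Below is a minimal sketch, assuming a few rows transcribed from the table above and a 200 status for the non-404 rows. The fetch function is injected so the comparison can run offline. http_fetch shows one way to issue real HEAD requests using only urllib from the standard library; that helper is an assumption, not part of any existing QUDT tooling. urlopen follows redirects and raises HTTPError for 404s, which the sketch records as a status with no media type.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

# (status code, media type or None)
Result = Tuple[int, Optional[str]]

# A few rows transcribed from the table above (request path -> expected result).
EXPECTED: Dict[str, Result] = {
    "/schema/": (200, "text/html"),
    "/schema/qudt/": (200, "text/turtle"),
    "/vocab/unit/": (200, "text/turtle"),
}


def http_fetch(path: str, base: str = "http://qudt.org") -> Result:
    """Issue a real HEAD request; urlopen follows redirects automatically."""
    request = urllib.request.Request(base + path, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers.get_content_type()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; record the status only.
        return err.code, None


def check_policy(expected: Dict[str, Result], fetch: Callable[[str], Result]) -> List[str]:
    """Return one message per path whose actual result differs from the expected one."""
    problems = []
    for path, want in expected.items():
        got = fetch(path)
        if got != want:
            problems.append(f"{path}: expected {want}, got {got}")
    return problems


def fake_fetch(path: str) -> Result:
    """Offline stand-in that simulates /vocab/unit/ still being served as text/plain."""
    return (200, "text/plain") if path == "/vocab/unit/" else EXPECTED[path]


for problem in check_policy(EXPECTED, fake_fetch):
    print(problem)
```

Against the live site, passing http_fetch instead of fake_fetch would run the same comparison over the network.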

steveraysteveray commented 3 years ago

I got a few differences in behavior from what you reported. For example, /2.0/schema/qudt/ works. However, for the web pages, I'm less concerned about earlier versions, and more concerned about just getting any pages at all working on the new instance. After that, we could clean up some hanging threads of deprecated addresses. So, for the record, the table below is my take on what web pages we want working.

What I propose is that you (@nicholascar) and I have a brief shared screen session where I can show you the new Amazon instance where everything else is working... EDG 7.0.3, Tomcat 9.0.50, Apache 2.4.48... What I cannot figure out is how to have http://qudt.org/ not just show the Tomcat start page.
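One common shape for the Apache + Tomcat split is to let Apache answer every request itself and forward only the EDG paths (the /edg/tbl row in the tables) to Tomcat, for example with mod_proxy's ProxyPass. The sketch below models only that routing decision in Python. The /edg/ prefix boundary and the document root /var/www/qudt are assumptions. When Apache handles / (mapped to /pages/HomePage.html, as in the www.qudt.org table), Tomcat's default ROOT application is no longer what visitors see.

```python
from typing import Tuple

# Hypothetical values: which path prefixes Apache forwards to Tomcat (EDG), and where static pages live.
TOMCAT_PREFIXES = ("/edg/",)
DOC_ROOT = "/var/www/qudt"


def route(path: str) -> Tuple[str, str]:
    """Decide whether Apache proxies a request to Tomcat or serves a static file itself."""
    if path.startswith(TOMCAT_PREFIXES):
        return "tomcat", path  # forwarded unchanged, e.g. via ProxyPass
    if path == "/":
        # Apache serves the home page, so Tomcat's default ROOT app never sees "/".
        return "static", DOC_ROOT + "/pages/HomePage.html"
    return "static", DOC_ROOT + path


for p in ("/", "/edg/tbl", "/2.1/vocab/unit"):
    print(p, "->", route(p))
```

With this split, the static side keeps working with the AddType and RewriteRule approach discussed earlier. If the AJP handoff is kept, it may be worth checking the Tomcat connector settings. Tomcat 9.0.31 and later ship with the AJP connector commented out in the default server.xml and require a shared secret by default, which can break a handoff that worked before an upgrade.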

qudt.org

Request | Request Headers | Response | Response Headers | Desired
/2.1/schema/(.+), e.g. /2.1/schema/datatype | - | Turtle | text/turtle | Yes
/2.1/vocab/(.+), e.g. /2.1/vocab/prefix | - | Turtle | text/turtle | Yes

/community/(.*) (Loop3D only, so far) | | | text/html or text/turtle as appropriate | Yes

/doc/DOC_VOCAB-DISCIPLINES.html | - | www equivalent | | No
/doc/2017/DOC_VOCAB-QUDT-DISCIPLINES-v2.0.html | - | HTML page | Content-Type text/html | No
/doc/2019/10/DOC_VOCAB-DISCIPLINES-v2.1.html | - | HTML page | Content-Type text/html | No

/schema/ | - | HTML page | Content-Type text/html | Yes
/schema/datatype/ | - | Turtle | text/turtle | Yes
/schema/dimension/ | - | 404 | - | Yes
/schema/dtype/ | - | 404 | - |
/schema/extensions/ | - | HTML page | Content-Type text/html | Yes
/schema/qudt/ | - | Turtle | text/turtle | Yes
/schema/type/ | - | 404 | - | No

/unicode | - | 404 | - | No

/vocab/constant/ | - | vocab in Turtle | text/turtle | Yes
/vocab/datatype/ | - | 404 | - | Eventually
/vocab/dimensionvector | - | vocab in Turtle | text/turtle | Yes
/vocab/isq/quantity/information-science-and-technology | - | 404 | - | No
/vocab/isq/quantity/thermodynamics | - | 404 | - | No
/vocab/prefix/ | - | vocab in Turtle | text/turtle | Yes
/vocab/quantity/ | - | 404 | - | No
/vocab/quantitykind/ | - | vocab in Turtle | text/turtle | Yes
/vocab/type/ | - | 404 | - | No
/vocab/type/SIGNED-LONG-INTEGER | - | 404 | - | No
/vocab/type/oracle/ | - | 404 | - | No
/vocab/unit/ | - | vocab in Turtle | text/turtle | Yes

www.qudt.org

Request | Request Headers | Response | Response Headers | Desired
/ | - | /pages/HomePage.html | Content-Type text/html | Yes
/2.1/catalog/qudt-catalog.html | - | HTML page | Content-Type text/html | Yes
/edg/tbl | - | EDG system | - | Yes
/pages/*.html | - | HTML page | Content-Type text/html | Yes
/doc/*.html | - | HTML page | Content-Type text/html | Yes