Open GoogleCodeExporter opened 9 years ago
The URI above is wrong: it refers to an outdated Editor's Draft of the
HTTP-in-RDF Vocabulary.
The latest Public Working Draft is here:
http://www.w3.org/TR/HTTP-in-RDF10/
Original comment by mfhepp
on 17 Jan 2011 at 9:08
Another approach: use the Provenance Vocabulary (thanks to Olaf Hartig):
<snip>
What you describe seems to be exactly one of the use cases we developed the
Provenance Vocabulary [1] for: The Provenance Vocabulary provides the class
prv:DataAccess which represents the execution of a data access on the Web.
Using the property prvTypes:exchangedHTTPMessage you can associate instances
of prv:DataAccess with the HTTP messages that have been exchanged. These
HTTP messages can then be described using the W3C RDF vocabulary for HTTP.
Here's an example:
foo:DataAboutProduct1
    foaf:primaryTopic foo:Product1 ;
    prv:createdBy _:dc .
_:dc
    a prv:DataCreation ;
    # ... additional information about the creation process ...
    prv:usedData _:xml .
_:xml
    a prv:DataItem ;
    prv:retrievedBy _:da .
_:da
    a prv:DataAccess ;
    prv:accessedResource <http://www.heppnetz.de/companies.xml> ;
    prvTypes:exchangedHTTPMessage _:m .
_:m
    a http:Response ;
    http:httpVersion "1.1" ;
    # ...
    http:statusCodeNumber "200" .
(Needless to say, you may use URIs instead of the blank node identifiers
that I used in the example for the sake of readability.)
Our "Guide to the Provenance Vocabulary" contains another example in Section
"3.3.2 Related Vocabularies: HTTP Vocabulary in RDF" [2].
Greetings,
Olaf
[1] http://purl.org/net/provenance/
[2] http://purl.org/net/provenance/guide#HTTP_Vocabulary_in_RDF
</snip>
Original comment by mfhepp
on 18 Jan 2011 at 10:08
Basically, the code must be extended at line 400 of mainloops.py:
csv.register_dialect("short_life", delimiter=self.updateM.delimiter,quotechar=self.updateM.quoted,escapechar=self.updateM.escaped)
dat2 = urllib.urlopen(datei, timeout=self.paramenter.timeout)
reader = csv.reader(dat2, "short_life")
You may have to use urllib2 instead of urllib to access the HTTP headers; good
documentation is here:
http://www.voidspace.org.uk/python/articles/urllib2.shtml
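import urllib2

# url is assumed to hold the address of the file to fetch
user_agent = 'Elmar2GoodRelations'
request_headers = {'User-Agent': user_agent}
# Pass the headers by keyword: the second positional argument of Request is the POST data
req = urllib2.Request(url, headers=request_headers)
response = urllib2.urlopen(req)
the_page = response.read()
# Raw response header lines, each of the form 'Name: value\r\n'
headers = response.info().headers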
headers will then be a list with the header info:
['Date: Tue, 18 Jan 2011 11:32:01 GMT\r\n',
 'Server: Apache\r\n',
 'Last-Modified: Sat, 27 Nov 2010 19:51:44 GMT\r\n',
 'ETag: "193a1f07-4165-4cf16150"\r\n',
 'Accept-Ranges: bytes\r\n',
 'Content-Length: 16741\r\n',
 'Connection: close\r\n',
 'Content-Type: text/html\r\n']
but you still have to split it into field name and field value.
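A minimal sketch of that split, assuming each entry has the 'Name: value\r\n' shape shown above. The helper name split_header_lines is hypothetical and not part of the existing code. Folded continuation lines are not handled here.

```python
def split_header_lines(raw_lines):
    """Split raw 'Name: value\\r\\n' header lines into (name, value) pairs."""
    pairs = []
    for line in raw_lines:
        # Split on the first colon only: values such as dates contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so not a header line; skip it
        pairs.append((name.strip(), value.strip()))
    return pairs


# Sample input taken from the header list shown above.
raw = [
    'Date: Tue, 18 Jan 2011 11:32:01 GMT\r\n',
    'Server: Apache\r\n',
    'Content-Type: text/html\r\n',
]
pairs = split_header_lines(raw)
```

A list of pairs is used rather than a dict because a response may repeat a header name.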
If you know the header name, you can also access its value directly:
content_type = response.info().getheader('Content-Type')
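Once the response data is available, it could be written out following the pattern of the prv:DataAccess example quoted earlier. The following is only a sketch under assumptions: the function name describe_access is hypothetical, the prefixes prv:, prvTypes: and http: are assumed to be declared elsewhere in the output, and the status code is assumed to come from something like response.getcode().

```python
import re


def describe_access(url, status_code, http_version="1.1"):
    """Return Turtle for one data access, mirroring the quoted example.

    Assumes the prv:, prvTypes: and http: prefixes are declared elsewhere.
    """
    # Characters that would break the <...> IRI syntax must be encoded first.
    if any(c in url for c in '<>" \n'):
        raise ValueError("URL must be percent-encoded before use in Turtle")
    # Only accept plain versions such as "1.0" or "1.1" inside the literal.
    if not re.match(r"^\d+\.\d+$", http_version):
        raise ValueError("unexpected HTTP version: %r" % http_version)
    lines = [
        "_:da",
        "    a prv:DataAccess ;",
        "    prv:accessedResource <%s> ;" % url,
        "    prvTypes:exchangedHTTPMessage _:m .",
        "_:m",
        "    a http:Response ;",
        '    http:httpVersion "%s" ;' % http_version,
        '    http:statusCodeNumber "%d" .' % int(status_code),
    ]
    return "\n".join(lines)


turtle = describe_access("http://www.heppnetz.de/companies.xml", 200)
```

Individual header fields are left out of this sketch on purpose, since the quoted example does not show how they are modelled.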
Original comment by mfhepp
on 18 Jan 2011 at 11:36
Original issue reported on code.google.com by
mfhepp
on 17 Jan 2011 at 2:18