tetherless-world / opendap

Provenance trace and pingback services for OPeNDAP using Prizms.
http://opendap.tw.rpi.edu

OPeNDAP request -> Prizms cron delay #32

Open timrdf opened 10 years ago

timrdf commented 10 years ago

How to tighten the response? https://github.com/tetherless-world/opendap/wiki/Use-case:-mockup-tracer#wiki-processing-data-from-opendap-using-http

Unfortunately, there will be a delay between the time that OPeNDAP reports the "has_provenance" and "pingback" URLs and the time that they become available for request. This is because Prizms runs from cron and is not event-based. As a stopgap, we'll try to tighten up Prizms' cron so that we can rerun it more often than its current nightly schedule. We'd be happy to hear any suggestions you may have for addressing this current technological limitation.
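
Until the URLs are published as soon as they are reported, one client-side way to cope with the delay is to poll a reported URL until it resolves. The following is only a sketch under assumptions: `wait_until_available` and `http_ok` are hypothetical helper names, and the interval and attempt limit are placeholders, not values used by Prizms or OPeNDAP.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=10) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
        return False


def wait_until_available(url: str,
                         is_available: Callable[[str], bool] = http_ok,
                         interval_s: float = 60.0,   # placeholder polling interval
                         max_attempts: int = 30,     # placeholder attempt limit
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until `is_available(url)` is true or the attempts run out."""
    for attempt in range(max_attempts):
        if is_available(url):
            return True
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next check
    return False
```

A client would call `wait_until_available(...)` with the "has_provenance" URL it received and only request the provenance once the call returns `True`. The availability check and the sleep function are parameters so that the loop can be exercised without network access.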

timrdf commented 10 years ago

Cron currently takes 1.5 hours, with cr-retrieve.sh taking 1 hour 15 minutes, cr-full-dump taking 10 minutes, and cr-linksets taking 5.
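
The per-step figures can be derived from the start timestamps in the cron log for this run. A minimal Python sketch, assuming the steps run sequentially so that each one ends when the next one starts (the log records start times only); the regular expression and the `step_durations` helper are illustrative and not part of Prizms:

```python
import re
from datetime import datetime

# The lines of cron-2014-Jan-28_22_57.log quoted in this comment.
LOG = '''\
"2014-01-28T22:57:01+00:00"^^xsd:dateTime <#git-pull>
"2014-01-28T22:57:01+00:00"^^xsd:dateTime <#cr-mirror-ckan>
"2014-01-28T22:57:01+00:00"^^xsd:dateTime <#cr-retrieve>
"2014-01-29T00:12:38+00:00"^^xsd:dateTime <#cr-publish>
"2014-01-29T00:14:16+00:00"^^xsd:dateTime <#cr-full-dump>
"2014-01-29T00:24:39+00:00"^^xsd:dateTime <#cr-linksets>
"2014-01-29T00:28:12+00:00"^^xsd:dateTime <#cr-pingback>
'''

# Captures the timestamp literal and the step name after "<#".
LINE = re.compile(r'"([^"]+)"\^\^xsd:dateTime\s+<#([^>]+)>')


def step_durations(log_text):
    """Return (step, timedelta) pairs; the last step has no known end and is omitted."""
    events = [(name, datetime.fromisoformat(stamp))
              for stamp, name in LINE.findall(log_text)]
    return [(name, next_start - start)
            for (name, start), (_, next_start) in zip(events, events[1:])]


for name, elapsed in step_durations(LOG):
    print(f"{name:15} {elapsed}")
```

On this log, cr-retrieve comes out at 1:15:37, which agrees with the figure quoted above.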

cron-2014-Jan-28_22_57.log:

"2014-01-28T22:57:01+00:00"^^xsd:dateTime <#git-pull>
"2014-01-28T22:57:01+00:00"^^xsd:dateTime <#cr-mirror-ckan>
"2014-01-28T22:57:01+00:00"^^xsd:dateTime <#cr-retrieve>
"2014-01-29T00:12:38+00:00"^^xsd:dateTime <#cr-publish>
"2014-01-29T00:14:16+00:00"^^xsd:dateTime <#cr-full-dump>
"2014-01-29T00:24:39+00:00"^^xsd:dateTime <#cr-linksets>
"2014-01-29T00:28:12+00:00"^^xsd:dateTime <#cr-pingback>

timrdf commented 10 years ago

https://github.com/timrdf/csv2rdf4lod-automation/issues/313 could be revived to get the retrieval TIC PROV.

timrdf commented 10 years ago

1.5 hours again:

BEGIN cron ps --user prizms Fri Jan 31 16:21:01 UTC 2014
END cron Fri Jan 31 18:56:13 UTC 2014

timrdf commented 10 years ago

tic's PROV shows that /retrieval/us/pr-spobal-ng is the 1.5 hour culprit.

cr-latest-logs.sh | xargs tic.sh

@base <5aa98d9812f3ae4adce9fde3183fbb4d/doc/logs/cron-2014-Jan-31_18_58.log> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
...

<#cr-full-dump>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T21:36:34+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<#cr-linksets>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T21:48:47+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<#cr-mirror-ckan>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T18:58:02+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<#cr-pingback>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T21:52:22+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<#cr-publish>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T21:36:11+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<#cr-retrieve>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T18:58:02+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<#cron>
    sio:software-process-identifier "20596" ;
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T21:52:22+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:01+00:00"^^xsd:dateTime .

<#git-pull>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T18:58:01+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<../../../retrieval/opendap-org/opendap/svn>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:58:05+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:05+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/opendap-org/statsvn/2013-Dec-22>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:58:07+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:07+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/us/cr-isdefinedby>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:59:18+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:54+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/us/opendap-prov>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:58:27+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:26+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/us/pr-aggregate-pingbacks>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:58:51+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:32+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/us/pr-spobal-ng>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T21:36:04+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:59:34+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .
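
To spot the slow retrieval without reading every block, the activities that have both a start and an end time can be ranked by duration. The following is a rough sketch that assumes the layout shown above (one subject per blank-line-separated block) and uses regular expressions rather than a real Turtle parser; `activity_durations` is a hypothetical helper, and a library such as rdflib would be the more robust choice for arbitrary Turtle.

```python
import re
from datetime import datetime

# A subset of the tic.sh output above (prefix declarations omitted).
TURTLE = '''
<#cr-retrieve>
    a prov:Activity ;
    prov:startedAtTime "2014-01-31T18:58:02+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cron> .

<../../../retrieval/us/cr-isdefinedby>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:59:18+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:54+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/us/pr-aggregate-pingbacks>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T18:58:51+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:58:32+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .

<../../../retrieval/us/pr-spobal-ng>
    a prov:Activity ;
    prov:endedAtTime "2014-01-31T21:36:04+00:00"^^xsd:dateTime ;
    prov:startedAtTime "2014-01-31T18:59:34+00:00"^^xsd:dateTime ;
    prov:wasInformedBy <#cr-retrieve> .
'''

SUBJECT = re.compile(r'^<([^>]+)>', re.MULTILINE)  # subject IRI at the start of a line
TIME = re.compile(r'prov:(startedAtTime|endedAtTime)\s+"([^"]+)"')


def activity_durations(turtle_text):
    """Return (subject, timedelta) pairs, longest first, skipping activities without an end time."""
    durations = {}
    for block in turtle_text.strip().split("\n\n"):
        subject = SUBJECT.search(block)
        times = {key: datetime.fromisoformat(value) for key, value in TIME.findall(block)}
        if subject and {"startedAtTime", "endedAtTime"} <= times.keys():
            durations[subject.group(1)] = times["endedAtTime"] - times["startedAtTime"]
    return sorted(durations.items(), key=lambda item: item[1], reverse=True)


for subject, elapsed in activity_durations(TURTLE):
    print(f"{elapsed}  {subject}")
```

On this excerpt, pr-spobal-ng ranks first by a wide margin; the other retrievals finish in under a minute each.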

timrdf commented 10 years ago

A bug in csv2rdf4lod's NameFactory.java was returning the source-id of the SPARQL endpoint instead of the ugly URI for the named graph. That is fixed, and we are now summarizing the 132 named graphs that we're behind on.

timrdf commented 10 years ago

The entire cron run is down to 20 minutes; pr-spobal-ng accounts for 5 of them.