Closed msrb closed 6 years ago
Fixed a typo @msrb :)
@sivaavkd description updated, is it better now?
Carrying forward from - https://github.com/openshiftio/openshift.io/issues/1085#issuecomment-361237441
Some intermittent failures were encountered in the graph layer. They seem to occur only rarely:
g.V().has('ecosystem','maven').has('name','org.apache.tomcat:tomcat-servlet-api').properties('tokens','libio_usedby').drop().iterate();pkg = g.V().has('ecosystem','maven').has('name', 'org.apache.tomcat:tomcat-servlet-api').tryNext().orElseGet{graph.addVertex('ecosystem', 'maven', 'name', 'org.apache.tomcat:tomcat-servlet-api', 'vertex_label', 'Package')};pkg.property('last_updated', 1517286212.43);pkg.property('tokens', 'org'); pkg.property('tokens', 'apache'); pkg.property('tokens', 'tomcat'); pkg.property('tokens', 'tomcat'); pkg.property('tokens', 'servlet'); pkg.property('tokens', 'api');pkg.property('latest_version', '9.0.0.M17');pkg.property('libio_latest_release', '1500854400.0');pkg.property('libio_usedby', 'keycloak/keycloak:1359');pkg.property('libio_usedby', 'SungardAS/enhanced-snapshots:35');pkg.property('libio_usedby', 'cf-unik/unik:1239');pkg.property('libio_usedby', 'entando/entando-components:15');pkg.property('libio_usedby', 'indeedeng/proctor:198');pkg.property('libio_usedby', 'magro/memcached-session-manager:552');pkg.property('libio_usedby', 'nysenate/OpenLegislation:163');pkg.property('libio_usedby', 'Red5/red5-server:1254');pkg.property('libio_usedby', 'aspose-words/Aspose.Words-for-Java:52');pkg.property('libio_usedby', 'google/identity-toolkit-java-client:32');pkg.property('libio_dependents_projects', '65');pkg.property('libio_dependents_repos', '2.05K');pkg.property('libio_total_releases', '124');pkg.property('libio_latest_version', '8.5.19');g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.19').property('gh_release_date', 
1500854400.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M25').property('gh_release_date',1500854400.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M19').property('gh_release_date',1490572800.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.
5.16').property('gh_release_date',1498003200.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.14').property('gh_release_date',1492041600.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.15').property('gh_release_date',1493942400.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M21').property('gh_release_date',1493856000.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M20').property('gh_release_date',1491955200.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M22').property('gh_release_date',1498003200.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.13').property('gh_release_date',1490572800.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M1').properties('licenses','cve_ids','declared_licenses').drop().iterate();ver = g.V().has('pecosystem', 'maven').has('pname', 'org.apache.tomcat:tomcat-servlet-api').has('version', '9.0.0.M1').tryNext().orElseGet{graph.addVertex('pecosystem','maven', 'pname','org.apache.tomcat:tomcat-servlet-api', 'version', '9.0.0.M1', 'vertex_label', 'Version')};ver.property('last_updated',1517286212.43);ver.property('description','javax.servlet package');ver.property('cm_num_files',112);ver.property('cm_avg_cyclomatic_complexity', 1.23);ver.property('cm_loc',42622);ver.property('licenses', 'ASL 2.0'); ver.property('licenses', 'CDDL');ver.property('cve_ids', 'CVE-2017-6056:5.0'); ver.property('cve_ids', 'CVE-2016-8735:7.5'); ver.property('cve_ids', 'CVE-2016-6816:6.8'); ver.property('cve_ids', 'CVE-2016-6325:7.2'); ver.property('cve_ids', 'CVE-2016-5425:7.2'); ver.property('cve_ids', 'CVE-2016-3092:7.8'); ver.property('cve_ids', 
'CVE-2016-0763:6.5'); ver.property('cve_ids', 'CVE-2016-0714:6.
5'); ver.property('cve_ids', 'CVE-2016-0706:4.0'); ver.property('cve_ids', 'CVE-2015-5351:6.8'); ver.property('cve_ids', 'CVE-2015-5346:6.8'); ver.property('cve_ids', 'CVE-2015-5345:5.0');ver.property('declared_licenses', 'Apache License'); ver.property('declared_licenses', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0');lic = g.V().has('lname', 'Apache License').tryNext().orElseGet{graph.addVertex('vertex_label', 'License', 'lname', 'Apache License', 'last_updated',1517286212.43)}; g.V(ver).out('has_declared_license').has('lname', 'Apache License').tryNext().orElseGet{ver.addEdge('has_declared_license', lic)};lic = g.V().has('lname', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0').tryNext().orElseGet{graph.addVertex('vertex_label', 'License', 'lname', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0', 'last_updated',1517286212.43)}; g.V(ver).out('has_declared_license').has('lname', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0').tryNext().orElseGet{ver.addEdge('has_declared_license', lic)};edge_c = g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M1').in('has_version').tryNext().orElseGet{pkg.addEdge('has_version', ver)};
ERROR:data_importer:The import failed: 'status'
ERROR:data_importer:Traceback for latest failure in import call: Traceback (most recent call last):
File "/src/data_importer.py", line 101, in _import_keys_from_s3_http
if resp['status']['code'] == 200:
KeyError: 'status'
g.V().has('ecosystem','maven').has('name','org.apache.tomcat:tomcat-servlet-api').properties('tokens','libio_usedby').drop().iterate();pkg = g.V().has('ecosystem','maven').has('name', 'org.apache.tomcat:tomcat-servlet-api').tryNext().orElseGet{graph.addVertex('ecosystem', 'maven', 'name', 'org.apache.tomcat:tomcat-servlet-api', 'vertex_label', 'Package')};pkg.property('last_updated', 1517286213.63);pkg.property('tokens', 'org'); pkg.property('tokens', 'apache'); pkg.property('tokens', 'tomcat'); pkg.property('tokens', 'tomcat'); pkg.property('tokens', 'servlet'); pkg.property('tokens', 'api');pkg.property('latest_version', '9.0.0.M17');pkg.property('libio_latest_release', '1500854400.0');pkg.property('libio_usedby', 'keycloak/keycloak:1359');pkg.property('libio_usedby', 'SungardAS/enhanced-snapshots:35');pkg.property('libio_usedby', 'cf-unik/unik:1239');pkg.property('libio_usedby', 'entando/entando-components:15');pkg.property('libio_usedby', 'indeedeng/proctor:198');pkg.property('libio_usedby', 'magro/memcached-session-manager:552');pkg.property('libio_usedby', 'nysenate/OpenLegislation:163');pkg.property('libio_usedby', 'Red5/red5-server:1254');pkg.property('libio_usedby', 'aspose-words/Aspose.Words-for-Java:52');pkg.property('libio_usedby', 'google/identity-toolkit-java-client:32');pkg.property('libio_dependents_projects', '65');pkg.property('libio_dependents_repos', '2.05K');pkg.property('libio_total_releases', '124');pkg.property('libio_latest_version', '8.5.19');g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.19').property('gh_release_date', 
1500854400.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M25').property('gh_release_date',1500854400.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M19').property('gh_release_date',1490572800.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.
5.16').property('gh_release_date',1498003200.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.14').property('gh_release_date',1492041600.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.15').property('gh_release_date',1493942400.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M21').property('gh_release_date',1493856000.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M20').property('gh_release_date',1491955200.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M22').property('gh_release_date',1498003200.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.13').property('gh_release_date',1490572800.0);g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M11').properties('licenses','cve_ids','declared_licenses').drop().iterate();ver = g.V().has('pecosystem', 'maven').has('pname', 'org.apache.tomcat:tomcat-servlet-api').has('version', '9.0.0.M11').tryNext().orElseGet{graph.addVertex('pecosystem','maven', 'pname','org.apache.tomcat:tomcat-servlet-api', 'version', '9.0.0.M11', 'vertex_label', 'Version')};ver.property('last_updated',1517286213.63);ver.property('description','javax.servlet package');ver.property('cm_num_files',114);ver.property('cm_avg_cyclomatic_complexity', 1.23);ver.property('cm_loc',42800);ver.property('licenses', 'ASL 2.0'); ver.property('licenses', 'CDDL');ver.property('cve_ids', 'CVE-2017-6056:5.0'); ver.property('cve_ids', 'CVE-2016-8747:5.0'); ver.property('cve_ids', 'CVE-2016-8735:7.5'); ver.property('cve_ids', 'CVE-2016-6816:6.8'); ver.property('cve_ids', 'CVE-2016-6325:7.2'); ver.property('cve_ids', 'CVE-2016-5425:7.2');ver.property('declared_licenses', 
'Apache License'); ver.property('declared_licenses'
, ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0');lic = g.V().has('lname', 'Apache License').tryNext().orElseGet{graph.addVertex('vertex_label', 'License', 'lname', 'Apache License', 'last_updated',1517286213.63)}; g.V(ver).out('has_declared_license').has('lname', 'Apache License').tryNext().orElseGet{ver.addEdge('has_declared_license', lic)};lic = g.V().has('lname', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0').tryNext().orElseGet{graph.addVertex('vertex_label', 'License', 'lname', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0', 'last_updated',1517286213.63)}; g.V(ver).out('has_declared_license').has('lname', ' Version 2.0 and
Common Development And Distribution License (CDDL) Version 1.0').tryNext().orElseGet{ver.addEdge('has_declared_license', lic)};edge_c = g.V().has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','9.0.0.M11').in('has_version').tryNext().orElseGet{pkg.addEdge('has_version', ver)};
ERROR:data_importer:The import failed: HTTPConnectionPool(host='172.30.80.86', port=8182): Read timed out. (read timeout=30)
ERROR:data_importer:Traceback for latest failure in import call: Traceback (most recent call last):
File "/src/data_importer.py", line 98, in _import_keys_from_s3_http
data=json.dumps(payload), timeout=30)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 521, in send
raise ReadTimeout(e, request=request)
ReadTimeout: HTTPConnectionPool(host='172.30.80.86', port=8182): Read timed out. (read timeout=30)
Figured out the cause of the above error from the Gremlin server logs:
39989555 [gremlin-server-worker-1] WARN org.apache.tinkerpop.gremlin.server.handler.HttpGremlinEndpointHandler - Invalid request - responding with 500 Internal Server Error and startup failed:
Script3023.groovy: 1: expecting ''', found '\n' @ line 1, column 4116.
d_licenses', ' Version 2.0 and
^
1 error
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script3023.groovy: 1: expecting ''', found '\n' @ line 1, column 4116.
d_licenses', ' Version 2.0 and
^
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:310)
at org.codehaus.groovy.control.ErrorCollector.addFatalError(ErrorCollector.java:150)
at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:120)
at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:132)
at org.codehaus.groovy.control.SourceUnit.addError(SourceUnit.java:360)
at org.codehaus.groovy.antlr.AntlrParserPlugin.transformCSTIntoAST(AntlrParserPlugin.java:140)
at org.codehaus.groovy.antlr.AntlrParserPlugin.parseCST(AntlrParserPlugin.java:111)
at org.codehaus.groovy.control.SourceUnit.parse(SourceUnit.java:237)
at org.codehaus.groovy.control.CompilationUnit$1.call(CompilationUnit.java:167)
at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:931)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:593)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:569)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:546)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:254)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:211)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.getScriptClass(GremlinGroovyScriptEngine.java:527)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:446)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines.eval(ScriptEngines.java:119)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$2(GremlinExecutor.java:287)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
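The `KeyError: 'status'` in the first traceback suggests that the importer indexed into the response body without checking its shape, and the server log above shows the request was answered with a 500. A minimal sketch of a more defensive check follows; `import_succeeded` is a hypothetical helper, not a function from data_importer.py, and the shape of the error body is an assumption.

```python
def import_succeeded(resp):
    """Return True only if the decoded JSON reply reports status code 200.

    `resp` is assumed to be the already-decoded JSON body of the HTTP
    response. An error reply may not contain a 'status' key at all, so
    the lookup must not assume it is present.
    """
    status = resp.get('status') if isinstance(resp, dict) else None
    if not isinstance(status, dict):
        return False
    return status.get('code') == 200


# A successful reply and a reply without a 'status' key (shape assumed).
ok_reply = {'status': {'code': 200}, 'result': {'data': []}}
error_reply = {'message': 'startup failed'}
print(import_succeeded(ok_reply), import_succeeded(error_reply))
```

With a check like this, a malformed query would be reported as a failed import for that key instead of surfacing as a `KeyError`.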
Essentially, the queries are not formed correctly. The existing code here should account for special characters such as newlines:
    drop_props.append('declared_licenses')
    prp_version += " ".join(["ver.property('declared_licenses', '{}');".format(dl)
                             for dl in declared_licenses])
    # Create License Node and edge from EPV
    for lic in declared_licenses:
        prp_version += "lic = g.V().has('lname', '{lic}').tryNext().orElseGet{{" \
                       "graph.addVertex('vertex_label', 'License', 'lname', '{lic}', " \
                       "'last_updated',{last_updated})}}; g.V(ver).out(" \
                       "'has_declared_license').has('lname', '{lic}').tryNext()." \
                       "orElseGet{{ver.addEdge('has_declared_license', lic)}};".format(
                           lic=lic, last_updated=str(time.time())
                       )
This happens for the following package version:
has('pecosystem','maven').has('pname','org.apache.tomcat:tomcat-servlet-api').has('version','8.5.14')
metadata.json for the above EPV contains:
"declared_license": "Apache License, Version 2.0 and\n Common Development And Distribution License (CDDL) Version 1.0",
Notice the newline (\n) inside the value.
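One way to avoid the Groovy compilation error is to escape each value before it is interpolated into the query string. The sketch below is an illustration under stated assumptions, not the fix that was actually applied: `groovy_quote` is a hypothetical helper, and splitting the license string on commas is assumed only because it reproduces the two values seen in the log.

```python
def groovy_quote(value):
    """Return `value` as a single-quoted Groovy string literal.

    Backslashes are escaped first so that the escapes added afterwards
    are not escaped a second time.
    """
    escaped = (
        value.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return "'" + escaped + "'"


# Value taken from metadata.json for the failing EPV.
declared_license = ("Apache License, Version 2.0 and\n Common Development "
                    "And Distribution License (CDDL) Version 1.0")
# Assumption: the importer splits the string on commas.
declared_licenses = declared_license.split(",")

# The helper supplies the quotes, so the template has no quotes of its own.
prp_version = " ".join(
    "ver.property('declared_licenses', {});".format(groovy_quote(dl))
    for dl in declared_licenses
)
print(prp_version)
```

The generated statement stays on a single line, so the Groovy parser no longer sees an unterminated string literal. Another option is to send such values through the `bindings` field of the Gremlin Server HTTP request, which keeps them out of the script text entirely.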
Fixed issue -
Other issues encountered:
Last time I checked the progress of the Maven graph sync, it had reached the package named org.ops4j.pax.exam:pax-exam-spi, which is ranked 104495 out of a total of 131460 Maven packages (in lexicographic order by name). That means about 79% of all Maven packages had been synced.
@miteshvp can we query the graph for the exact number of Maven packages/components, please?
In the first pass of the Maven graph sync, fewer than 11402 packages remain. That means at least 100 * (1 - 11402 / 131460) ≈ 91.33 % of the packages have been processed.
Some packages may have been skipped due to Gateway Timeouts. We can sync those once the first pass is complete.
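The progress figures quoted in this thread can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are illustrative and the numbers are the ones reported in the comments above.

```python
# Numbers reported in the comments above.
total_packages = 131460        # Maven packages, ordered by name
last_synced_rank = 104495      # rank of org.ops4j.pax.exam:pax-exam-spi
remaining_upper_bound = 11402  # "fewer than" this many packages remain

# Progress at the earlier check, measured by rank in lexicographic order.
progress_by_rank = 100.0 * last_synced_rank / total_packages

# Lower bound on first-pass coverage, since fewer than 11402 remain.
first_pass_coverage = 100.0 * (1 - float(remaining_upper_bound) / total_packages)

print("progress by rank: %.2f %%" % progress_by_rank)
print("first-pass coverage: at least %.2f %%" % first_pass_coverage)
```

Because the remaining count is an upper bound, the second figure is a lower bound on the share of packages already processed.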
The first pass is complete. Many of the Maven packages failed to sync to the graph for multiple reasons:
I have now scheduled the sync of those (pending) packages again.
Thanks Saleem 👍
I've created https://github.com/openshiftio/openshift.io/issues/2256 for improving test coverage in data-importer.
Late reply, but here it is: there are a total of 137389 Maven packages in our graph.
@miteshvp How did you figure this out?
Maven is in graph, closing. Thanks @tuxdna :wink:
Description
There were times when the data ingestion pipeline was broken or certain parts of the pipeline were disabled. During such times, we analyzed plenty of packages and stored the results in S3, but never ingested the data into the graph database. With https://github.com/openshiftio/openshift.io/issues/1085 implemented, we want to sync all missing data from S3 to the graph.
Acceptance criteria