Closed: msrb closed this issue 6 years ago.
Branch with my WIP changes: https://github.com/msrb/fabric8-analytics-worker/tree/go-support-init?files=1

Known missing pieces:
- gofedlib needs to be added to the worker base image

Cc @tuxdna :wink:
Here is gofedlib with its documentation. You can find how it was used in the worker history (git commits aa4cf9b8472, f8089bb2). For usage in the mercator task, you will need to check the internal non-squashed project (see workers/mercator.py) from before we went open source.
Made a few experiments with gofedlib, and along the way created issues / commits in that project:
- gofed/gofedlib#10
- gofed/gofedlib#14
- gofed/gofedlib#16
I think we should be fine running gofed with Python 2, as we did previously. No need to port it to Python 3 for now.
Added a couple more issues to gofedlib, and a PR.
Let's incorporate golang support into our core workers using Python 2 (for gofed) for now; gofed is just one piece of the metadata task. Any code-related fixes in gofed are not part of this card and can be reported and fixed later.
@fridex Porting (or not porting) gofedlib to Python 3 is out of scope of the discussion for this issue.
Can you please make a release? - https://github.com/gofed/gofedlib/issues/20
> @fridex Porting (or not porting) gofedlib to Python 3 is out of scope of the discussion for this issue.
I think it is worth discussing, as porting would require significant engineering time and we are not dependent on this work. It's a third-party library we are using, and we can still invoke it using the Python 2 interpreter (we did the same for brewutils, for example), and we already did so in the previous Go support implementation. There is no need for us to invest engineering time in porting this library, especially not for the initial golang support we need.
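As a sketch of what invoking a Python 2 only tool from the Python 3 worker can look like, assuming the tool prints JSON to stdout (the gofedlib-cli flags in the commented usage mirror the MercatorTask log later in this thread and are otherwise an assumption):

```python
import json
import subprocess


def run_external_tool(cmd, timeout=300):
    """Run an external CLI tool in a subprocess and parse its JSON stdout.

    A minimal sketch of shelling out from a Python 3 worker to a tool
    that only runs under Python 2, such as gofedlib-cli; error handling
    is simplified.
    """
    out = subprocess.check_output(cmd, timeout=timeout)
    return json.loads(out.decode("utf-8"))


# Hypothetical usage (requires gofedlib-cli on PATH):
# deps = run_external_tool(
#     ["gofedlib-cli", "--dependencies-main", "--dependencies-packages",
#      "/path/to/extracted_package"])
```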
> Can you please make a release? - gofed/gofedlib#20
Will do; you can use master if that is blocking you:

```
pip install git+https://github.com/gofed/gofedlib.git
```
gofedlib works with Python 2 at the moment (after a few fixes that I made), which is sufficient to extract dependencies.
gofedlib may still fail due to some remaining issues; here is one, for example: https://github.com/gofed/gofedlib/issues/17
Issue encountered with the server API: https://github.com/fabric8-analytics/fabric8-analytics-server/issues/154
In parallel, we can work on a handler for gathering the packages available in an ecosystem. Relevant: https://github.com/openshiftio/openshift.io/issues/952
If we take the package name to be github.com/golang/protobuf/proto and the version to be 1.0, we can query the analyses API like below:
In this scenario, the analyses API fails to accept this package for analysis:
```json
{
    "error": "Cannot match given query to any API v1 endpoint"
}
```
The reason for the failure is that package names shouldn't contain slashes.
If porting gofedlib to Python 3 is not trivial, then we can run it with Python 2 for now and just open an issue so we don't forget. It would be nice to have it running on Python 3, but it's not critical (Red Hat will maintain Python 2 in RHEL for another 7+ years, so we are good on this front).
@tuxdna
> The reason for the failure is that package names shouldn't contain slashes.
I think it should be possible to use URL-encoded slashes in package names.
Note: I wouldn't worry about the various package managers in Go, as it seems people use them to vendor dependencies in their Go applications. The initial Go ingestion is about Go libraries.
> I think it should be possible to use URL-encoded slashes in package names.
I think it is not possible right now on the server API (package names are not decoded), but on jobs it should work naturally, as the name is passed in JSON.
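For illustration, percent-encoding the slashes client-side could look like this; `component_analysis_url` is a hypothetical helper, and the endpoint shape is taken from the component-analyses URL used later in this thread:

```python
from urllib.parse import quote


def component_analysis_url(base, ecosystem, package, version):
    # safe="" forces '/' in the package name to be encoded as %2F,
    # so Go import paths survive the server's URL routing.
    return "{}/api/v1/component-analyses/{}/{}/{}".format(
        base, ecosystem, quote(package, safe=""), version)


print(component_analysis_url(
    "http://localhost:32000", "go", "github.com/golang/protobuf/proto", "1.0"))
# → http://localhost:32000/api/v1/component-analyses/go/github.com%2Fgolang%2Fprotobuf%2Fproto/1.0
```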
Encountered an issue while installing gofedlib in the container image: https://paste.fedoraproject.org/paste/95qLrmrl1MHxv4~1PkAufw

Looking for a workaround.
Adding gofedlib-cli to the worker image gets further, failing with the following error:
worker-ingestion_1 | ame": "metadata", "event": "TASK_START", "queue": "saleem_ingestion_MercatorTask_v0", "node_args": {"name": "github.com/golang/dep", "force_graph_sync": true, "_audit": {"ended_at": "2017-10-03T12:50:09.487150", "version": "v1", "started_at": "2017-10-03T12:49:45.134295"}, "document_id": 1, "version": "v0.1.0", "force": false, "ecosystem": "go", "_release": "go:github.com/golang/dep:v0.1.0"}, "task_id": "9179f21b-10c1-42dd-bdda-47ad6478a9b5", "flow_name": "bayesianAnalysisFlow", "dispatcher_id": "92d501ff-512d-48cb-92da-5d6b98f8411a", "parent": {}}
worker-ingestion_1 | 03 12:51:28,455 [DEBUG] f8a_worker.object_cache: Retrieving object 'go/github.com/golang/dep/v0.1.0/v0.1.0.tar.gz' from bucket 'saleem-bayesian-core-temp-artifacts' to '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/v0.1.0.tar.gz'
worker-ingestion_1 | 03 12:51:28,472 [DEBUG] f8a_worker.utils: running command ['tar', 'xf', '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/v0.1.0.tar.gz', '-C', '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/extracted_package']
worker-ingestion_1 | 03 12:51:28,517 [DEBUG] f8a_worker.utils: running command '['gofedlib-cli', '--dependencies-main', '--dependencies-packages', '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/extracted_package']'; timeout '300'
worker-ingestion_1 | 03 12:51:29,335 [ERROR] MercatorTask: Traceback (most recent call last):
worker-ingestion_1 | File "/usr/bin/gofedlib-cli", line 109, in <module>
worker-ingestion_1 | result.update(get_dependencies(project_packages(args.path), selection))
worker-ingestion_1 | File "/usr/lib/python2.7/site-packages/gofedlib/go/functions.py", line 8, in project_packages
worker-ingestion_1 | return GoSymbolsExtractor(source_code_directory).extract().packages()
worker-ingestion_1 | File "/usr/lib/python2.7/site-packages/gofedlib/go/symbolsextractor/extractor.py", line 321, in extract
worker-ingestion_1 | raise ExtractionError(err_msg)
worker-ingestion_1 | lib.types.ExtractionError: directory /internal/gps/_testdata/src/twopkgs contains definition of more packages, i.e. m1p
worker-ingestion_1 |
worker-ingestion_1 | 03 12:51:29,336 [DEBUG] f8a_worker.object_cache: Removing cached files for go/github.com/golang/dep/v0.1.0
A fix is required in gofedlib.
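One possible direction for such a fix is to skip testdata and vendor trees during extraction, since those legitimately contain extra package definitions, like the _testdata/src/twopkgs directory in the traceback above. A rough sketch (not gofedlib's actual code):

```python
import os


def go_source_dirs(root):
    """Yield directories under root that should be scanned for Go packages.

    Sketch of a possible workaround for the error above: prune testdata
    and vendor trees, which may define multiple packages per directory.
    """
    skip = {"testdata", "_testdata", "vendor"}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped and hidden subdirectories in place.
        dirnames[:] = [d for d in dirnames
                       if d not in skip and not d.startswith(".")]
        if any(f.endswith(".go") for f in filenames):
            yield dirpath
```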
One more issue (among the other issues listed above) to resolve for the full analysis flow of a golang package:
Hit the analysis using: http://localhost:32000/api/v1/component-analyses/go/github.com%2Fstretchr%2Ftestify/master
Here is the error in the graph import: https://paste.fedoraproject.org/paste/0VO2lQgIO3lhhLSb8T-MdQ
Looking into it now.
On resolving the above error, two additional errors occur:
```
selinon.errors.FlowError: {"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}
ERROR:data_importer:import_epv() failed with error: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
```
Both of these errors appear to be related:
worker-ingestion_1 | : "saleem_ingestion_bayesianAnalysisFlow_v0", "will_retry": false, "selective": false, "flow_name": "bayesianAnalysisFlow", "event": "FLOW_FAILURE", "node_args": {"ecosystem": "go", "force": false, "force_graph_sync": true, "_audit": {"started_at": "2017-10-04T14:37:10.447877", "ended_at": "2017-10-04T14:37:17.373681", "version": "v1"}, "version": "master", "name": "github.com/stretchr/testify", "document_id": 1, "_release": "go:github.com/stretchr/testify:master"}, "retry": 15, "parent": null, "state": {"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}, "dispatcher_id": "aa06cb37-1c94-4c3a-a38f-62834fa4753b"}0-04 14:38:31,378 [ERROR] celery.app.trace: Task selinon.Dispatcher[aa06cb37-1c94-4c3a-a38f-62834fa4753b] raised unexpected: FlowError('{"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}',)
worker-ingestion_1 | Traceback (most recent call last):
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/celery/app/trace.py", line 367, in trace_task
worker-ingestion_1 | R = retval = fun(*args, **kwargs)
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/celery/app/trace.py", line 622, in __protected_call__
worker-ingestion_1 | return self.run(*args, **kwargs)
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/selinon/dispatcher.py", line 103, in run
worker-ingestion_1 | raise self.retry(max_retries=0, exc=flow_error)
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/celery/app/task.py", line 668, in retry
worker-ingestion_1 | raise_with_context(exc)
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/selinon/dispatcher.py", line 83, in run
worker-ingestion_1 | retry = system_state.update()
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/selinon/systemState.py", line 760, in update
worker-ingestion_1 | started, reused, fallback_started = self._continue_and_update_retry([])
worker-ingestion_1 | File "/usr/lib/python3.4/site-packages/selinon/systemState.py", line 745, in _continue_and_update_retry
worker-ingestion_1 | raise FlowError(json.dumps(state_info))
worker-ingestion_1 | selinon.errors.FlowError: {"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}
data-model-importer_1 | --------------------------------------------------------------------------------
data-model-importer_1 | INFO in rest_api [/src/rest_api.py:48]:
data-model-importer_1 | Ingesting the given list of EPVs - [{"ecosystem": "go", "version": "master", "name": "github.com/stretchr/testify"}]
data-model-importer_1 | --------------------------------------------------------------------------------
data-model-importer_1 | ERROR:data_importer:import_epv() failed with error: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
worker-ingestion_1 | : "TASK_FAILURE", "parent": {"ResultCollector": "b2768add-e02a-4ec8-8f61-7325e875801b"}, "task_name": "GraphImporterTask", "queue": "saleem_ingestion_GraphImporterTask_v0", "node_args": {"version": "master", "document_id": 1, "force_graph_sync": true, "name": "github.com/stretchr/testify", "_audit": {"version": "v1", "started_at": "2017-10-04T15:02:10.869118", "ended_at": "2017-10-04T15:02:18.546199"}, "_release": "go:github.com/stretchr/testify:master", "force": false, "ecosystem": "go"}, "flow_name": "bayesianFlow", "retried_count": 0, "what": "Traceback (most recent call last):\n File \"/usr/lib/python3.4/site-packages/selinon/selinonTaskEnvelope.py\", line 115, in run\n result = task.run(node_args)\n File \"/usr/lib/python3.4/site-packages/f8a_worker/base.py\", line 39, in run\n result = self.execute(node_args)\n File \"/usr/lib/python3.4/site-packages/f8a_worker/workers/graph_importer.py\", line 49, in execute\n raise RuntimeError(\"Failed to invoke graph import at '%s' for %s\" % (endpoint, param))\nRuntimeError: Failed to invoke graph import at 'http://data-model-importer:9192/api/v1/ingest_to_graph' for [{'version': 'master', 'name': 'github.com/stretchr/testify', 'ecosystem': 'go'}]\n", "task_id": "1a12a52d-971f-4c04-abdd-dd8c675d7d9c", "dispatcher_id": "c5cce0af-f075-4dfa-91c0-91516e7bc4cc"}
data-model-importer_1 | ERROR:data_importer:Traceback for latest failure in import call: Traceback (most recent call last):
data-model-importer_1 | File "/src/data_importer.py", line 149, in import_epv_http
data-model-importer_1 | prefix=pkg_key_prefix))
data-model-importer_1 | File "/src/data_source/s3_data_source.py", line 62, in list_files
data-model-importer_1 | for obj in bucket.objects.filter(Prefix=prefix):
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/boto3/resources/collection.py", line 83, in __iter__
data-model-importer_1 | for page in self.pages():
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/boto3/resources/collection.py", line 166, in pages
data-model-importer_1 | for page in pages:
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 249, in __iter__
data-model-importer_1 | response = self._make_request(current_kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 326, in _make_request
data-model-importer_1 | return self._method(**current_kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/botocore/client.py", line 312, in _api_call
data-model-importer_1 | return self._make_api_call(operation_name, kwargs)
data-model-importer_1 | File "/usr/lib/python2.7/site-packages/botocore/client.py", line 601, in _make_api_call
data-model-importer_1 | raise error_class(parsed_response, operation_name)
data-model-importer_1 | NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
data-model-importer_1 |
This will fix the second issue mentioned above:
BTW, some import paths can lead to different git repos. This list could help map an import path to a provider prefix when analysing projects.
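As a rough illustration of such a mapping, a naive heuristic for a few well-known providers could look like this (real resolution, e.g. of vanity import paths, needs the full provider list and possibly an HTTP lookup):

```python
def provider_prefix(import_path):
    """Map a Go import path to its repository prefix.

    Naive heuristic covering only a few well-known hosts; vanity
    import paths resolve to different git repos and need a real
    provider list to handle correctly.
    """
    parts = import_path.split("/")
    if parts[0] in ("github.com", "bitbucket.org"):
        return "/".join(parts[:3])  # host/owner/repo
    if parts[0] == "gopkg.in":
        return "/".join(parts[:2])  # gopkg.in/pkg.v1
    return import_path


print(provider_prefix("github.com/golang/protobuf/proto"))
# → github.com/golang/protobuf
```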
Golang support is now available in production. Thanks, everyone, for the help and suggestions!
nice!
Description
We want to be able to analyze golang packages in fabric8-analytics. The initial step is to implement support for golang at the component-analysis level and store the data in S3.
Note: We are not starting this task from scratch; we already have some code written, but it needs to be rebased and tested, and the missing functionality implemented. Talk to @msrb.
Acceptance criteria