openshiftio / openshift.io

Red Hat OpenShift.io is an end-to-end development environment for planning, building and deploying modern applications.
https://openshift.io

Add golang support on component analysis level #686

Closed: msrb closed this issue 6 years ago

msrb commented 7 years ago

Description

We want to be able to analyze golang packages in fabric8-analytics. The initial step is to implement support for golang at the component analysis level and store the data in S3.

Note: We are not starting this task from scratch, we already have some code written, but it needs to be rebased, tested, and missing functionality implemented. Talk to @msrb.

Acceptance criteria

msrb commented 7 years ago

Branch with my WIP changes: https://github.com/msrb/fabric8-analytics-worker/tree/go-support-init?files=1

Known missing pieces:

Cc @tuxdna :wink:

fridex commented 7 years ago

gofedlib needs to be added to the worker base image

Here is the gofedlib with documentation. You can find how it was used in the worker history (git commits aa4cf9b8472, f8089bb2). For usage in the mercator task you will need to check the internal non-squashed project (see workers/mercator.py) from before we went open source.
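
For orientation, a minimal sketch (not taken from the worker code) of calling gofedlib directly on an unpacked source tree; the project_packages() function and its module path are inferred from the traceback quoted later in this thread, and the directory is a placeholder:

# Python 2 (gofedlib is not ported to Python 3).
from gofedlib.go.functions import project_packages

# Extract the list of Go packages defined in an extracted source tree.
packages = project_packages("/tmp/extracted_go_project")  # placeholder path
print(packages)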

tuxdna commented 7 years ago

I made a few experiments with gofedlib and, along the way, created issues/commits against that project.

fridex commented 7 years ago

I made a few experiments with gofedlib and, along the way, created issues/commits against that project.

  • gofed/gofedlib#10
  • gofed/gofedlib#14
  • gofed/gofedlib#16

I think we should be fine running gofed with Python 2, as we did previously. No need to port it to Python 3 for now.

tuxdna commented 7 years ago

Added a couple more issues to gofedlib and a PR.

fridex commented 7 years ago

Added a couple more issues to gofedlib and a PR.

Let's incorporate golang support into our core workers using Python 2 (for gofed) for now (gofed is just one piece of the metadata task). Any code-related fixes in gofed are not part of this card and can be reported and fixed later.

tuxdna commented 7 years ago

@fridex Porting or not porting gofedlib to Python 3 is out of scope of the discussion for this issue.

Can you please make a release? - https://github.com/gofed/gofedlib/issues/20

fridex commented 7 years ago

@fridex Porting or not porting gofedlib to Python 3 is out of scope of the discussion for this issue.

I think it is worth discussing, as porting would require significant engineering time and we are not dependent on that work. It's a third-party library we are using, and we can still invoke it with the Python 2 interpreter (we did the same, for example, for brewutils); we already did so in the previous Go support implementation. There is no need for us to invest engineering time in porting this library, especially not for the initial golang support we need.

Can you please make a release? - gofed/gofedlib#20

Will do, you can use master if that is blocking you:

pip install git+https://github.com/gofed/gofedlib.git

tuxdna commented 7 years ago

gofedlib works with Python 2 at the moment (after a few fixes that I made), which is sufficient to extract dependencies.

gofedlib may still fail due to some remaining issues; here is one example: https://github.com/gofed/gofedlib/issues/17

tuxdna commented 7 years ago

Issue encountered with the server API: https://github.com/fabric8-analytics/fabric8-analytics-server/issues/154

fridex commented 7 years ago

We can work in parallel on a handler for gathering the packages available in the ecosystem. Relevant: https://github.com/openshiftio/openshift.io/issues/952

tuxdna commented 7 years ago

If we take the package name to be github.com/golang/protobuf/proto and the version to be 1.0, we can query the analyses API as sketched below:
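
The original request is not preserved here, so this is only a sketch of that kind of query, reusing the component-analyses endpoint and the local host/port that appear later in this thread; the unencoded slashes in the package name are the point of failure:

import requests

# Package name with slashes, version 1.0, against the local deployment.
url = ("http://localhost:32000/api/v1/component-analyses/"
       "go/github.com/golang/protobuf/proto/1.0")
resp = requests.get(url)
print(resp.status_code, resp.text)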

In this scenario, the analyses API fails to accept the package analysis request:

{
  "error": "Cannot match given query to any API v1 endpoint"
}

The reason for the failure is that the package name must not contain slashes.

msrb commented 7 years ago

If porting gofedlib to Python 3 is not trivial, then we can run it with Python 2 for now and just open an issue so we don't forget. It would be nice to have it running on Python 3, but it's not critical (Red Hat will maintain Python 2 in RHEL for another 7+ years, so we are good on this front).

msrb commented 7 years ago

@tuxdna

The reason for the failure is that the package name must not contain slashes.

I think it should be possible to use URL-encoded slashes in package names.
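
For illustration, the encoding can be done with the standard library; safe="" makes quote() encode the slashes too, matching the github.com%2Fstretchr%2Ftestify form used in a URL later in this thread:

from urllib.parse import quote

name = "github.com/golang/protobuf/proto"
encoded = quote(name, safe="")  # github.com%2Fgolang%2Fprotobuf%2Fproto
url = ("http://localhost:32000/api/v1/component-analyses/go/"
       + encoded + "/1.0")
print(url)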

msrb commented 7 years ago

Note: I wouldn't worry about the various package managers in Go, as it seems people use them to vendor dependencies in their Go applications. The initial Go ingestion is about Go libraries.

fridex commented 7 years ago

I think it should be possible to use URL-encoded slashes in package names.

I think it is not possible right now on the server API (as package names are not decoded), but on the jobs service it should naturally work, since the name is passed in JSON.

tuxdna commented 7 years ago

Encountered an issue while installing gofedlib in the container image: https://paste.fedoraproject.org/paste/95qLrmrl1MHxv4~1PkAufw

Looking for a workaround.

tuxdna commented 7 years ago

With gofedlib-cli added to the worker image, the analysis proceeds further but fails with the following error:

worker-ingestion_1      | ame": "metadata", "event": "TASK_START", "queue": "saleem_ingestion_MercatorTask_v0", "node_args": {"name": "github.com/golang/dep", "force_graph_sync": true, "_audit": {"ended_at": "2017-10-03T12:50:09.487150", "version": "v1", "started_at": "2017-10-03T12:49:45.134295"}, "document_id": 1, "version": "v0.1.0", "force": false, "ecosystem": "go", "_release": "go:github.com/golang/dep:v0.1.0"}, "task_id": "9179f21b-10c1-42dd-bdda-47ad6478a9b5", "flow_name": "bayesianAnalysisFlow", "dispatcher_id": "92d501ff-512d-48cb-92da-5d6b98f8411a", "parent": {}}
worker-ingestion_1      | 03 12:51:28,455 [DEBUG] f8a_worker.object_cache: Retrieving object 'go/github.com/golang/dep/v0.1.0/v0.1.0.tar.gz' from bucket 'saleem-bayesian-core-temp-artifacts' to '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/v0.1.0.tar.gz'
worker-ingestion_1      | 03 12:51:28,472 [DEBUG] f8a_worker.utils: running command ['tar', 'xf', '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/v0.1.0.tar.gz', '-C', '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/extracted_package']
worker-ingestion_1      | 03 12:51:28,517 [DEBUG] f8a_worker.utils: running command '['gofedlib-cli', '--dependencies-main', '--dependencies-packages', '/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/extracted_package']'; timeout '300'
worker-ingestion_1      | 03 12:51:29,335 [ERROR] MercatorTask: Traceback (most recent call last):
worker-ingestion_1      |   File "/usr/bin/gofedlib-cli", line 109, in <module>
worker-ingestion_1      |     result.update(get_dependencies(project_packages(args.path), selection))
worker-ingestion_1      |   File "/usr/lib/python2.7/site-packages/gofedlib/go/functions.py", line 8, in project_packages
worker-ingestion_1      |     return GoSymbolsExtractor(source_code_directory).extract().packages()
worker-ingestion_1      |   File "/usr/lib/python2.7/site-packages/gofedlib/go/symbolsextractor/extractor.py", line 321, in extract
worker-ingestion_1      |     raise ExtractionError(err_msg)
worker-ingestion_1      | lib.types.ExtractionError: directory /internal/gps/_testdata/src/twopkgs contains definition of more packages, i.e. m1p
worker-ingestion_1      | 
worker-ingestion_1      | 03 12:51:29,336 [DEBUG] f8a_worker.object_cache: Removing cached files for go/github.com/golang/dep/v0.1.0

A fix is required in gofedlib.
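
For context, the command and 300 s timeout in the log above correspond to an invocation along these lines (a sketch only, not the worker's actual code):

import subprocess

# Call the Python 2 gofedlib-cli from the Python 3 worker, mirroring the
# command line and timeout visible in the log above.
cmd = ["gofedlib-cli", "--dependencies-main", "--dependencies-packages",
       "/var/lib/f8a_worker/worker_data/go/github.com/golang/dep/v0.1.0/extracted_package"]
output = subprocess.check_output(cmd, timeout=300)
print(output.decode("utf-8"))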

tuxdna commented 7 years ago

One more issue (among the other issues listed above) to resolve for the full analysis flow of a golang package:

Hit the analysis using: http://localhost:32000/api/v1/component-analyses/go/github.com%2Fstretchr%2Ftestify/master

Here is the error in Graph import: https://paste.fedoraproject.org/paste/0VO2lQgIO3lhhLSb8T-MdQ

Looking into it now.

tuxdna commented 7 years ago

After resolving the above error, two additional errors occur:

  1. selinon.errors.FlowError: {"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}
  2. ERROR:data_importer:import_epv() failed with error: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist

Both of these errors appear to be related:

  1. The metadata worker is failing:
worker-ingestion_1      | : "saleem_ingestion_bayesianAnalysisFlow_v0", "will_retry": false, "selective": false, "flow_name": "bayesianAnalysisFlow", "event": "FLOW_FAILURE", "node_args": {"ecosystem": "go", "force": false, "force_graph_sync": true, "_audit": {"started_at": "2017-10-04T14:37:10.447877", "ended_at": "2017-10-04T14:37:17.373681", "version": "v1"}, "version": "master", "name": "github.com/stretchr/testify", "document_id": 1, "_release": "go:github.com/stretchr/testify:master"}, "retry": 15, "parent": null, "state": {"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}, "dispatcher_id": "aa06cb37-1c94-4c3a-a38f-62834fa4753b"}0-04 14:38:31,378 [ERROR] celery.app.trace: Task selinon.Dispatcher[aa06cb37-1c94-4c3a-a38f-62834fa4753b] raised unexpected: FlowError('{"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}',)
worker-ingestion_1      | Traceback (most recent call last):
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/celery/app/trace.py", line 367, in trace_task
worker-ingestion_1      |     R = retval = fun(*args, **kwargs)
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/celery/app/trace.py", line 622, in __protected_call__
worker-ingestion_1      |     return self.run(*args, **kwargs)
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/selinon/dispatcher.py", line 103, in run
worker-ingestion_1      |     raise self.retry(max_retries=0, exc=flow_error)
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/celery/app/task.py", line 668, in retry
worker-ingestion_1      |     raise_with_context(exc)
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/selinon/dispatcher.py", line 83, in run
worker-ingestion_1      |     retry = system_state.update()
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/selinon/systemState.py", line 760, in update
worker-ingestion_1      |     started, reused, fallback_started = self._continue_and_update_retry([])
worker-ingestion_1      |   File "/usr/lib/python3.4/site-packages/selinon/systemState.py", line 745, in _continue_and_update_retry
worker-ingestion_1      |     raise FlowError(json.dumps(state_info))
worker-ingestion_1      | selinon.errors.FlowError: {"finished_nodes": {"security_issues": ["a1f165e1-5ac8-484f-bc04-7e4c26d9b578"], "digests": ["cffbc003-c822-4d80-bad8-18199ca15c05"], "source_licenses": ["a524d388-5410-4ecd-a2bf-43a8b07e92e5"]}, "failed_nodes": {"metadata": ["93b51224-3640-43e7-8a2f-57fc65fa72d3"]}}
  2. Subsequently, the Data Model Importer cannot find the package bucket and fails:
data-model-importer_1   | --------------------------------------------------------------------------------
data-model-importer_1   | INFO in rest_api [/src/rest_api.py:48]:
data-model-importer_1   | Ingesting the given list of EPVs - [{"ecosystem": "go", "version": "master", "name": "github.com/stretchr/testify"}]
data-model-importer_1   | --------------------------------------------------------------------------------
data-model-importer_1   | ERROR:data_importer:import_epv() failed with error: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
worker-ingestion_1      | : "TASK_FAILURE", "parent": {"ResultCollector": "b2768add-e02a-4ec8-8f61-7325e875801b"}, "task_name": "GraphImporterTask", "queue": "saleem_ingestion_GraphImporterTask_v0", "node_args": {"version": "master", "document_id": 1, "force_graph_sync": true, "name": "github.com/stretchr/testify", "_audit": {"version": "v1", "started_at": "2017-10-04T15:02:10.869118", "ended_at": "2017-10-04T15:02:18.546199"}, "_release": "go:github.com/stretchr/testify:master", "force": false, "ecosystem": "go"}, "flow_name": "bayesianFlow", "retried_count": 0, "what": "Traceback (most recent call last):\n  File \"/usr/lib/python3.4/site-packages/selinon/selinonTaskEnvelope.py\", line 115, in run\n    result = task.run(node_args)\n  File \"/usr/lib/python3.4/site-packages/f8a_worker/base.py\", line 39, in run\n    result = self.execute(node_args)\n  File \"/usr/lib/python3.4/site-packages/f8a_worker/workers/graph_importer.py\", line 49, in execute\n    raise RuntimeError(\"Failed to invoke graph import at '%s' for %s\" % (endpoint, param))\nRuntimeError: Failed to invoke graph import at 'http://data-model-importer:9192/api/v1/ingest_to_graph' for [{'version': 'master', 'name': 'github.com/stretchr/testify', 'ecosystem': 'go'}]\n", "task_id": "1a12a52d-971f-4c04-abdd-dd8c675d7d9c", "dispatcher_id": "c5cce0af-f075-4dfa-91c0-91516e7bc4cc"}
data-model-importer_1   | ERROR:data_importer:Traceback for latest failure in import call: Traceback (most recent call last):
data-model-importer_1   |   File "/src/data_importer.py", line 149, in import_epv_http
data-model-importer_1   |     prefix=pkg_key_prefix))
data-model-importer_1   |   File "/src/data_source/s3_data_source.py", line 62, in list_files
data-model-importer_1   |     for obj in bucket.objects.filter(Prefix=prefix):
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/boto3/resources/collection.py", line 83, in __iter__
data-model-importer_1   |     for page in self.pages():
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/boto3/resources/collection.py", line 166, in pages
data-model-importer_1   |     for page in pages:
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 249, in __iter__
data-model-importer_1   |     response = self._make_request(current_kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/botocore/paginate.py", line 326, in _make_request
data-model-importer_1   |     return self._method(**current_kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/botocore/client.py", line 312, in _api_call
data-model-importer_1   |     return self._make_api_call(operation_name, kwargs)
data-model-importer_1   |   File "/usr/lib/python2.7/site-packages/botocore/client.py", line 601, in _make_api_call
data-model-importer_1   |     raise error_class(parsed_response, operation_name)
data-model-importer_1   | NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjects operation: The specified bucket does not exist
data-model-importer_1   | 
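
For local debugging, a quick way to see which buckets the importer can actually reach is a small boto3 script like the one below (a sketch only; the endpoint URL and credentials are placeholders for the local deployment's values):

import boto3

# Placeholders: point these at the local S3/minio instance the deployment uses.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:33000",
    aws_access_key_id="access-key",
    aws_secret_access_key="secret-key",
)

# The importer's NoSuchBucket error means the package bucket it expects is not
# in this listing.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
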
tuxdna commented 7 years ago

This will fix the second issue mentioned above:

msrb commented 7 years ago

PR (wip): https://github.com/fabric8-analytics/fabric8-analytics-worker/pull/376

fridex commented 7 years ago

BTW, some import paths can lead to different git repos. This list could help map an import path to a provider prefix when analysing projects.
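
As a naive illustration of that kind of mapping (my own sketch, not the list referred to above), the common "host/user/repo" import paths can be reduced to a repository root like this:

def provider_prefix(import_path):
    """Map a Go import path to its repository root (naive sketch).

    Covers only the common host/user/repo layout used by github.com,
    bitbucket.org and gitlab.com; vanity import paths, gopkg.in, etc.
    need the kind of mapping list referenced above.
    """
    parts = import_path.split("/")
    if parts[0] in ("github.com", "bitbucket.org", "gitlab.com") and len(parts) >= 3:
        return "/".join(parts[:3])
    return import_path


print(provider_prefix("github.com/golang/protobuf/proto"))  # github.com/golang/protobuf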

msrb commented 7 years ago

Golang support is now available in production. Thanks, everyone, for the help and suggestions!

joshuawilson commented 7 years ago

nice!