krishnapaparaju opened this issue 7 years ago
Current status:
Note: I suspect that testing whether data landed in the graph might be tricky, as the tests run against the staging environment where the graph is already populated. cc @tisnik
> Note I suspect that testing whether data landed in graph might be tricky

We can use the `last_updated` key for Package and Version. It is the time elapsed in seconds since the epoch. We can check whether `last_updated` > our expected time (the submission time).
EDITED: `last_updated` is maintained at both the Package and the Version node.
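A minimal sketch of such a check in Python (the endpoint URL, query shape, and helper are illustrative assumptions, not the actual test code):

```python
import time

import requests

# Illustrative Gremlin server HTTP endpoint; real tests would read it from config.
GREMLIN_URL = "http://localhost:8182"

def package_last_updated(name, ecosystem="maven"):
    """Return the last_updated property (epoch seconds) of a Package node."""
    query = ('g.V().has("ecosystem", "{e}").has("name", "{n}")'
             '.valueMap("last_updated")').format(e=ecosystem, n=name)
    resp = requests.post(GREMLIN_URL, json={"gremlin": query})
    resp.raise_for_status()
    return resp.json()["result"]["data"][0]["last_updated"][0]

submission_time = time.time()
# ... trigger ingestion of the package here ...
assert package_last_updated("io.vertx:vertx-core") > submission_time
```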
> Note I suspect that testing whether data landed in graph might be tricky

That brings up another question: how well is this part covered by unit tests? Could we rely on an existing internal API?
@miteshvp in order to finish this task I'd need to know the structure of data stored in our graph DB. Could you please point me to (any) documentation about this topic? Thank you in advance!
@tisnik - we do not have any documentation around the structure. I am pasting a response for your use. Let me know if you need more information.
Package - io.vertx:vertx-core
{
"requestId": "ea0d940c-bb7a-45b1-8cd7-aad88b0976e0",
"status": {
"message": "",
"code": 200,
"attributes": {}
},
"result": {
"data": [{
"gh_issues_last_month_opened": [-1],
"gh_prs_last_year_closed": [-1],
"libio_usedby": ["TechEmpower/FrameworkBenchmarks:2891", "apiman/apiman:345", "boonproject/boon:473", "hawkular/hawkular-apm:132", "isaiah/jubilee:342", "jbosstm/narayana:76", "jhalterman/failsafe:1795", "vert-x3/vertx-stack:78", "wildfly-swarm/wildfly-swarm:190", "wisdom-framework/wisdom:72"],
"ecosystem": ["maven"],
"gh_subscribers_count": [570],
"gh_contributors_count": [30],
"vertex_label": ["Package"],
"libio_dependents_repos": ["4.75K"],
"last_updated_sentiment_score": ["2017-10-09"],
"sentiment_magnitude": ["0"],
"gh_issues_last_year_opened": [-1],
"gh_issues_last_month_closed": [-1],
"gh_open_issues_count": [184],
"libio_dependents_projects": ["128"],
"latest_version": ["3.4.1"],
"tokens": ["core", "io", "vertx"],
"package_relative_used": ["not used"],
"gh_stargazers": [6946],
"gh_forks": [1274],
"package_dependents_count": [-1],
"gh_prs_last_month_opened": [-1],
"gh_issues_last_year_closed": [-1],
"sentiment_score": ["0"],
"last_updated": [1.51178887579E9],
"gh_prs_last_month_closed": [-1],
"libio_total_releases": ["48"],
"gh_prs_last_year_opened": [-1],
"name": ["io.vertx:vertx-core"],
"libio_latest_version": ["3.5.0.Beta1"],
"libio_latest_release": [1.5020442E9]
}],
"meta": {}
}
}
Version 3.4.2
{
"requestId": "50aa304a-d7ae-4aed-a847-65600cc6e3f3",
"status": {
"message": "",
"code": 200,
"attributes": {}
},
"result": {
"data": [{
"last_updated": [1.50823601887E9],
"shipped_as_downstream": [false],
"pname": ["io.vertx:vertx-core"],
"vertex_label": ["Version"],
"description": ["Sonatype helps open source projects to set up Maven repositories on https://oss.sonatype.org/"],
"version": ["3.4.2"],
"dependents_count": [11],
"licenses": ["Apache 2.0", "EPL 1.0", "MIT License"]
"declared_licenses": ["Eclipse Public License - v 1.0", "The Apache Software License, Version 2.0"],
"pecosystem": ["maven"],
"osio_usage_count": [6]
}],
"meta": {}
}
}
Thanks @miteshvp. If I understand correctly, these are the results generated by the following queries:
g.V().has("name", "io.vertx:vertx-core").has("ecosystem", "maven")
and
g.V().has("pname": "io.vertx:vertx-core").has("version", "3.4.2").has("pecosystem", "maven")
Btw, is `last_updated` really supposed to be a float value? I'm pretty sure it should be int64 or uint64.
@tisnik - Your queries are right. That's what I used to generate the response.
`last_updated` is a double value. If you look at it closely, it has `E9` at the end.
Thanks a lot @miteshvp for the clarification.
Re the double value: yeah, I know it's a double, but I was interested in why it's serialized this way, with loss of precision. For storing the `last_updated` attribute, `str(time.time())` is used, and this call returns a proper Unix time, with or without decimal digits (it's system dependent):

>>> str(time.time())
'1511895953.1909728'

It would be interesting to know where the precision (at least four decimal digits) is lost: during the store operation, in the JSON serialization, or somewhere in the middle?
(FYI: I'd need to look closely at the schema, as some attributes have strange types :)
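One way to narrow this down is to reproduce the truncation locally; a small sketch (the 12-significant-digit formatting is only a guess at what the graph side does):

```python
import json
import time

# Full-precision Unix time, the way the worker stores it: str(time.time()).
submitted = str(time.time())   # e.g. '1511895953.1909728'
as_double = float(submitted)

# A round trip through Python's float and json keeps all significant digits...
print(json.dumps(as_double))   # 1511895953.1909728

# ...while formatting with ~12 significant digits reproduces the truncation
# seen in the graph response (1.51178887579E9):
print("%.11E" % as_double)     # 1.51189595319E+09
```

Python's own float/JSON round trip is lossless, so the digits most likely disappear on the graph side.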
@tisnik - is this card still blocked? Please let me know if you have more questions; otherwise I suggest removing the label accordingly. Thanks.
@miteshvp no, it is no longer blocked. TY
To be able to successfully develop, debug, and run these tests, the following issue needs to be resolved: [f8a] data-importer: The import failed: 'status' #1526
@tisnik - are you still blocked?
@miteshvp your changes have been deployed to stage today and everything works. TY, I'm clearing the status now :)
Marking as blocked as we are waiting for AWS creds for CI.
Still waiting for credentials to land in CI.
Who is this blocked on?
@kbsingh we need the S3 credentials to be used on CI (i.e. we need to know the hashes of the 'real' credentials)
@tisnik are we still waiting for the credentials to be available in CI?
@tisnik - are we still blocked? Can you bring it up in today's standup?
@msrb - please let me know if you need help to unblock @tisnik
@tisnik are we still blocked here?
I am going to move this issue to the backlog. I will talk to @tisnik and we will come up with a new way of running these kinds of tests.
User story
As a fabric8-analytics developer, I want to run all available integration/E2E tests for the data ingestion part of the pipeline on every merge to the master branch. This will help me catch bugs early so I can fix them before they get promoted to production.
Description
Currently, integration tests are either incomplete or not enabled for the components involved in the data gathering and ingestion parts of the OSIO analytics architecture. The starting point of these integration tests would be the ingestion of data from public sources, and the end point would be the processed data landing in S3, in the graph, or in both of these destinations.
Note: the tests for the data ingestion part of the pipeline already exist, but they are not enabled in CI because the credentials needed to run them are missing.
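A rough skeleton of such an E2E test (`ingestion_client`, `s3_client`, and `graph_client` are hypothetical fixtures standing in for the real helpers):

```python
import time

def test_ingestion_lands_in_s3_and_graph(ingestion_client, s3_client, graph_client):
    """Sketch of the intended flow: ingest a known package, check both destinations."""
    submitted_at = time.time()

    # Trigger ingestion of a known package from a public source (hypothetical API).
    ingestion_client.schedule(ecosystem="maven",
                              package="io.vertx:vertx-core",
                              version="3.4.2")

    # Once the pipeline finishes, the data should land in S3...
    assert s3_client.object_exists("maven/io.vertx:vertx-core/3.4.2.json")
    # ...and the graph node's last_updated should be newer than the submission time.
    assert graph_client.package_last_updated("io.vertx:vertx-core") > submitted_at
```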
Acceptance criteria
Tasks