odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

CSV asset reader sample fails #3259

Closed planetf1 closed 3 years ago

planetf1 commented 4 years ago

The CSV asset file reader sample fails.

In this case the asset management notebooks had also been run. The sample fails with:

jonesn:egeria/ (issue3254b) $ java -jar ./open-metadata-resources/open-metadata-samples//access-services-samples/asset-management-samples/asset-reader-csv-sample/target/asset-reader-csv-sample-2.0-jar-with-dependencies.jar
===============================
CSV File Reader Sample
===============================
Running against server: cocoMDS4 at https://localhost:9444
Using userId: erinoverview
Reading file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv

The open metadata repositories have returned 1 asset definitions for the requested file name open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
===============================
Accessing file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
Exception BASIC-FILE-CONNECTOR-404-001 The file named file://secured/research/clinical-trials/drop-foot/Patients.csv in the Connection object file://secured/research/clinical-trials/drop-foot/Patients.csv CSV File Store Connection does not exist

Only one asset is found, which is correct (ContactList.csv). It is unclear why the exception reports Patients.csv.

It's also worth noting that although the ContactList.csv file exists, it should probably be moved into this sample. Whether it is found may also depend on the current working directory when the sample is run.

planetf1 commented 4 years ago

After cleaning the repos and only configuring, starting, and then running the create CSV sample, we still get:

jonesn:egeria/ (issue3254b) $ java -jar ./open-metadata-resources/open-metadata-samples//access-services-samples/asset-management-samples/asset-reader-csv-sample/target/asset-reader-csv-sample-2.0-jar-with-dependencies.jar
===============================
CSV File Reader Sample
===============================
Running against server: cocoMDS4 at https://localhost:9444
Using userId: erinoverview
Reading file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv

The open metadata repositories do not have an asset definition for the requested file name open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
===============================
Accessing file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
Number of records: 26
First 10 records ...
------------------------------------------------------------------------
[RecId, ContactType, FirstName, LastName, Company, JobTitle, WorkLocation]
------------------------------------------------------------------------
[1, E, Zach, Now, Coco Pharmaceuticals, Founder, 3]
[2, E, Steve, Starter, Coco Pharmaceuticals, Founder, 1]
[3, E, Terri, Daring, Coco Pharmaceuticals, Founder, 2]
[4, E, Tanya, Tidy, Coco Pharmaceuticals, "Data Steward, Clinical Trials, 3]
[5, E, Polly, Tasker, Coco Pharmaceuticals, IT Project Leader, 1]
[6, E, Tessa, Tube, Coco Pharmaceuticals, "Lead Researcher, Clinical Trials, 3]
[7, E, Callie, Quartile, Coco Pharmaceuticals, Data Scientist, 3]
[8, E, Ivor, Padlock, Coco Pharmaceuticals, Chief Security Officer, 1]
[9, E, Bob, Nitter, Coco Pharmaceuticals, Application Developer, 1]
[10, E, Faith, Broker, Coco Pharmaceuticals, Human Resources and Compliance Director, 1]
------------------------------------------------------------------------
No asset properties  ...

A search with TEX shows that the asset exists:

Entity: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv [added in gen 1]
GUID : f9f7ccea-bee9-4431-ad1f-567857191540

Type: CSVFile

Version : 3

Status : ACTIVE

Properties:

delimiterCharacter : ,
description : This is a new file asset created by the CreateCSVFileAssetSample.
fileType : csv
latestChange : Asset created
name : open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
qualifiedName : open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
quoteCharacter : "
Classifications:

AssetOwnership
AssetZoneMembership
Home Repository:

metadataCollectionName : cocoMDS1
metadataCollectionId : fa91620f-7f34-4455-8596-45657eb22348
OMRS Control Properties:

createdBy : peterprofile
createTime : 2020-06-25T15:36:49.122+00:00
updatedBy : peterprofile
updateTime : 2020-06-25T15:36:49.153+00:00
maintainedBy : list is empty
instanceURL : undefined
instanceLicense : undefined
instanceProvenanceType : LOCAL_COHORT
replicatedBy : undefined

Note the qualified name looks correct -- though the query is being done against cocoMDS4 for TEX, using the enterprise connector

Switching the reader sample back to cocoMDS1 does then retrieve the asset metadata -- though not the file.

I've not been able to reproduce the original issue where we picked up the wrong CSV file. The scenario was a clean environment in which I ran the configure and start notebooks, THEN ran through the other notebooks except CTS. One of these may have created rogue metadata, though I cannot reproduce it.

I don't see this as an inhibitor for the 2.0 release, but we do need to ensure the right servers are being used, add more text to the readme for the sample, correct the use of the enterprise connector if needed, and ensure the sample file exists in the right place. This sample is a useful stepping stone for understanding asset management in Egeria.

planetf1 commented 4 years ago

We get a similar issue with the avro file reader sample.

In this case I took the sample 'weather.avro' from the avro project, and was able to add this to egeria's metadata repo.

However the reader fails against cocoMDS4 (as currently coded), whilst it works against cocoMDS1

mandy-chessell commented 4 years ago

This is probably because the supported zones are different on each repository so it is expected that different assets are visible through different repositories. However I will check that this is the case.

mandy-chessell commented 4 years ago

The file itself does need to be in the correct location to allow the connector to read it. Without knowing the details of where it was run from, I cannot say whether this is a problem or an incorrect environment. Basically, the filename (including path) has to be correct in the call to create the asset.
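To illustrate why the working directory matters for a relative file name, here is a minimal Python sketch. It is not part of the Egeria samples: the helper `resolve_from` and both starting directories are hypothetical, and only the relative file name is taken from the output above.

```python
from pathlib import PurePosixPath

# Relative file name used by the samples in this issue.
FILE_NAME = (
    "open-metadata-resources/open-metadata-samples/access-services-samples/"
    "asset-management-samples/ContactList.csv"
)


def resolve_from(working_dir, file_name):
    """Return the path a file name would resolve to when a program
    is launched from working_dir (hypothetical helper).

    A relative name is appended to the working directory; an absolute
    name is returned unchanged.
    """
    return PurePosixPath(working_dir) / file_name


# Hypothetical launch directories: the same relative name points at
# two different locations, and only one of them may hold the file.
for working_dir in ("/home/user/egeria", "/home/user/egeria/open-metadata-distribution"):
    print(resolve_from(working_dir, FILE_NAME))
```

The same reasoning applies to the name stored when the asset is created: if it is relative, a reader launched from a different directory looks in a different place.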

I am assuming TEX should be REX. Either one accesses the metadata repository using the repository services. These services do not support the filtering of assets using the governance zones. Governance zones are supported by the OMAS.

The samples do not set up zones on the assets. This means the servers use the value from "defaultZones" to set the zone value when the asset is created. When the file is read, the value from the 'supportedZones' is used. These values are configured on each access service within the server. Each of the metadata servers in the coco hands on labs is set up with different defaultZones and supportedZones.

For example:

[Figure: asset-zones-for-building-catalog]

cocoMDS4 does not support any access services permitting assets to be created. However it does have asset-consumer which is set up with SupportedZones="data-lake".

When assets are created in cocoMDS1, the zone is set to "quarantine-zone". This asset is invisible to users of cocoMDS4. It should be possible to see them from cocoMDS2 because it has supportedZones="" which means all.
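The visibility rule described here can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment, not Egeria's implementation: the function `is_visible` and the dictionary layout are hypothetical, and the zone values are the ones quoted above.

```python
def is_visible(asset_zones, supported_zones):
    """Return True if an asset is visible through an access service.

    Hypothetical helper: an empty supported_zones list means "all zones";
    otherwise the asset must be in at least one supported zone.
    """
    if not supported_zones:
        return True
    return any(zone in supported_zones for zone in asset_zones)


# supportedZones values as described in this comment (structure is illustrative).
supported_zones_by_server = {
    "cocoMDS2": [],             # supportedZones="" means all zones
    "cocoMDS4": ["data-lake"],  # asset-consumer on cocoMDS4
}

# The samples set no zones, so an asset created through cocoMDS1
# receives that server's defaultZones value.
new_asset_zones = ["quarantine-zone"]

for server, supported in supported_zones_by_server.items():
    print(server, is_visible(new_asset_zones, supported))
# prints:
# cocoMDS2 True
# cocoMDS4 False
```

Under these assumptions the reader sample finds no asset definition through cocoMDS4 even though the asset exists in the repository.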

I think it is working as it should.

planetf1 commented 4 years ago

Thanks. I'll assign it to myself to retest/check docs next time I run

planetf1 commented 4 years ago

I went to run this directly from the 'distribution'. Whilst the create appears to work (no error is reported), it's not clear that it's finding the file.

When run from the top level of egeria rather than from the distribution, but calling the uber jar in the distribution, it also appears to work and the files look OK, but the file is not found. This may indeed be due to reading from cocoMDS2.

I think for the sample we need

So that anyone can come along and understand the sample and try it out - then incrementally explore more

(this probably applies to other samples)

Rather than a bug, this is more a reflection of the work needed to improve the usability of the samples.

planetf1 commented 4 years ago

My run (for example)

jonesn:egeria/ (issue3441) $ java -jar ./open-metadata-distribution/open-metadata-assemblies/target/egeria-2.2-SNAPSHOT-distribution/egeria-omag-2.2-SNAPSHOT/samples/asset-create-csv-sample-2.2-SNAPSHOT-jar-with-dependencies.jar
===============================
CSV File Asset Creation
===============================
Running against server: cocoMDS1 at https://localhost:9444
Using userId: peterprofile
Creating file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv

jonesn:egeria/ (issue3441) $ java -jar ./open-metadata-distribution/open-metadata-assemblies/target/egeria-2.2-SNAPSHOT-distribution/egeria-omag-2.2-SNAPSHOT/samples/asset-reader-csv-sample-2.2-SNAPSHOT-jar-with-dependencies.jar
===============================
CSV File Reader Sample
===============================
Running against server: cocoMDS4 at https://localhost:9444
Using userId: erinoverview
Reading file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv

The open metadata repositories do not have an asset definition for the requested file name open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
===============================
Accessing file: open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
Number of records: 26
First 10 records ...
------------------------------------------------------------------------
[RecId, ContactType, FirstName, LastName, Company, JobTitle, WorkLocation]
------------------------------------------------------------------------
[1, E, Zach, Now, Coco Pharmaceuticals, Founder, 3]
[2, E, Steve, Starter, Coco Pharmaceuticals, Founder, 1]
[3, E, Terri, Daring, Coco Pharmaceuticals, Founder, 2]
[4, E, Tanya, Tidy, Coco Pharmaceuticals, "Data Steward, Clinical Trials, 3]
[5, E, Polly, Tasker, Coco Pharmaceuticals, IT Project Leader, 1]
[6, E, Tessa, Tube, Coco Pharmaceuticals, "Lead Researcher, Clinical Trials, 3]
[7, E, Callie, Quartile, Coco Pharmaceuticals, Data Scientist, 3]
[8, E, Ivor, Padlock, Coco Pharmaceuticals, Chief Security Officer, 1]
[9, E, Bob, Nitter, Coco Pharmaceuticals, Application Developer, 1]
[10, E, Faith, Broker, Coco Pharmaceuticals, Human Resources and Compliance Director, 1]
------------------------------------------------------------------------
No asset properties  ...
jonesn:egeria/ (issue3441) $ ls open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
open-metadata-resources/open-metadata-samples/access-services-samples/asset-management-samples/ContactList.csv
jonesn:egeria/ (issue3441) $

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.

mandy-chessell commented 3 years ago

This is working as designed. I'm not sure there is anything further to do.