markhoerth / dremio_client

Apache License 2.0

Can't fetch physical data sources metadata #234

Open nikkatalnikov opened 3 years ago

nikkatalnikov commented 3 years ago

Description

I am trying to fetch physical dataset info via the simple client API, but it looks like it only returns VIRTUAL_DATASETs.

    client = init(simple_client=True)
    catalog_raw_api_data = client.catalog()

    c_ids = map(lambda x: (x['id'], x['path']), catalog_raw_api_data['data'])
    for (c_id, c_path) in c_ids:
        catalog_item = client.catalog_item(c_id, c_path)
        entity_type = catalog_item.get('entityType')
        print(catalog_item)

Alternatively, via the Dremio client I can't get anything:

    client = init()
    catalog = client.data
    pds = catalog.source.pds.get()

This raises an error:

    Traceback (most recent call last):
      File "/Users/nikkatalnikov/opt/anaconda3/envs/flowtale/lib/python3.8/site-packages/dremio_client/model/data.py", line 484, in __getattr__
        value = dict.__getitem__(self, item)
    KeyError: 'source'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/nikkatalnikov/Desktop/projetcs/flowtale/flowtale-acc/application/dremio-datalake/dremio-clonner/dremio-exporter.py", line 8, in <module>
        pds = catalog.source.pds.get()
      File "/Users/nikkatalnikov/opt/anaconda3/envs/flowtale/lib/python3.8/site-packages/dremio_client/model/data.py", line 492, in __getattr__
        return dict.__getitem__(self, item)
    KeyError: 'source'

What am I doing wrong? Thank you!

rymurr commented 3 years ago

Hey Nik,

I'm out of the office for a few weeks but will check this asap when I'm back. Apologies for the delay!

On Thu, 6 May 2021, 00:17 Nik Katalnikov, @.***> wrote:

  • Dremio client version: 0.14.0
  • Dremio version: 14.0.0
  • Python version: 3.8
  • Operating System: Mac OS 10.15.7 Catalina (Dockerized)

nikkatalnikov commented 3 years ago

@rymurr do you have an update on the issue? :)

rymurr commented 3 years ago

Hey @nikkatalnikov, sorry for the delay!

I ran the following against a clean Dremio 14.5.0:

python -c "from dremio_client import init;c=init();[print(c.data[i]) for i in c.data]"|jq
{
  "entityType": "home",
  "id": "e37a0e32-919e-4edf-a54d-9e812a08bce6",
  "name": null,
  "tag": "qDM283kE6Og=",
  "path": [
    "@dremio"
  ],
  "accessControlList": null
}
{
  "entityType": "source",
  "id": "cf7dd756-b37d-46a4-9662-6805ded0f8ee",
  "name": null,
  "description": null,
  "tag": "Hqes8XM1Bnw=",
  "type": "CONTAINER",
  "config": null,
  "createdAt": "2021-06-01T08:27:55.338Z",
  "metadataPolicy": null,
  "state": null,
  "accelerationGracePeriodMs": null,
  "accelerationRefreshPeriodMs": null,
  "accelerationNeverExpire": null,
  "accelerationNeverRefresh": null,
  "path": [
    "Samples"
  ],
  "accessControlList": null
}
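
The output above suggests that top-level entries are addressed by their name (the first element of `path`, for example `Samples`), not by their `entityType`, which may be why `catalog.source` ended in `KeyError: 'source'`. Below is a minimal sketch of that behaviour. It assumes a hypothetical dict subclass (`Catalog`) whose attribute access falls back to key lookup, loosely modelled on the `__getattr__` frames in the traceback. It is a stand-in for illustration, not the actual dremio_client class, and `source_names` is a made-up helper.

```python
class Catalog(dict):
    """Hypothetical stand-in: attribute access falls back to dict key lookup."""

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails.
        return dict.__getitem__(self, item)


# Assumption: entries are keyed by name, mirroring the two entries printed above.
catalog = Catalog({
    "@dremio": {"entityType": "home", "path": ["@dremio"]},
    "Samples": {"entityType": "source", "path": ["Samples"]},
})


def source_names(cat):
    """Return the names of all entries whose entityType is 'source'."""
    return [name for name, entry in cat.items() if entry.get("entityType") == "source"]


print(catalog.Samples["entityType"])  # lookup by name works
print(source_names(catalog))          # find sources by filtering on entityType

try:
    catalog.source  # there is no entry literally named "source"
except KeyError as err:
    print("KeyError:", err)
```

Under these assumptions, a source is reached through its own name, and filtering on `entityType` is one way to discover which names are sources.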

And when looking at the PDSs in the Samples source:

python -c "from dremio_client import init;c=init();[print(c.data.Samples.samples_dremio_com[i].get()) for i in c.data.Samples.samples_dremio_com.get()]"|jq
{
  "entityType": "file",
  "id": "dremio:/Samples/samples.dremio.com/\"SF weather 2018-2019.csv\"",
  "path": [
    "Samples",
    "samples.dremio.com",
    "\"SF weather 2018-2019.csv\""
  ],
  "accessControlList": null
}
{
  "entityType": "dataset",
  "id": "978dd231-abb8-4ae1-8c6f-1073d9e2d211",
  "path": [
    "Samples",
    "samples.dremio.com",
    "SF_incidents2016.json"
  ],
  "tag": "0zNDJreBWoA=",
  "type": "PHYSICAL_DATASET",
  "fields": [
    {
      "name": "IncidntNum",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Category",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Descript",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "DayOfWeek",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Date",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Time",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "PdDistrict",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Resolution",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Address",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "X",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Y",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "Location",
      "type": {
        "name": "VARCHAR"
      }
    },
    {
      "name": "PdId",
      "type": {
        "name": "BIGINT"
      }
    }
  ],
  "createdAt": "2021-06-01T08:33:29.829Z",
  "accelerationRefreshPolicy": null,
  "sql": null,
  "sqlContext": null,
  "format": {
    "type": "JSON",
    "fullPath": [
      "Samples",
      "samples.dremio.com",
      "SF_incidents2016.json"
    ],
    "ctime": 0,
    "isFolder": false,
    "location": "/samples.dremio.com/SF_incidents2016.json"
  },
  "approximateStatisticsAllowed": null,
  "accessControlList": null
}
{
  "entityType": "file",
  "id": "dremio:/Samples/samples.dremio.com/\"zip_lookup.csv\"",
  "path": [
    "Samples",
    "samples.dremio.com",
    "\"zip_lookup.csv\""
  ],
  "accessControlList": null
}
{
  "entityType": "file",
  "id": "dremio:/Samples/samples.dremio.com/\"zips.json\"",
  "path": [
    "Samples",
    "samples.dremio.com",
    "\"zips.json\""
  ],
  "accessControlList": null
}
{
  "entityType": "folder",
  "id": "dremio:/Samples/samples.dremio.com/\"Dremio University\"",
  "path": [
    "Samples",
    "samples.dremio.com",
    "\"Dremio University\""
  ],
  "tag": null,
  "accessControlList": null
}
{
  "entityType": "folder",
  "id": "dremio:/Samples/samples.dremio.com/\"NYC-taxi-trips\"",
  "path": [
    "Samples",
    "samples.dremio.com",
    "\"NYC-taxi-trips\""
  ],
  "tag": null,
  "accessControlList": null
}
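
In that listing only `SF_incidents2016.json` comes back with `"type": "PHYSICAL_DATASET"`. The `file` entries are, as far as I understand, files that have not been promoted to datasets yet. Here is a minimal sketch of filtering such a listing, using plain dicts trimmed from the output above rather than live client calls. `LISTING` and `physical_datasets` are hypothetical names and not part of dremio_client.

```python
# Three entries trimmed from the listing above (most fields omitted).
LISTING = [
    {
        "entityType": "file",
        "path": ["Samples", "samples.dremio.com", '"zips.json"'],
    },
    {
        "entityType": "dataset",
        "type": "PHYSICAL_DATASET",
        "path": ["Samples", "samples.dremio.com", "SF_incidents2016.json"],
    },
    {
        "entityType": "folder",
        "path": ["Samples", "samples.dremio.com", '"NYC-taxi-trips"'],
    },
]


def physical_datasets(entries):
    """Keep only entries that are datasets of type PHYSICAL_DATASET."""
    return [
        e for e in entries
        if e.get("entityType") == "dataset" and e.get("type") == "PHYSICAL_DATASET"
    ]


for entry in physical_datasets(LISTING):
    print("/".join(entry["path"]))
```

The same filter could be applied to whatever dicts the client returns for a folder, as long as they carry the `entityType` and `type` fields shown above.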