transparencia-mg / dpckan

Improve get dataset function #163

Closed gabrielbdornas closed 2 years ago

gabrielbdornas commented 2 years ago

Fix #144

@fjuniorr I'm going to close #145, since all the improvements were implemented in a new branch.

I'm including here the important comments you made in that PR.

In short, the changes implemented were:

fjuniorr commented 2 years ago

Review comments follow.

Respect the dataset URL slug (i.e., do not convert - to _) :heavy_check_mark:

Add the validators package to setup.py :heavy_check_mark:

Change --link-id to an argument named URL :heavy_check_mark:

Document in the docstring that the URL argument also accepts other values (e.g., an id).
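
To illustrate the slug and URL points above, here is a minimal sketch of how a single argument could accept either a dataset URL or a bare id while leaving the slug untouched. The helper name `dataset_id_from_argument` is hypothetical and not part of dpckan, and the sketch uses `urllib.parse` from the standard library rather than the validators package.

```python
from urllib.parse import urlparse


def dataset_id_from_argument(value: str) -> str:
    """Return the dataset slug for a dataset URL, or the value unchanged.

    Hypothetical helper (not dpckan's API). Besides a URL, the argument
    may carry other identifiers, e.g. a CKAN dataset id or name.
    """
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # The last non-empty path segment is taken as the slug.
        segments = [segment for segment in parsed.path.split("/") if segment]
        if not segments:
            raise ValueError(f"No dataset slug found in URL: {value}")
        # The slug is returned as-is: hyphens are NOT converted to underscores.
        return segments[-1]
    # Not a URL: assume the caller passed an id or name directly.
    return value
```

Called with https://dados.mg.gov.br/dataset/compras-emergenciais-covid-19, the helper returns compras-emergenciais-covid-19 with its hyphens intact.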

The Package function accepts a URL pointing to a data package descriptor :heavy_check_mark:

Replace Package(json.loads(urlopen(ckan_datapackage_resource['url']).read())) with dp = Package(ckan_datapackage_resource['url'])

def get_remote_dataset_metadata(ckan_instance, dataset_name):
  ckan_datapackage_resource_id = get_ckan_datapackage_resource_id(ckan_instance, dataset_name)
  ckan_datapackage_resource = ckan_instance.action.resource_show(id=ckan_datapackage_resource_id)
  remote_dataset_metadata = Package(json.loads(urlopen(ckan_datapackage_resource['url']).read()))
  return remote_dataset_metadata

The data package has an error: resource "datapackage" does not exist :heavy_check_mark:

Data package whose datapackage.json resource has the name property datapackage

$ dpckan -H foo -k bar dataset get -li https://dados.mg.gov.br/dataset/compras-emergenciais-covid-19
Creating ./compras_emergenciais_covid_19 folder.
Traceback (most recent call last):
  File "/Users/fjunior/Projects/cge_dpckan/venv/bin/dpckan", line 33, in <module>
    sys.exit(load_entry_point('dpckan', 'console_scripts', 'dpckan')())
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/get_dataset.py", line 19, in get_dataset_cli
    get_dataset(ctx.obj['CKAN_HOST'], link_id, path)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/functions.py", line 278, in get_dataset
    download_dataset_resources(ckan_host, dataset_id, path)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/functions.py", line 310, in download_dataset_resources
    file_path = remote_dataset_metadata.get_resource(file_name).path
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/frictionless/package.py", line 416, in get_resource
    raise FrictionlessException(error)
frictionless.exception.FrictionlessException: [package-error] The data package has an error: resource "datapackage" does not exist
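
One possible reading of the traceback above is that the download loop also visits the CKAN resource that holds datapackage.json and then looks up its name (datapackage) in the package, where no such resource exists. The sketch below separates the descriptor from the data resources by URL file name before any lookup. It works on plain dictionaries shaped like CKAN resources, and `is_descriptor` and `split_resources` are hypothetical names, not functions from dpckan.

```python
import posixpath
from urllib.parse import urlparse

DESCRIPTOR_FILE = "datapackage.json"


def is_descriptor(ckan_resource: dict) -> bool:
    """True when the resource URL points at a datapackage.json file."""
    path = urlparse(ckan_resource["url"]).path
    return posixpath.basename(path) == DESCRIPTOR_FILE


def split_resources(ckan_resources: list) -> tuple:
    """Split CKAN resources into (descriptor resources, data resources).

    Only the data resources should later be looked up by name in the
    data package; the descriptor itself is not listed there.
    """
    descriptors = [r for r in ckan_resources if is_descriptor(r)]
    data = [r for r in ckan_resources if not is_descriptor(r)]
    return descriptors, data
```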

Handle the error for a dataset without a datapackage.json resource :heavy_check_mark:

$ dpckan -H foo -k bar dataset get -li https://homologa.cge.mg.gov.br/dataset/datapackage-reprex
Traceback (most recent call last):
  File "/Users/fjunior/Projects/cge_dpckan/venv/bin/dpckan", line 33, in <module>
    sys.exit(load_entry_point('dpckan', 'console_scripts', 'dpckan')())
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/fjunior/Projects/cge_dpckan/venv/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/get_dataset.py", line 19, in get_dataset_cli
    get_dataset(ctx.obj['CKAN_HOST'], link_id, path)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/functions.py", line 278, in get_dataset
    download_dataset_resources(ckan_host, dataset_id, path)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/functions.py", line 292, in download_dataset_resources
    remote_dataset_metadata = get_remote_dataset_metadata(ckan_instance, dataset_name)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/functions.py", line 201, in get_remote_dataset_metadata
    ckan_datapackage_resource_id = get_ckan_datapackage_resource_id(ckan_instance, dataset_name)
  File "/Users/fjunior/Projects/cge_dpckan/dpckan/functions.py", line 271, in get_ckan_datapackage_resource_id
    ckan_datapackage_resource_id = [i["id"] for i in ckan_datapackage_resources if i["url"].split('/')[-1] == "datapackage.json"][0]
IndexError: list index out of range
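
The IndexError above comes from indexing `[0]` on an empty list when no resource URL ends in datapackage.json. A minimal sketch of one way to turn that into a readable error follows. The exception class `MissingDatapackageError` and the function name are hypothetical, and the function takes the resource list directly instead of a CKAN instance so that it stays self-contained.

```python
class MissingDatapackageError(Exception):
    """Raised when a dataset has no datapackage.json resource (hypothetical)."""


def find_datapackage_resource_id(ckan_resources: list, dataset_name: str) -> str:
    """Return the id of the datapackage.json resource, or raise a clear error."""
    # next() with a default avoids indexing into a possibly empty list.
    resource_id = next(
        (r["id"] for r in ckan_resources
         if r["url"].split("/")[-1] == "datapackage.json"),
        None,
    )
    if resource_id is None:
        raise MissingDatapackageError(
            f"Dataset '{dataset_name}' has no datapackage.json resource."
        )
    return resource_id
```

The CLI layer could then catch this exception and print its message instead of a traceback.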

Do not clean the folder without user confirmation :heavy_check_mark:

$ dpckan -H foo -k bar dataset get -li https://homologa.cge.mg.gov.br/dataset/frictionless-hangout-jan2022
Cleaning ./frictionless_hangout_jan2022's folder contents already existent.
Downloading estados resource to ./frictionless_hangout_jan2022/data.
Downloading pib-per-capita resource to ./frictionless_hangout_jan2022/data.
Downloading datapackage.json.