ricardonascimentosoares / cadastro-materiais-gov

Project to capture, filter and transform data from https://compras.dados.gov.br

Repository: cadastro-materiais-gov

Description

This repository hosts a PySpark application that captures and processes data from Compras Governamentais and ComprasNet, focusing on materials classified under group 65: "Equipamentos E Artigos Para Uso Médico, Dentário E Veterinário" (equipment and articles for medical, dental and veterinary use). The application follows the medallion approach: data is refined through successive layers, and the curated Gold Layer feeds the output files.
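As a rough illustration of the medallion idea, the sketch below passes a few made-up records through a raw (Bronze), a cleaned (Silver) and a curated (Gold) step. It uses plain Python rather than PySpark so it runs anywhere; the field names (`codigo`, `codigo_grupo`, `descricao`), the sample rows and the helper functions are all hypothetical and are not taken from the repository's code.

```python
# Hypothetical raw records standing in for the captured payload (Bronze layer).
BRONZE = [
    {"codigo": "1001", "codigo_grupo": "65", "descricao": "  seringa descartavel "},
    {"codigo": "1002", "codigo_grupo": "70", "descricao": "monitor de video"},
    {"codigo": "1001", "codigo_grupo": "65", "descricao": "  seringa descartavel "},  # duplicate
    {"codigo": "1003", "codigo_grupo": "65", "descricao": None},  # incomplete row
]

TARGET_GROUP = 65  # the group this project focuses on


def to_silver(bronze_rows, group=TARGET_GROUP):
    """Silver step: cast types, drop incomplete rows, keep one group, de-duplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["descricao"] is None:
            continue  # incomplete record
        if int(row["codigo_grupo"]) != group:
            continue  # outside the target group
        codigo = int(row["codigo"])
        if codigo in seen:
            continue  # already kept this item
        seen.add(codigo)
        silver.append(
            {
                "codigo": codigo,
                "grupo": group,
                "descricao": row["descricao"].strip().upper(),
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold step: keep only report-ready columns, in a stable order."""
    rows = ({"codigo": r["codigo"], "descricao": r["descricao"]} for r in silver_rows)
    return sorted(rows, key=lambda r: r["codigo"])


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

In a PySpark job each step would typically be a DataFrame transformation written to its own storage location, but the layering is the same.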

Features

Installation

  1. Clone the repository:

    git clone https://github.com/ricardonascimentosoares/cadastro-materiais-gov.git

  2. Navigate to the project directory:

    cd cadastro-materiais-gov

  3. Place the gcp_key_compras_bucket.json file in the utils folder.

  4. Build and run the Docker container:

    docker build -t cadastro-materiais-gov .
    docker run -it cadastro-materiais-gov
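
Since step 3 is easy to miss, a small pre-flight check before building the image may help. The helper below is a hypothetical convenience, not part of the repository; it assumes it is run from the project root and that the key lives at `utils/gcp_key_compras_bucket.json`, as described in the steps above.

```python
from pathlib import Path

# Location of the key file relative to the project root (from step 3).
KEY_RELATIVE_PATH = Path("utils") / "gcp_key_compras_bucket.json"


def key_file_present(project_root="."):
    """Return True if the key file exists under the given project root."""
    return (Path(project_root) / KEY_RELATIVE_PATH).is_file()


if __name__ == "__main__":
    if key_file_present():
        print("Key file found; ready to run docker build.")
    else:
        print(f"Missing {KEY_RELATIVE_PATH}; complete step 3 before building.")
```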

Data Output

This project generates three files in the output folder, extracted from the Gold Layer. They are available for download:

  1. material_agg.xlsx: Quantities of material items grouped by PDM and Classe.

  2. material_char_detail.xlsx: Data at the grain of individual characteristics of material items.

  3. material_list.xlsx: Analytical data showing the details of each material item.
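
To make the relationship between an item-level listing and a grouped summary concrete, the sketch below counts items per Classe and PDM, which is the kind of grouping material_agg.xlsx is described as containing. The rows and the column names (`item`, `pdm`, `classe`, `quantidade`) are made up for illustration; the actual workbook columns may differ.

```python
from collections import Counter

# Hypothetical item-level rows, one per material item.
material_list = [
    {"item": 1, "pdm": "SERINGA", "classe": "6515"},
    {"item": 2, "pdm": "SERINGA", "classe": "6515"},
    {"item": 3, "pdm": "LUVA CIRURGICA", "classe": "6515"},
    {"item": 4, "pdm": "BROCA ODONTOLOGICA", "classe": "6520"},
]


def aggregate_by_pdm_and_classe(rows):
    """Count material items for each (classe, pdm) pair, sorted by that pair."""
    counts = Counter((r["classe"], r["pdm"]) for r in rows)
    return [
        {"classe": classe, "pdm": pdm, "quantidade": n}
        for (classe, pdm), n in sorted(counts.items())
    ]


material_agg = aggregate_by_pdm_and_classe(material_list)
for row in material_agg:
    print(row)
```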