jataware / domain-model-examiner

The goal of this process is to perform machine reading over the model codebase in order to automatically extract key metadata.
MIT License

Remote data provenance #10

Open brandomr opened 3 years ago

brandomr commented 3 years ago

For data that comes from the web, we should try to identify provenance information. We could potentially leverage the Wikipedia API to gather a high-level description of the source. For example, the FAO Wikipedia page has rich information about that organization.
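
A minimal sketch of the Wikipedia lookup, assuming the public REST summary endpoint (`/api/rest_v1/page/summary/{title}`) and its `extract` field are what we want. The helper names, the User-Agent string, and the page title in the usage note are all illustrative assumptions, not anything that exists in this repo:

```python
import json
import urllib.parse
import urllib.request

# Assumption: the English Wikipedia REST summary endpoint is the source we want.
WIKI_SUMMARY_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/page/summary/"


def summary_url(title):
    """Build the summary URL for a page title (hypothetical helper)."""
    # Wikipedia page titles use underscores in place of spaces.
    slug = title.replace(" ", "_")
    return WIKI_SUMMARY_ENDPOINT + urllib.parse.quote(slug, safe="")


def extract_description(payload):
    """Pull the lead-paragraph text out of a decoded summary response."""
    # Fall back to an empty string when the field is missing.
    return payload.get("extract", "").strip()


def fetch_provenance(title, timeout=10):
    """Fetch a short description of a data source (performs a network call)."""
    request = urllib.request.Request(
        summary_url(title),
        # Placeholder User-Agent; a real client should identify itself properly.
        headers={"User-Agent": "domain-model-examiner-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_description(json.load(response))
```

Something like `fetch_provenance("Food and Agriculture Organization")` would then give the text for the `provenance` field, assuming that is the right page title. Mapping a domain such as `usda.gov` to a page title is a separate problem this sketch does not address.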

Additionally, we should consider grabbing column headers from all data files and inputs when possible. For example, this could be organized as follows:

    usda.gov:
        - provenance: "DESCRIPTION OF USDA (from wiki?) HERE" 
        - files:
            - psd_grains_pulses_csv:
                url: https://apps.fas.usda.gov/psdonline/downloads/psd_grains_pulses_csv.zip
                columns: ["price", "yield"]
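
A rough sketch of how the column headers could be collected for a zipped CSV download like the one above, assuming the archive bytes have already been fetched and the files are UTF-8 CSVs with a header row. The function names are hypothetical, and the "last two host labels" rule for the source key is a simplification that would mishandle domains like `example.co.uk`:

```python
import csv
import io
import zipfile
from urllib.parse import urlparse


def source_key(url):
    """Reduce a URL to a short source key, e.g. apps.fas.usda.gov -> usda.gov."""
    host = urlparse(url).hostname or ""
    # Naive assumption: the last two labels identify the organization.
    return ".".join(host.split(".")[-2:])


def csv_headers_from_zip(zip_bytes):
    """Return {member_name: header_row} for every CSV inside a zip archive."""
    headers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig drops a byte-order mark if the file has one.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers


def provenance_entry(url, zip_bytes, description=""):
    """Assemble a dict mirroring the layout proposed above."""
    files = [
        {name: {"url": url, "columns": columns}}
        for name, columns in csv_headers_from_zip(zip_bytes).items()
    ]
    return {source_key(url): [{"provenance": description}, {"files": files}]}
```

The result is a plain dict, so it could be dumped to YAML or JSON with whatever serializer the project already uses. Only the first row of each file is read, so large downloads are not loaded into memory beyond the archive itself.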