Add workflow to check project metadata

andrewtavis commented 2 hours ago

Terms

[X] I have searched open and closed feature requests
[X] I agree to follow Scribe-Data's Code of Conduct

Description

This issue would create a new workflow in .github/workflows called check_project_metadata.yaml that calls Python scripts to check the project's metadata files, language_metadata.json and data_type_metadata.json. We can put these scripts in a new src/scribe_data directory called check. The scripts would be:
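
For the workflow to catch problems, each script needs to end with a non-zero exit code when a check fails, since that is what makes a GitHub Actions step fail. A minimal sketch of that pattern follows. The name run_checks and the idea of each check returning a list of problem strings are assumptions for illustration, not something already in the repo.

```python
import sys
from typing import Callable


def run_checks(checks: dict[str, Callable[[], list[str]]]) -> int:
    """Run each check and return a process exit code (0 = pass, 1 = fail).

    Hypothetical helper: each check is a callable returning a list of
    human-readable problem strings (an empty list means the check passed).
    """
    failed = False
    for name, check in checks.items():
        problems = check()
        if problems:
            failed = True
            # Report problems on stderr so they stand out in the workflow log.
            print(f"{name}: {len(problems)} problem(s) found", file=sys.stderr)
            for problem in problems:
                print(f"  - {problem}", file=sys.stderr)
        else:
            print(f"{name}: OK")
    return 1 if failed else 0
```

A script could then finish with sys.exit(run_checks({...})), so that any reported problem fails the workflow step.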

/src/scribe_data/check/check_language_metadata.py would check the language_data_extraction directory and make sure that all languages in this directory are included in the language_metadata.json file
- Note: We need #293 to be finished before this one is made

A code snippet for this comes from #330:

def get_available_languages() -> list[tuple[str, str]]:
    """
    Get available languages from the data extraction folder.

    Returns
    -------
        list[tuple[str, str]]: A list of tuples with the language name and its QID.
    """
    extraction_dir = LANGUAGE_DATA_EXTRACTION_DIR
    available_languages = []
    for lang_folder in extraction_dir.iterdir():
        if lang_folder.is_dir():  # Check if it's a directory
            lang_name = lang_folder.name

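Building on the directory walk above, the comparison against language_metadata.json could look roughly like the sketch below. It assumes the top-level keys of the JSON file are lowercase language names and that folders starting with "." or "_" are not languages. Both are assumptions, and the function name is hypothetical. Sub-languages are not handled here.

```python
import json
from pathlib import Path


def find_languages_missing_from_metadata(
    extraction_dir: Path, metadata_file: Path
) -> list[str]:
    """Return language folder names that have no entry in the metadata file."""
    # Assumption: top-level JSON keys are language names.
    metadata = json.loads(metadata_file.read_text(encoding="utf-8"))
    known_languages = {key.lower() for key in metadata}

    # Collect folder names, skipping files and hidden or private folders.
    language_dirs = {
        path.name.lower()
        for path in extraction_dir.iterdir()
        if path.is_dir() and not path.name.startswith((".", "_"))
    }

    return sorted(language_dirs - known_languages)
```

An empty return value would mean every language folder is covered by the metadata file.
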
/src/scribe_data/check/check_data_type_metadata.py would check each subdirectory of the language directories and ensure that the subdirectories named for data types are also included in the data_type_metadata.json file
Both scripts need to account for the fact that some languages are sub-languages of a meta-language, such as Norwegian or Hindustani
We also need both scripts to confirm that the JSON files contain no entries that are missing from the directories. For example, if we move a language to be a sub-language, the metadata needs to reflect this
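
The two-way check with sub-languages could be sketched as below. This assumes a meta-language entry nests its sub-languages under a "sub_languages" key and that the sub-language names are what appear as directory names. Both are assumptions about the file layout, and the function names are hypothetical. The same set comparison should work for data type subdirectories against the keys of data_type_metadata.json.

```python
def flatten_language_names(metadata: dict) -> set[str]:
    """Return the names expected as directories, expanding sub-languages."""
    names: set[str] = set()
    for language, entry in metadata.items():
        # Assumption: meta-languages carry a "sub_languages" mapping.
        sub_languages = entry.get("sub_languages") if isinstance(entry, dict) else None
        if sub_languages:
            names.update(name.lower() for name in sub_languages)
        else:
            names.add(language.lower())
    return names


def compare_metadata_to_directories(
    metadata: dict, directory_names: set[str]
) -> tuple[list[str], list[str]]:
    """Return (directories missing from metadata, metadata entries without a directory)."""
    expected = flatten_language_names(metadata)
    found = {name.lower() for name in directory_names}
    return sorted(found - expected), sorted(expected - found)


# Example with made-up directory contents:
example_metadata = {
    "english": {"iso": "en"},
    "norwegian": {"sub_languages": {"bokmål": {"iso": "nb"}, "nynorsk": {"iso": "nn"}}},
}
missing_from_json, missing_from_dirs = compare_metadata_to_directories(
    example_metadata, {"english", "bokmål", "hindi"}
)
print(missing_from_json)  # ['hindi']
print(missing_from_dirs)  # ['nynorsk']
```

Both lists being empty would mean the directories and the JSON agree in both directions.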

Contribution

Happy to support, answer questions and review as needed!

CC @DeleMike and @catreedle :)

KesharwaniArpita commented 2 hours ago

Hi @andrewtavis, @DeleMike and @catreedle, can I also contribute to this issue?

DeleMike commented 2 hours ago

Nice @KesharwaniArpita, I think you can.

Could you wait a bit for feedback from @catreedle? I wanna hear her thoughts :)

KesharwaniArpita commented 2 hours ago

Yeah Sure!!

DeleMike commented 2 hours ago

Thank you!
