scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Add workflow to check project metadata #340

Open andrewtavis opened 2 hours ago

andrewtavis commented 2 hours ago

Description

This issue would add a new workflow in .github/workflows called check_project_metadata.yaml that calls Python scripts to check the project's metadata files, language_metadata.json and data_type_metadata.json. We can put these scripts in a new directory called check within .github/workflows. The scripts would be:

A code snippet for this comes from #330:

def get_available_languages() -> list[tuple[str, str]]:
    """
    Get available languages from the data extraction folder.

    Returns
    -------
        list[tuple[str, str]]: A list of tuples with the language name and its QID.
    """
    extraction_dir = LANGUAGE_DATA_EXTRACTION_DIR
    available_languages = []
    for lang_folder in extraction_dir.iterdir():
        if lang_folder.is_dir():  # Check if it's a directory
            lang_name = lang_folder.name
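
As a rough illustration of what one of the check scripts might do, here is a minimal sketch that compares the language directories on disk with the entries in a metadata file. The function name find_metadata_mismatches is hypothetical, and the sketch assumes the metadata JSON is an object whose top-level keys are language names; the real structure of language_metadata.json may differ, so treat this as a starting point rather than the intended implementation.

```python
import json
from pathlib import Path


def find_metadata_mismatches(
    metadata_file: Path, extraction_dir: Path
) -> dict[str, set[str]]:
    """
    Compare languages listed in a metadata file with directories on disk.

    Hypothetical helper: assumes the JSON file is an object whose
    top-level keys are language names.
    """
    with metadata_file.open(encoding="utf-8") as f:
        metadata = json.load(f)

    # Normalize case so "English" on disk matches "english" in the metadata.
    in_metadata = {name.lower() for name in metadata}
    on_disk = {p.name.lower() for p in extraction_dir.iterdir() if p.is_dir()}

    return {
        "missing_from_metadata": on_disk - in_metadata,
        "missing_from_disk": in_metadata - on_disk,
    }
```

A script run by the workflow could call this helper and exit with a non-zero status when either set is non-empty, which would cause the check to fail.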

Contribution

Happy to support, answer questions and review as needed!

CC @DeleMike and @catreedle :)

KesharwaniArpita commented 2 hours ago

Hi @andrewtavis, @DeleMike and @catreedle, can I also contribute to this issue?

DeleMike commented 2 hours ago

Nice @KesharwaniArpita, I think you can.

Could you wait a bit for feedback from @catreedle? I wanna hear her thoughts :)

KesharwaniArpita commented 2 hours ago

Yeah, sure!

DeleMike commented 2 hours ago

Thank you!