uberspot / OpenTriviaQA

A creative commons dataset of trivia questions and answers
Creative Commons Attribution Share Alike 4.0 International

Issue converting to json with converter.rb #21

Status: Open. NMedvesky opened this issue 7 months ago.

NMedvesky commented 7 months ago

The Ruby script (converter.rb) gives this error when I try to convert the data. I'm using ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]:

```
converter.rb:24:in `strip!': invalid byte sequence in UTF-8 (Encoding::CompatibilityError)
        from converter.rb:24:in `stripAndEncode'
        from converter.rb:40:in `block (2 levels) in <main>'
        from converter.rb:38:in `open'
        from converter.rb:38:in `block in <main>'
        from converter.rb:27:in `each'
        from converter.rb:27:in `<main>'
```
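The traceback points at a decoding problem rather than a logic bug: `strip!` is called on a string tagged as UTF-8 whose bytes are not valid UTF-8. A likely (unconfirmed) cause is that some category files contain single-byte Latin-1 characters such as é. The Python sketch below reproduces the difference using hypothetical sample bytes: strict UTF-8 decoding fails, while ISO-8859-1, which assigns a character to every byte value, always succeeds. This is presumably why the Python script in this issue opens the files with `encoding='ISO-8859-1'`. In Ruby, the comparable approach would be opening the files with an explicit encoding mode such as `"r:ISO-8859-1:UTF-8"`; this has not been checked against converter.rb itself.

```python
# Hypothetical sample: "café" encoded as ISO-8859-1 (Latin-1), where é is the
# single byte 0xE9. These bytes are illustrative, not taken from the dataset.
raw = "café".encode("latin-1")

# Strict UTF-8 decoding rejects the lone 0xE9 byte, because in UTF-8 that
# byte must begin a multi-byte sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("UTF-8 decoding failed:", exc)

# ISO-8859-1 maps every byte 0x00-0xFF to a character, so this cannot fail.
text = raw.decode("iso-8859-1")
print(text)  # café
```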

Rather than trying to get the Ruby script to work, I wrote a Python script that does effectively the same thing, and I'm sharing it here for anyone who runs into the same issue.

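```python
import json
import os
import string
import sys

# Uppercase letters that label the answer choices ("A ...", "B ...", ...).
ALPHABET = set(string.ascii_uppercase)

for file in sys.argv[1:]:
    print(file)
    # ISO-8859-1 maps every byte to a character, so reading never fails
    # with an "invalid byte sequence" style error.
    with open(file, "r", encoding="ISO-8859-1") as f:
        lines = f.readlines()

    questions = []
    question = {}
    for line in lines:
        # Drop only the trailing newline (safe even if the last line has none).
        line = line.rstrip("\n")

        if line.startswith("#Q "):
            question["question"] = line[3:]
            question["category"] = os.path.basename(file)

        elif line.startswith("^ "):
            question["answer"] = line[2:]

        elif line[:1] in ALPHABET:
            # Choice line, e.g. "A Some answer"
            question.setdefault("choices", []).append(line[2:])

        elif line.strip() == "" and question:
            # A blank line ends the current question.
            questions.append(question)
            question = {}

    # Keep the last question if the file does not end with a blank line.
    if question:
        questions.append(question)

    with open(f"{file}.json", "w", encoding="utf-8") as f:
        json.dump(questions, f, indent=4)
```
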
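To check the parsing rules without touching any files, the loop can be pulled into a function and fed one sample record. This is a minimal sketch: `parse_questions`, the sample question, and the file path are hypothetical, and the record layout (a `#Q ` question line, a `^ ` answer line, lettered choice lines, and a blank line between records) is inferred from the script's rules rather than from any format specification.

```python
import os
import string


def parse_questions(lines, category):
    """Hypothetical helper applying the same rules as the script above."""
    questions, question = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Q "):
            question["question"] = line[3:]
            question["category"] = category
        elif line.startswith("^ "):
            question["answer"] = line[2:]
        elif line and line[0] in string.ascii_uppercase:
            # Choice line such as "B Mars"
            question.setdefault("choices", []).append(line[2:])
        elif line.strip() == "" and question:
            # Blank line closes the current record
            questions.append(question)
            question = {}
    if question:
        # Last record when the input has no trailing blank line
        questions.append(question)
    return questions


# Hypothetical sample record and path, for illustration only.
sample = """#Q Which planet is known as the Red Planet?
^ Mars
A Venus
B Mars
C Jupiter
D Saturn
""".splitlines(keepends=True)

result = parse_questions(sample, os.path.basename("hypothetical/categories/science"))
print(result)
```

Because the sample has no trailing blank line, it also exercises the final `if question:` step that keeps the last record.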
Clayton-Brandt commented 1 month ago

Hey man, thanks a ton! I'm doing a fun personal project in C# with the trivia questions, so I'm glad this code worked. The only thing I had to add was `import os`; after that, the script ran. Thanks again.