shrivastava95 / odia-dictionary

A repository for organizing contributions to the creation of an Odia Dictionary dataset for the Dictionary Augmented Translations project in C4GT'23.
4 stars 1 forks source link

Dictionary Storage #3

Open ChakshuGautam opened 1 year ago

ChakshuGautam commented 1 year ago

Suggestive SQL Schema for Storing the Dictionary

CREATE TABLE Word (
    WordID INT PRIMARY KEY,
    Word VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE PartOfSpeech (
    PartOfSpeechID INT PRIMARY KEY,
    PartOfSpeech VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE Definition (
    DefinitionID INT PRIMARY KEY,
    WordID INT,
    PartOfSpeechID INT,
    Pronunciation VARCHAR(255),
    Definition TEXT,
    FOREIGN KEY (WordID) REFERENCES Word(WordID),
    FOREIGN KEY (PartOfSpeechID) REFERENCES PartOfSpeech(PartOfSpeechID)
);

CREATE TABLE Example (
    ExampleID INT PRIMARY KEY,
    DefinitionID INT,
    Example TEXT,
    FOREIGN KEY (DefinitionID) REFERENCES Definition(DefinitionID)
);

CREATE TABLE Etymology (
    EtymologyID INT PRIMARY KEY,
    WordID INT,
    Etymology TEXT,
    FOREIGN KEY (WordID) REFERENCES Word(WordID)
);

INSERT INTO PartOfSpeech (PartOfSpeechID, PartOfSpeech)
VALUES 
(1, 'Noun'),
(2, 'Pronoun'),
(3, 'Verb'),
(4, 'Adjective'),
(5, 'Adverb'),
(6, 'Preposition'),
(7, 'Conjunction'),
(8, 'Interjection');

Suggestive JSON/MongoDB Structure

{
    "id": 1,
    "word": "run",
    "pronunciations": [
        {
            "partOfSpeech": "verb",
            "pronunciation": "rən",
            "definitions": [
                {
                    "definition": "move at a speed faster than a walk, never having both or all the feet on the ground at the same time",
                    "examples": ["He ran to catch the bus."]
                },
                {
                    "definition": "execute or perform a series of operations",
                    "examples": ["The software runs a series of checks."]
                }
            ]
        },
        {
            "partOfSpeech": "noun",
            "pronunciation": "rən",
            "definitions": [
                {
                    "definition": "an act or spell of running",
                    "examples": ["I went for a run in the morning."]
                }
            ]
        }
    ],
    "etymology": "Old English rinnan, of Germanic origin; related to Dutch runnen and German rennen"
}

Recommendation

I would recommend us going with JSON representation of all words and saving them in split JSON files. Can be split with the number of words - 1000/file.

shrivastava95 commented 1 year ago

Will look into this after some time.