miyax0227 / quizScatterer

I want to optimize the question order of a quiz using word2vec and clustering
1 star, 0 forks

[REFACTOR] Define data structures #3

Open m-uesaka opened 3 months ago

m-uesaka commented 3 months ago

Use dataclasses to make the data that is currently managed in dicts easier to understand.

Target

https://github.com/miyax0227/quizScatterer/blob/d3c3258ca68842e8708aefe33f4774c2ed733a7e/quizscatterer/qs.py#L115-L133

from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class NounData:
    surface: str
    type: str  # Is this needed?
    vector: np.ndarray


@dataclass(frozen=True)
class SentenceNounData:  # question_vector in the current code
    # NounData is not hashable (it holds an np.ndarray), so it cannot be a
    # dict key; two parallel lists are used instead.
    nouns: list[NounData]
    counts: list[int]  # counts[i] is the number of occurrences of nouns[i]

    def __post_init__(self) -> None:
        # Validation:
        # 1. Check that nouns and counts have the same length.
        # 2. Check that counts is a list of non-negative integers.
        ...

    @staticmethod
    def from_noun_data_list(noun_data_list: list[NounData]) -> "SentenceNounData":
        # Count the occurrences of each noun and store them.
        ...
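
One possible way to fill in the two stubs, as a minimal sketch rather than the repository's implementation. It assumes that two nouns count as the same when their `surface` and `type` match (the issue does not say how nouns should be compared) and that invalid input should raise `ValueError`; both choices are illustrative.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class NounData:
    surface: str
    type: str
    vector: np.ndarray


@dataclass(frozen=True)
class SentenceNounData:
    nouns: list[NounData]
    counts: list[int]  # counts[i] is the number of occurrences of nouns[i]

    def __post_init__(self) -> None:
        # 1. The two parallel lists must line up.
        if len(self.nouns) != len(self.counts):
            raise ValueError("nouns and counts must have the same length")
        # 2. Every count must be a non-negative int.
        for c in self.counts:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(c, bool) or not isinstance(c, int) or c < 0:
                raise ValueError(f"counts must be non-negative integers, got {c!r}")

    @staticmethod
    def from_noun_data_list(noun_data_list: list[NounData]) -> "SentenceNounData":
        # Assumption: nouns are identified by (surface, type), which is
        # hashable even though NounData itself is not.
        index: dict[tuple[str, str], int] = {}
        nouns: list[NounData] = []
        counts: list[int] = []
        for noun in noun_data_list:
            key = (noun.surface, noun.type)
            if key not in index:
                # First occurrence: remember its position in the lists.
                index[key] = len(nouns)
                nouns.append(noun)
                counts.append(0)
            counts[index[key]] += 1
        return SentenceNounData(nouns=nouns, counts=counts)


# Usage with placeholder vectors (real vectors would come from word2vec).
vec = np.zeros(3)
quiz = NounData("quiz", "noun", vec)
order = NounData("order", "noun", vec)
data = SentenceNounData.from_noun_data_list([quiz, order, quiz])
print([n.surface for n in data.nouns], data.counts)  # ['quiz', 'order'] [2, 1]
```

Keying on `(surface, type)` keeps first-appearance order and avoids hashing the `np.ndarray` field. If `type` turns out to be unnecessary, the key could shrink to `surface` alone.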

TODO