Chalermpun opened 2 months ago
- **Strategy Pattern**: the dataset-specific processing functions can be encapsulated into separate strategy classes.
- **Template Method Pattern**: the `create_flan_dataset` function contains a series of steps for loading and processing different datasets. Extract these steps into a template method in a base class and define abstract methods for the dataset-specific operations.
- **Factory Pattern**: instead of creating dataset objects directly with `load_dataset` and `load_from_disk`, introduce a factory class responsible for creating and configuring dataset objects based on the provided parameters.
- **Decorator Pattern**: wrap dataset objects to add functionality without changing the underlying dataset classes.
- **Facade Pattern**: provide a facade that encapsulates the complexity of dataset creation and exposes a simplified interface to the client code.

```text
src/
│
├── datasets/
│   ├── __init__.py
│   ├── base_dataset.py
│   ├── huggingface_dataset.py
│   ├── json_dataset.py
│   ├── csv_dataset.py
│   └── ...
│
├── data_processors/
│   ├── __init__.py
│   ├── base_data_processor.py
│   ├── iapp_wiki_processor.py
│   ├── scb_translation_processor.py
│   ├── wisesight_sentiment_processor.py
│   └── ...
│
├── data_transformers/
│   ├── __init__.py
│   ├── base_data_transformer.py
│   ├── map_transformer.py
│   ├── filter_transformer.py
│   ├── rename_columns_transformer.py
│   └── ...
│
├── flan_creator/
│   ├── __init__.py
│   ├── flan_creator_base.py
│   └── flan_creator.py
│
├── utils/
│   ├── __init__.py
│   └── ...
│
├── requirements.txt
│
├── tests/
│
└── main.py
```
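A minimal sketch of how the Template Method and Strategy patterns listed above might look inside `data_processors/`. It assumes each processor turns raw records (plain dicts here, standing in for dataset rows) into `inputs`/`targets` pairs. All class names, method names, prompts, and record fields (`text`, `label`, `en`, `th`) are hypothetical and are not taken from the real datasets' schemas.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


class BaseDataProcessor(ABC):
    """Template Method: process() fixes the sequence of steps;
    subclasses supply only the dataset-specific ones."""

    def process(self, records: Iterable[Record]) -> List[Record]:
        examples: List[Record] = []
        for record in records:
            if self.is_valid(record):                      # step 1: filtering hook
                examples.append(self.to_example(record))  # step 2: formatting
        return examples

    def is_valid(self, record: Record) -> bool:
        # Default hook keeps every record; subclasses may override it.
        return True

    @abstractmethod
    def to_example(self, record: Record) -> Record:
        """Turn one raw record into an {"inputs": ..., "targets": ...} pair."""


class SentimentProcessor(BaseDataProcessor):
    # Hypothetical fields "text" and "label".
    def is_valid(self, record: Record) -> bool:
        return bool(record["text"].strip())

    def to_example(self, record: Record) -> Record:
        return {"inputs": f"What is the sentiment of this text?\n{record['text']}",
                "targets": record["label"]}


class TranslationProcessor(BaseDataProcessor):
    # Hypothetical fields "en" and "th".
    def to_example(self, record: Record) -> Record:
        return {"inputs": f"Translate to Thai: {record['en']}", "targets": record["th"]}


def build_examples(processors: Dict[str, BaseDataProcessor],
                   raw: Dict[str, List[Record]]) -> List[Record]:
    """Strategy: this function depends only on the BaseDataProcessor interface,
    so processors can be swapped or added without editing it."""
    examples: List[Record] = []
    for name, processor in processors.items():
        examples.extend(processor.process(raw.get(name, [])))
    return examples
```

The design choice is that adding a new dataset means adding one subclass with a `to_example` method, while the loop in `process` and the driver in `build_examples` stay untouched.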
- `datasets/`: Contains classes for handling different types of datasets.
  - `base_dataset.py`: Defines the base class for datasets.
  - `huggingface_dataset.py`: Implements the dataset class for Hugging Face datasets.
  - `json_dataset.py`: Implements the dataset class for JSON datasets.
  - `csv_dataset.py`: Implements the dataset class for CSV datasets.
- `data_processors/`: Contains classes for processing data.
  - `base_data_processor.py`: Defines the base class for data processors.
  - `iapp_wiki_processor.py`: Implements the data processor for IAPP Wiki data.
  - `scb_translation_processor.py`: Implements the data processor for SCB translation data.
  - `wisesight_sentiment_processor.py`: Implements the data processor for Wisesight sentiment data.
- `data_transformers/`: Contains classes for transforming data.
  - `base_data_transformer.py`: Defines the base class for data transformers.
  - `map_transformer.py`: Implements the data transformer for mapping data.
  - `filter_transformer.py`: Implements the data transformer for filtering data.
  - `rename_columns_transformer.py`: Implements the data transformer for renaming columns.
- `flan_creator/`: Contains classes for creating the FLAN dataset.
  - `flan_creator_base.py`: Defines the base class for the FLAN dataset creator.
  - `flan_creator.py`: Implements the main FLAN dataset creator class.
- `utils/`: Contains utility modules and functions.
- `main.py`: The entry point of the FLAN dataset creation script.
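One plausible mapping of this layout onto the remaining patterns is: `datasets/` behind a Factory, `data_transformers/` as Decorators around a dataset, and `flan_creator/` as the Facade. Below is a minimal, stdlib-only sketch of that mapping. Every class name is hypothetical, and the datasets read from in-memory JSON Lines and CSV strings rather than files or the Hugging Face Hub so the example stays self-contained; a real `huggingface_dataset.py` would presumably wrap `load_dataset` or `load_from_disk` instead.

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List, Type

Record = Dict[str, Any]


class BaseDataset(ABC):
    """Common interface for every dataset source (hypothetical base_dataset.py)."""

    @abstractmethod
    def __iter__(self) -> Iterator[Record]: ...


class JsonLinesDataset(BaseDataset):
    def __init__(self, text: str) -> None:
        self.text = text

    def __iter__(self) -> Iterator[Record]:
        for line in self.text.splitlines():
            if line.strip():  # skip blank lines
                yield json.loads(line)


class CsvDataset(BaseDataset):
    def __init__(self, text: str) -> None:
        self.text = text

    def __iter__(self) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(self.text))


class DatasetFactory:
    """Factory: maps a type name to a dataset class so callers never construct one directly."""

    _registry: Dict[str, Type[BaseDataset]] = {"json": JsonLinesDataset, "csv": CsvDataset}

    @classmethod
    def create(cls, kind: str, source: str) -> BaseDataset:
        if kind not in cls._registry:
            raise ValueError(f"unknown dataset type: {kind!r}")
        return cls._registry[kind](source)


class RenameColumnsTransformer(BaseDataset):
    """Decorator: wraps another dataset and renames keys on the fly."""

    def __init__(self, inner: BaseDataset, mapping: Dict[str, str]) -> None:
        self.inner, self.mapping = inner, mapping

    def __iter__(self) -> Iterator[Record]:
        for record in self.inner:
            yield {self.mapping.get(k, k): v for k, v in record.items()}


class FilterTransformer(BaseDataset):
    """Decorator: wraps another dataset and drops records that fail a predicate."""

    def __init__(self, inner: BaseDataset, keep: Callable[[Record], bool]) -> None:
        self.inner, self.keep = inner, keep

    def __iter__(self) -> Iterator[Record]:
        return (record for record in self.inner if self.keep(record))


class FlanCreator:
    """Facade: one call hides the factory and the transformer chain from client code."""

    def create(self, kind: str, source: str, rename: Dict[str, str],
               keep: Callable[[Record], bool]) -> List[Record]:
        dataset: BaseDataset = DatasetFactory.create(kind, source)
        dataset = RenameColumnsTransformer(dataset, rename)
        dataset = FilterTransformer(dataset, keep)
        return list(dataset)
```

Because each transformer is itself a `BaseDataset`, transformers can be stacked in any order, which is the main reason to prefer Decorator here over adding options to every dataset class.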
Requirements
- `--all`: use all available datasets instead of selecting only some of them.
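One way the `--all` requirement could be wired up is an `argparse` flag that is mutually exclusive with an explicit dataset list. This is only a sketch: the `--datasets` option, the `selected_datasets` helper, and the names in `AVAILABLE_DATASETS` are hypothetical, loosely derived from the processor file names above.

```python
import argparse
from typing import List, Optional

# Hypothetical dataset names, loosely based on the modules in data_processors/.
AVAILABLE_DATASETS = ["iapp_wiki", "scb_translation", "wisesight_sentiment"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Create a FLAN-style instruction dataset.")
    # Exactly one of --all / --datasets must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--all", action="store_true",
                       help="use every available dataset")
    group.add_argument("--datasets", nargs="+", choices=AVAILABLE_DATASETS,
                       help="use only the listed datasets (hypothetical option)")
    return parser.parse_args(argv)


def selected_datasets(args: argparse.Namespace) -> List[str]:
    # --all expands to the full list; otherwise keep the user's explicit choice.
    return list(AVAILABLE_DATASETS) if args.all else args.datasets
```

If this lived in `main.py`, `python main.py --all` would select every entry in `AVAILABLE_DATASETS`, while `python main.py --datasets iapp_wiki` would select just one; passing both is rejected by `argparse`.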