mytnguyen26 / METCS777-GenAIForTheme

This repository is a project for METCS777. The project focuses on fine-tuning Gen AI models for theme-specific content.

[data] Collect data from open source sites #1

Open mytnguyen26 opened 7 months ago

mytnguyen26 commented 7 months ago

We will collect as much data as we can from Chinese sites, Vietnamese sites, or any English open-source sites. Material includes:

Let's save this unprocessed data in a ShareDrive (TBD) in .txt format, organized by type (music, short stories, arts).
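As a rough sketch of the layout described above, one way to write raw text into per-type folders might look like this. The folder names, the `save_raw_text` helper, and the `root` argument are all hypothetical (the ShareDrive location is still TBD); UTF-8 is assumed so that Chinese and Vietnamese text is stored without loss.

```python
from pathlib import Path

# Hypothetical folder names, based on the types mentioned in this issue.
CATEGORIES = ("music", "short_stories", "arts")


def save_raw_text(root, category, name, text):
    """Write one unprocessed document to <root>/<category>/<name>.txt."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    target_dir = Path(root) / category
    # Create the category folder on first use.
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{name}.txt"
    # Explicit UTF-8 so non-ASCII text is written the same on every platform.
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `save_raw_text("raw_data", "music", "sample", "...")` would then create `raw_data/music/sample.txt`, where `raw_data` stands in for whatever shared location is eventually chosen.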

mytnguyen26 commented 7 months ago

https://www.kaggle.com/datasets/carlosgdcj/genius-song-lyrics-with-language-information

mytnguyen26 commented 7 months ago

https://www.kaggle.com/datasets/rickyjli/chinese-fine-art