mytnguyen26 / METCS777-GenAIForTheme

This repository is a project for METCS777. It focuses on fine-tuning Gen AI models for theme-specific content.

[data] collect data thru scraping #2

Open mytnguyen26 opened 5 months ago

mytnguyen26 commented 5 months ago

If the data is not already organized and is only published on web pages, we need a scraping pipeline.

For example:

http://vietnamthuquan.eu/Tho/
https://www.thivien.net/forum
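
A minimal sketch of what one stage of such a pipeline could look like, using only the Python standard library. Everything here is an assumption for illustration: the names `LinkAndTextParser`, `parse_page`, `fetch`, and `scrape` are hypothetical, the User-Agent string is made up, and the real page structure of the sites above has not been inspected, so site-specific selectors would still be needed. Check each site's robots.txt and terms before crawling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkAndTextParser(HTMLParser):
    """Collect visible text and absolute link targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html, base_url):
    """Turn raw HTML into a record with the page text and outgoing links."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "url": base_url,
        "text": "\n".join(parser.text_parts),
        "links": parser.links,
    }


def fetch(url, timeout=10.0):
    """Download one page and decode it (hypothetical User-Agent value)."""
    request = Request(url, headers={"User-Agent": "theme-scraper-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape(url):
    """Fetch a page and parse it into a record."""
    return parse_page(fetch(url), url)
```

Calling `scrape("http://vietnamthuquan.eu/Tho/")` would return a dict whose `links` list could seed the next round of fetches. Keeping `parse_page` separate from `fetch` means the parsing step can be tested on saved HTML without touching the network.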