zbryikt / ptt-crawler

Crawl PTT articles from the PTT website.

usage:

Scrape a specific PTT board:

lsc crawler.ls <board-name>

All posts are downloaded into the data//post/ folder. A data//post-list.json file keeps track of your download history, so you can interrupt a download at any time and resume it later.
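The resume behaviour described above can be illustrated with a minimal Python sketch. This is not the project's LiveScript code, only one way such a history file could work. The names `crawl`, `load_history`, `save_history`, the `fetch` callback, the `data_dir` argument, the `.html` file extension, and the shape of the JSON (a map from post id to `true`) are all assumptions made for illustration.

```python
import json
import os


def load_history(path):
    """Return the saved download history, or an empty dict if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_history(path, history):
    """Write to a temp file, then rename, so an interrupt cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(history, f, ensure_ascii=False)
    os.replace(tmp, path)


def crawl(post_ids, fetch, data_dir):
    """Download every post not yet recorded in the history; return how many were fetched.

    `fetch` is a hypothetical callback that takes a post id and returns its text.
    """
    post_dir = os.path.join(data_dir, "post")
    os.makedirs(post_dir, exist_ok=True)
    history_path = os.path.join(data_dir, "post-list.json")
    history = load_history(history_path)

    downloaded = 0
    for post_id in post_ids:
        if history.get(post_id):
            continue  # already downloaded in an earlier run
        body = fetch(post_id)
        # Assumes post ids are safe to use as file names.
        with open(os.path.join(post_dir, post_id + ".html"), "w", encoding="utf-8") as f:
            f.write(body)
        history[post_id] = True
        save_history(history_path, history)  # persist after each post
        downloaded += 1
    return downloaded
```

Because the history is saved after every post, stopping the process loses at most the post that was in flight. Running `crawl` again with the same ids skips everything already recorded.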

Categorize authors by title:

lsc cat.ls <board-name>
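This README does not say how cat.ls groups authors. As a rough Python sketch of one plausible reading, the code below assumes each title starts with a bracketed tag (optionally preceded by `Re:`) and collects the authors seen under each tag. The function names, the `title` and `author` field names, and the `"other"` fallback category are hypothetical and not taken from the project.

```python
import re
from collections import defaultdict

# Matches an optional run of "Re:" prefixes followed by a leading "[tag]".
TAG_PATTERN = re.compile(r"^\s*(?:Re:\s*)*\[([^\]]+)\]")


def title_category(title):
    """Return the bracketed tag at the start of a title, or "other" if there is none."""
    match = TAG_PATTERN.match(title)
    return match.group(1).strip() if match else "other"


def authors_by_category(posts):
    """Map each title category to a sorted list of the authors who posted under it."""
    groups = defaultdict(set)
    for post in posts:
        groups[title_category(post["title"])].add(post["author"])
    return {category: sorted(authors) for category, authors in groups.items()}
```

Using a set per category means an author who posts many times under the same tag is counted once.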

food.ls: example of fetching articles for article generation.

home-sale.ls: example of categorizing articles by purpose.

id-stat.ls: analyzes users' standpoints and outputs to data//id-stat.json.

id-stat-show.ls: shows user statistics and generates suspect.json.

LICENSE

All sources are licensed under the MIT License. (I previously used the CC-BY-4.0 license, but the MIT License is better suited to code. Please refer to the license that applied at the time you forked this project.)