imdbsd / collecttoon

weekend stuff

Feat: Chapter Scrapper #4

Closed by imdbsd 3 years ago

imdbsd commented 3 years ago

This scraping service will scrape every comic page from a selected chapter, in order. An example chapter can be found [here](https://www.webtoon.xyz/read/secret-class/chapter-1/). Every chapter URL ends with the subpath /:chapter-[number] (for example, /chapter-1/).
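
A minimal sketch of how the chapter subpath could be built and read back. The helper names `buildChapterUrl` and `parseChapterFromUrl` are hypothetical, and the sketch assumes chapter numbers are positive integers and that the series URL is the chapter URL without its last segment:

```typescript
// Hypothetical helper: builds a chapter URL from a series URL and a chapter number.
// Assumption: chapter numbers are positive integers.
function buildChapterUrl(seriesUrl: string, chapter: number): string {
  if (!isFinite(chapter) || Math.floor(chapter) !== chapter || chapter < 1) {
    throw new RangeError(`Invalid chapter number: ${chapter}`)
  }
  // Drop trailing slashes so the result never contains "//chapter-".
  const base = seriesUrl.replace(/\/+$/, '')
  return `${base}/chapter-${chapter}/`
}

// Hypothetical inverse: reads the chapter number from the end of a chapter URL.
// Returns null when the URL does not end in /chapter-[number].
function parseChapterFromUrl(url: string): number | null {
  const match = /\/chapter-(\d+)\/?$/.exec(url)
  return match ? Number(match[1]) : null
}
```

For the example chapter, `buildChapterUrl('https://www.webtoon.xyz/read/secret-class/', 1)` gives back the URL linked above.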

Chapter content can be found on the page with query selector like example below:

document.querySelectorAll('.reading-content > .page-break > img')

This returns every page of the chapter as an img element. The image URL of each page is in the element's data-src attribute.
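
A rough sketch of turning the selected img elements into page entries. The function name `toContents` and the `HasAttributes` type are hypothetical. The sketch assumes pages are numbered from 1 in document order, that surrounding whitespace in data-src should be trimmed, and that images without a data-src are skipped:

```typescript
type Content = {
  page: number
  content: string
}

// Minimal shape of what we read from each element; real DOM Elements satisfy it.
type HasAttributes = {
  getAttribute(name: string): string | null
}

// Maps img elements to Content entries, keeping document order.
// Assumption: page numbers start at 1 and skipped images do not consume a number.
function toContents(images: ArrayLike<HasAttributes>): Content[] {
  const contents: Content[] = []
  for (let i = 0; i < images.length; i++) {
    const src = images[i].getAttribute('data-src')?.trim()
    if (!src) continue // no usable URL on this element
    contents.push({ page: contents.length + 1, content: src })
  }
  return contents
}

// In a browser context this could be called as:
// toContents(document.querySelectorAll('.reading-content > .page-break > img'))
```

Accepting an `ArrayLike` keeps the function independent of the DOM, so it can be exercised with plain objects.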

The publish date can be found in the .yoast-schema-graph element.
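
One possible way to read the date, sketched under an assumption that should be checked against the real page: that the .yoast-schema-graph element holds JSON-LD text with a top-level "@graph" array whose nodes may carry a "datePublished" string. The function name `extractPublishDate` is hypothetical:

```typescript
// Reads the first "datePublished" value out of JSON-LD text.
// Assumption: the JSON has a top-level "@graph" array of nodes.
function extractPublishDate(jsonLd: string): string | null {
  let data: unknown
  try {
    data = JSON.parse(jsonLd)
  } catch {
    return null // the text was not valid JSON
  }
  if (typeof data !== 'object' || data === null) return null
  const graph = (data as Record<string, unknown>)['@graph']
  if (!Array.isArray(graph)) return null
  for (const node of graph) {
    if (typeof node === 'object' && node !== null) {
      const value = (node as Record<string, unknown>)['datePublished']
      if (typeof value === 'string') return value
    }
  }
  return null
}

// In a browser context this could be called as:
// extractPublishDate(document.querySelector('.yoast-schema-graph')?.textContent ?? '')
```

The function returns null instead of throwing, so the caller can decide how to handle a page without a date.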

The chapter number can be found in the .chapters_selectbox_holder element.
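
A small sketch of parsing the number out of that element's text. It assumes, without confirmation from the page, that the holder contains a select whose selected option reads something like "Chapter 12". The function name `parseChapterLabel` is hypothetical:

```typescript
// Hypothetical helper: pulls the chapter number out of a label such as "Chapter 12".
// Assumption: the label contains "chapter" followed by an integer,
// optionally separated by whitespace, "-" or "_".
function parseChapterLabel(label: string): number | null {
  const match = /chapter\s*[-_]?\s*(\d+)/i.exec(label)
  return match ? Number(match[1]) : null
}

// In a browser context the label could come from (selector is an assumption):
// document.querySelector('.chapters_selectbox_holder option:checked')?.textContent
```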

The scraped data should look like this:

type Content = {
  page: number
  content: string
}

type Chapter = {
  chapter: number
  publishDate: string
  contents: Content[]
}
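
To illustrate the shape, here is a sample value and a hypothetical runtime check, `validateChapter`. The types are repeated so the sketch runs on its own. The example.com URLs and the date are placeholders, and the rules checked (pages numbered from 1 in order, a parseable date string) are assumptions, not requirements stated in this issue:

```typescript
type Content = {
  page: number
  content: string
}

type Chapter = {
  chapter: number
  publishDate: string
  contents: Content[]
}

// Hypothetical check: returns a list of problems, empty when the chapter looks valid.
function validateChapter(chapter: Chapter): string[] {
  const problems: string[] = []
  if (!(chapter.chapter >= 1)) problems.push('chapter must be >= 1')
  if (isNaN(Date.parse(chapter.publishDate))) {
    problems.push('publishDate is not a parseable date')
  }
  chapter.contents.forEach((item, index) => {
    // Assumption: pages are numbered 1, 2, 3, ... in array order.
    if (item.page !== index + 1) {
      problems.push(`page ${item.page} found at position ${index + 1}`)
    }
    if (item.content === '') problems.push(`page ${item.page} has an empty content URL`)
  })
  return problems
}

// Placeholder data, not real scraped output.
const example: Chapter = {
  chapter: 1,
  publishDate: '2021-01-02T03:04:05+00:00',
  contents: [
    { page: 1, content: 'https://example.com/chapter-1/001.jpg' },
    { page: 2, content: 'https://example.com/chapter-1/002.jpg' },
  ],
}
```

Returning a list of problems instead of throwing on the first one makes it easier to log everything that is wrong with a scraped chapter at once.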