ubiquity / scraper-kernel

A Puppeteer-based scraping platform with modular, page-level scraping logic.

Make Relative Path to `pages/` Robust. #8

Closed: 0x4007 closed this issue 1 year ago

0x4007 commented 1 year ago

Currently, this kernel is meant to be imported into a parent project, e.g. https://github.com/pavlovcik/scraper-parent-test/tree/main/src

The parent project should include all of the "pages" logic (the logic that is imported when the browser is on the corresponding page): https://github.com/pavlovcik/scraper-parent-test/tree/main/src/pages

To invoke this program, the parent project (or a user via the CLI) must pass in the location of the "pages" directory.

This is extremely brittle because the kernel needs to resolve the full path and know whether it was invoked via node or tsx (tsx runs TypeScript natively).
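A minimal sketch of the path-resolution half of the problem, assuming the parent project (or CLI user) passes the `pages` location as a string. The helper name `resolvePagesDir` is hypothetical. Resolving against the shell's working directory with Node's `path.resolve`, rather than against the kernel module's own location, keeps the result independent of how deeply the kernel's modules are nested:

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve the user-supplied "pages" location against the
// shell's current working directory (not the kernel module's own directory),
// so deeply nested kernel modules cannot break the relative path.
export function resolvePagesDir(userPath: string, cwd: string = process.cwd()): string {
  // path.resolve returns absolute inputs unchanged and joins relative ones onto cwd.
  return path.resolve(cwd, userPath);
}
```

For example, calling `resolvePagesDir("src/pages")` from the parent project's root yields an absolute path to that project's `src/pages` directory, whichever kernel module makes the call.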


I'm not sure this is the most robust implementation, but a checklist could be:

  1. Detect whether the runtime is Node or TSX.
    • If Node, expect dist/ ??
    • If TSX, expect src/ ??
  2. Resolve the relative path based on the shell's current working directory when the program is invoked (parse argv[0]).
    • This is important because the TypeScript modules are nested deep inside other directories, which quickly breaks relative paths.
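The checklist above could be sketched roughly as follows. This is an assumption-laden sketch, not the kernel's implementation: `detectRuntime` and `defaultPagesDir` are hypothetical names, the `src/` vs `dist/` layout is taken from the checklist's own guesses, and tsx is detected heuristically from the entry script's file extension. Note that in Node, `process.argv[0]` is the path to the Node executable and `process.argv[1]` is the entry script; the shell's working directory comes from `process.cwd()`.

```typescript
import * as path from "node:path";

type Runtime = "tsx" | "node";

// Heuristic (assumption): under tsx the entry script is a .ts/.mts/.cts file,
// while under plain node it is the compiled .js output. This inspects the
// entry script (process.argv[1]), not argv[0], which is the node binary.
export function detectRuntime(entryFile: string | undefined = process.argv[1]): Runtime {
  const ext = path.extname(entryFile ?? "");
  return [".ts", ".mts", ".cts"].includes(ext) ? "tsx" : "node";
}

// Hypothetical layout: sources live in src/ and compiled output in dist/,
// each containing its own pages/ directory. Paths resolve against the
// shell's working directory by default (checklist step 2).
export function defaultPagesDir(runtime: Runtime, projectRoot: string = process.cwd()): string {
  const sourceRoot = runtime === "tsx" ? "src" : "dist";
  return path.resolve(projectRoot, sourceRoot, "pages");
}
```

A caller could then use `const pagesDir = defaultPagesDir(detectRuntime());`. Because detection only looks at a file extension, it would misfire for other TypeScript loaders or a renamed entry file, which is part of why this approach stays brittle.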
0x4007 commented 1 year ago

The easy answer for now is to ONLY support tsx execution, not node.
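If only tsx is supported, the runtime check reduces to a fail-fast guard, and `pages/` can always be looked up under `src/`. A minimal sketch, with hypothetical function names and the same entry-file-extension heuristic for detecting tsx (an assumption, not a documented tsx feature):

```typescript
import * as path from "node:path";

// Hypothetical guard: fail fast with a clear message if the entry point is not
// a TypeScript file, which (by assumption) means plain node was used instead of tsx.
export function assertRunningUnderTsx(entryFile: string | undefined = process.argv[1]): void {
  const ext = path.extname(entryFile ?? "");
  if (![".ts", ".mts", ".cts"].includes(ext)) {
    throw new Error(
      `scraper-kernel must be run with tsx (entry file was "${entryFile ?? "<none>"}"); plain node is not supported.`
    );
  }
}

// With tsx as the only supported runtime, pages/ always resolves under src/,
// relative to the shell's working directory by default.
export function tsxPagesDir(projectRoot: string = process.cwd()): string {
  return path.resolve(projectRoot, "src", "pages");
}
```

Calling `assertRunningUnderTsx()` once at startup turns a confusing "module not found" error later on into an immediate, explicit message.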