ubiquity / scraper-kernel

A Puppeteer-based scraping platform with modular, page-level scraping logic.

Host on GitHub Actions #7

Closed 0x4007 closed 1 year ago

0x4007 commented 1 year ago

Once #1 is solved, we should compile and host on GitHub Actions, and allow invocation from external web applications. This will be directly useful for our dapp UI tests, as well as our Bounty program registration (sign in with GitHub, then we serverlessly scrape your public GitHub profile and register you in our database).

0x4007 commented 1 year ago

I think this could take some time because there are technically two challenges to consider:

  1. Get this to work on GitHub Actions. In my first experience using @vercel/ncc it was pretty straightforward to compile the @ubiquity/bounty-bot and host it on GitHub Actions, but you still need to create the .yml workflow file etc. Please refer to the config in the bounty-bot repository!
  2. Figure out how to invoke this from an external webapp so that we can have it hosted serverlessly (and for free!). Once this code is produced, we'll immediately be able to apply it in our devtooling efforts: https://github.com/ubiquity/ubiquity-dollar/issues/209
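For the second challenge, GitHub's REST API lets an external application trigger a workflow by sending a `repository_dispatch` event to `POST /repos/{owner}/{repo}/dispatches`. A minimal TypeScript sketch of the request a webapp would send (the `scrape` event type and the token handling are assumptions, not anything this repo has committed to yet):

```typescript
// Sketch: build the repository_dispatch request an external webapp would
// send to trigger the scraping workflow. The event_type "scrape" is a
// placeholder and must match the workflow's `types` filter.

interface DispatchRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

function buildDispatchRequest(
  owner: string,
  repo: string,
  token: string, // a PAT or GitHub App token with repo scope
  targetUrl: string // the page the workflow should scrape
): DispatchRequest {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
    // client_payload is forwarded verbatim to the workflow run
    body: JSON.stringify({
      event_type: "scrape",
      client_payload: { url: targetUrl },
    }),
  };
}

// Usage (the token must stay server-side, never in browser code):
// const req = buildDispatchRequest("ubiquity", "scraper-kernel",
//   process.env.GH_TOKEN!, "https://www.example.com");
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Note that `repository_dispatch` returns 204 immediately; the webapp would have to poll the workflow-run API (or have the workflow call back) to collect the scraped data.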
0x4007 commented 1 year ago

Can you write me a working example of a GitHub Action that can be externally invoked via an API to scrape a web page for data?

Sure, here is an example of a GitHub action that can be externally invoked via an API to scrape a web page for data:

```yaml
name: Scrape Web Page

on:
  # GitHub Actions has no generic `api` trigger; repository_dispatch is the
  # event that external applications can fire through the REST API
  # (POST /repos/{owner}/{repo}/dispatches)
  repository_dispatch:
    types: [scrape]

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository
      - uses: actions/checkout@v3

      # Install Node.js
      - uses: actions/setup-node@v3
        with:
          node-version: 18

      # Install the scraping library (this repo is Puppeteer-based)
      - run: npm install puppeteer

      # Run the scraping script against the URL sent in the dispatch payload
      - run: node scrape.js "${{ github.event.client_payload.url }}"
```

In this example, the workflow is triggered whenever a `repository_dispatch` event with type `scrape` is sent to the repository's `/dispatches` REST endpoint. The job installs Node.js and Puppeteer, then runs a scraping script (scrape.js) with the URL passed in the dispatch's `client_payload`. You can adapt the script and the payload shape to fit your specific needs.
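The scrape.js script itself is never shown above. A minimal sketch of what it could contain, assuming for illustration that only the page title is wanted and using Node built-ins (the real scraper-kernel would render the page with Puppeteer instead):

```typescript
// Hypothetical scrape.ts — a sketch of the script the workflow invokes.
// Extracts the first <title> from raw HTML; no JS rendering, no redirects.
import * as https from "https";

// Pull the contents of the first <title> element out of raw HTML.
function extractTitle(html: string): string | null {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : null;
}

// Download a page and resolve with its title.
function scrape(url: string): Promise<string | null> {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        let body = "";
        res.on("data", (chunk) => (body += chunk));
        res.on("end", () => resolve(extractTitle(body)));
      })
      .on("error", reject);
  });
}

// Invoked as: node scrape.js https://www.example.com
const target = process.argv[2];
if (target) {
  scrape(target).then((title) => console.log(title ?? "(no title found)"));
}
```

For anything beyond static HTML (the dapp UI tests mentioned above), this would be swapped for a Puppeteer `launch`/`goto` flow, which is what the rest of this repository provides.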