tidyverse / rvest

Simple web scraping for R
https://rvest.tidyverse.org

Invalid Char in Json Text #400

Open · OlexiyPukhov opened this issue 6 months ago

OlexiyPukhov commented 6 months ago

Continuation of #394

The packages now install, but I get a different error...

A slice of my GitHub Actions workflow:

name: R Daily Workflow

on:
  schedule:
    - cron: '36 21 * * 5' # Runs every Friday (day-of-week 5) at 21:36 UTC, which is 4:36pm EST
  workflow_dispatch:

jobs:
  run-r-scripts:
    runs-on: ubuntu-latest
    steps:
      # Check out the code from the repo
      - name: Check out code
        uses: actions/checkout@v4

      # Set up R environment
      - name: Setup R
        uses: r-lib/actions/setup-r@v2

      # Download and set up Chrome
      - name: Download and set up Chrome
        run: |
          wget https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/119.0.6045.105/linux64/chrome-linux64.zip
          unzip chrome-linux64.zip
          sudo mv chrome-linux64/chrome /opt/google/chrome/chrome

      # Verify Chrome Installation
      - name: Verify Chrome Installation
        run: google-chrome-stable --version

      - name: Install R packages with pak
        run: |
          Rscript -e "install.packages('pak', repos = 'https://r-lib.github.io/p/pak/stable/')"
          Rscript -e "pak::pak_setup()"
          Rscript -e "pak::pkg_install(c('dplyr', 'openxlsx', 'stringr', 'rvest', 'chromote'))"
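Since the workflow swaps in a Chrome for Testing binary and then checks `google-chrome-stable --version`, it may help to log which executable chromote itself resolves on the runner, and what version that executable reports. The helper below is a hypothetical diagnostic sketch (the name `chrome_diagnostics` is mine, not part of rvest or chromote); it only relies on `chromote::find_chrome()` and base R's `system2()`, and is meant to run right before `read_html_live()`.

```r
# Hypothetical diagnostic helper (not part of rvest or chromote): report which
# Chrome executable chromote would launch and the version string it prints.
chrome_diagnostics <- function() {
  empty <- list(path = NA_character_, version = NA_character_)
  # chromote is installed by the pak step; return NAs if it is missing.
  if (!requireNamespace("chromote", quietly = TRUE)) {
    return(empty)
  }
  # find_chrome() returns a path, or NULL (with a message) if none is found.
  path <- tryCatch(
    suppressMessages(chromote::find_chrome()),
    error = function(e) NULL
  )
  if (length(path) == 0 || is.na(path[1]) || !nzchar(path[1])) {
    return(empty)
  }
  path <- path[1]
  # Ask the binary for its version; capture stderr as well and never halt.
  out <- suppressWarnings(tryCatch(
    system2(path, "--version", stdout = TRUE, stderr = TRUE),
    error = function(e) character(0)
  ))
  list(path = path, version = paste(out, collapse = " "))
}

info <- chrome_diagnostics()
cat("Chrome used by chromote:", info$path, "\n")
cat("Reported version:", info$version, "\n")
```

If the printed path is not the binary you intended, chromote documents a `CHROMOTE_CHROME` environment variable for pointing it at a specific executable; comparing this output between the runner and the Windows machine may narrow down what differs.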
## next, running the script in Actions...

library(dplyr)
library(stringr)
library(openxlsx)
library(rvest)

link <- "https://innovation.ised-isde.canada.ca/s/group-groupe?language=en_CA&token=a0BOG000000Y7oh2AC"

sess <- read_html_live(link)
## I get this error in GitHub Actions, but not locally on my Windows PC:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
Error in parse_con(txt, bigint_as_char) : 
  lexical error: invalid char in json text.
                                       import{getTrustedHTML}from"//re
                     (right here) ------^
Calls: read_html_live ... <Anonymous> -> parse_and_simplify -> parseJSON -> parse_con
Execution halted
Error: Process completed with exit code 1.
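The last frames of the traceback (`parse_and_simplify`, `parseJSON`, `parse_con`) are jsonlite internals, which suggests jsonlite was handed text that is not JSON; the quoted fragment `import{getTrustedHTML}from"//re` looks like JavaScript. A minimal sketch that reproduces the same error class outside rvest, assuming only the jsonlite package (the helper `check_json` is hypothetical), is:

```r
library(jsonlite)

# Hypothetical helper: try to parse `txt` as JSON and return either the
# parsed value or the jsonlite error message, instead of halting the script.
check_json <- function(txt) {
  tryCatch(
    list(ok = TRUE, value = jsonlite::fromJSON(txt), error = NA_character_),
    error = function(e) list(ok = FALSE, value = NULL, error = conditionMessage(e))
  )
}

# A small, made-up JSON object parses normally.
good <- check_json('{"status": "ok"}')
str(good$value)

# The fragment quoted in the log is not JSON, so it fails with the same
# "invalid char in json text" lexical error seen in the Actions output.
bad <- check_json('import{getTrustedHTML}from"//re')
cat(bad$error, "\n")
```

This only illustrates what the message means; it does not identify which component on the runner returned non-JSON text to chromote.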