indix / web-auto-extractor

Automatically extracts structured information from webpages
MIT License

Store JSON-LD parse errors #25

Open TheDahv opened 5 years ago

TheDahv commented 5 years ago

Web Auto Extractor logs and skips any parse errors it encounters when working on JSON-LD. However, a program cannot react to messages written to the console.

This change allows a developer to hook into parse errors and react to them if desired.
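As a rough illustration of the idea (not the code in this PR), collecting errors next to the parsed results could look something like the sketch below. The `parseJsonLdBlocks` helper and the `{ items, errors }` return shape are hypothetical names made up for this example; they are not the library's actual API.

```javascript
// Hypothetical helper for illustration only; not web-auto-extractor's real API.
// Parses each JSON-LD script body and keeps the errors instead of only logging them.
function parseJsonLdBlocks(blocks) {
  const items = [];
  const errors = [];
  for (const text of blocks) {
    try {
      items.push(JSON.parse(text));
    } catch (err) {
      // Store the error so the calling program can inspect it later.
      errors.push(err);
    }
  }
  return { items, errors };
}

// One valid block, one with single quotes (invalid JSON), one empty.
const result = parseJsonLdBlocks(['{"@type": "Product"}', "{'bad': 1}", '']);

if (result.errors.length > 0) {
  // The caller decides how to react: retry, report, or ignore.
  console.warn(`${result.errors.length} JSON-LD block(s) failed to parse`);
}
```

Returning the errors alongside the data keeps the existing skip-and-continue behavior while leaving the reaction up to the caller.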

Fix #24

TheDahv commented 5 years ago

Interesting note about the failing tests: Travis runs tests against Node v5.12. I wrote this code against v10.12.

It appears that different Node versions produce different error messages:

  1) Web Auto Extractor when there are parse errors should save jsonld parse errors:
      AssertionError: expected [ Array(2) ] to deeply equal [ Array(2) ]
      + expected - actual
       [
      -  "Unexpected end of input"
      -  "Unexpected token '"
      +  "Unexpected end of JSON input"
      +  "Unexpected token ' in JSON at position 11"
       ]
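For what it's worth, one way to keep a test like this independent of the Node version is to assert on the error type and count rather than the exact message text. This is only a sketch under that assumption; `collectParseErrors` is a hypothetical helper written for this example, not something in the test suite.

```javascript
// Hypothetical test helper: gathers the errors JSON.parse throws for each input.
function collectParseErrors(inputs) {
  const errors = [];
  for (const input of inputs) {
    try {
      JSON.parse(input);
    } catch (err) {
      errors.push(err);
    }
  }
  return errors;
}

// A truncated object and an object using single quotes; both are invalid JSON.
const parseErrors = collectParseErrors(['{"a": ', "{'a': 1}"]);

// The message wording varies across Node versions, but the error type does not.
const allSyntaxErrors = parseErrors.every((e) => e instanceof SyntaxError);
console.log(parseErrors.length, allSyntaxErrors);
```

The trade-off is that the test no longer pins down the message text, which is the part that changes between runtimes.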

I have a couple ideas, and I'm interested in your take:

floflock commented 5 years ago

@TheDahv who can merge this branch?

TheDahv commented 5 years ago

@floflock I don't know. I haven't been in contact with anyone from Indix.

TheDahv commented 5 years ago

@floflock oh I just read the other thread. Sounds like he wants us to work from a fork.

In that case, we have 2 options:

I suppose it comes down to which of the two of us wants to become a maintainer :/

floflock commented 5 years ago

@TheDahv it is up to you. :) Currently, I am ignoring that type of log spam. 😆

In my opinion, there is more to do: new ESM syntax or TypeScript, more test cases, ...

raine commented 4 years ago

Is there a more actively maintained similar library?

raine commented 4 years ago

I've made a fork and merged the changes from some of the other forks: https://github.com/raine/web-auto-extractor

Published as @rane/web-auto-extractor to npm.