indix / web-auto-extractor

Automatically extracts structured information from webpages
MIT License

Can't use web-auto-extractor with browserify #5

Closed: tpeyrard closed this issue 8 years ago

tpeyrard commented 8 years ago

Hello,

If I take the example from the README and then run browserify parser.js -o bundle.js, it fails and prints only:


^
ParseError: Unexpected character ' '

I tried another plugin for parsing schema.org data. It works with browserify, but its usage is horrible, and so is its documentation :/

Any idea how to fix it?

Thanks

addnab commented 8 years ago

@tpeyrard Could you tell me exactly how you tried to bundle it?

I don't see it failing; it works completely fine when I try it.

This is the parser.js file I used. Note: parser.js is part of a package that has web-auto-extractor installed.

var WAE = require('web-auto-extractor').default
//ES6: import WAE from 'web-auto-extractor'

var wae = WAE()

var sampleHTML = "<div itemscope itemtype=\"http:\/\/schema.org\/Product\">\r\n  <span itemprop=\"brand\">ACME<\/span>\r\n  <span itemprop=\"name\">Executive Anvil<\/span>\r\n  <img itemprop=\"image\" src=\"anvil_executive.jpg\" alt=\"Executive Anvil logo\" \/>\r\n  <span itemprop=\"description\">Sleeker than ACME\'s Classic Anvil, the\r\n    Executive Anvil is perfect for the business traveler\r\n    looking for something to drop from a height.\r\n  <\/span>\r\n  Product #: <span itemprop=\"mpn\">925872<\/span>\r\n  <span itemprop=\"aggregateRating\" itemscope itemtype=\"http:\/\/schema.org\/AggregateRating\">\r\n    <span itemprop=\"ratingValue\">4.4<\/span> stars, based on <span itemprop=\"reviewCount\">89\r\n      <\/span> reviews\r\n  <\/span>\r\n \r\n  <span itemprop=\"offers\" itemscope itemtype=\"http:\/\/schema.org\/Offer\">\r\n    Regular price: $179.99\r\n    <meta itemprop=\"priceCurrency\" content=\"USD\" \/>\r\n    $<span itemprop=\"price\">119.99<\/span>\r\n    (Sale ends <time itemprop=\"priceValidUntil\" datetime=\"2020-11-05\">\r\n      5 November!<\/time>)\r\n    Available from: <span itemprop=\"seller\" itemscope itemtype=\"http:\/\/schema.org\/Organization\">\r\n                      <span itemprop=\"name\">Executive Objects<\/span>\r\n                    <\/span>\r\n    Condition: <link itemprop=\"itemCondition\" href=\"http:\/\/schema.org\/UsedCondition\"\/>Previously owned,\r\n      in excellent condition\r\n    <link itemprop=\"availability\" href=\"http:\/\/schema.org\/InStock\"\/>In stock! Order now!<\/span>\r\n  <\/span>\r\n<\/div>"

var parsed = wae.parse(sampleHTML)

document.body.innerHTML = JSON.stringify(parsed)

I then ran browserify parser.js -o bundle.js

Here is the HTML file that loads bundle.js:

<!DOCTYPE html>
<html>
 <head>
  <title>WAE TEST</title>
 </head>
 <body>
  <script src="bundle.js"></script>
 </body>
</html>

I opened the HTML file in the browser and saw the correct output.

tpeyrard commented 8 years ago

Hello,

thanks for your quick feedback.

I retried with your code, but it failed too. I then saw that the error came from the file es6.object.create.js.

After running npm install core-js (core-js is a dependency of web-auto-extractor), the problem was solved.

Thanks for your library :)

Thomas
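
For anyone who hits the same error: the fix above amounts to making sure core-js can be resolved from the package that contains parser.js before running browserify. Below is a minimal sketch of such a check, assuming Node.js with CommonJS modules. The helper names isResolvable and missingModules are hypothetical and are not part of web-auto-extractor or browserify; the only library call used is Node's built-in require.resolve.

```javascript
// Hypothetical pre-bundle check; not part of web-auto-extractor or browserify.

// Returns true if Node can resolve `name` from this file's location.
function isResolvable(name) {
  try {
    require.resolve(name)
    return true
  } catch (err) {
    // require.resolve throws an error with code MODULE_NOT_FOUND
    // when the module cannot be located.
    if (err.code === 'MODULE_NOT_FOUND') return false
    throw err
  }
}

// Returns the subset of `names` that cannot be resolved.
function missingModules(names) {
  return names.filter(function (name) {
    return !isResolvable(name)
  })
}

// Example usage (assumption: run from the package that contains parser.js):
//   var missing = missingModules(['web-auto-extractor', 'core-js'])
//   if (missing.length > 0) {
//     console.error('Run: npm install ' + missing.join(' '))
//   }
```

If the check reports core-js as missing, running npm install core-js and then browserify parser.js -o bundle.js again matches the steps that resolved this issue.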