mwpenny / kijiji-scraper

A lightweight node.js module for retrieving and scraping ads from Kijiji
MIT License

can't seem to get this working #7

Closed vesper8 closed 7 years ago

vesper8 commented 7 years ago

I'm trying to add this library to a laravel project

I ran npm install kijiji-scraper

then I add the example code to my app.js:

var kijiji = require("kijiji-scraper")

var prefs = {
    "locationId": 27,
    "categoryId": 1700185
}

var params = {
    "minPrice": 0,
    "maxPrice": 100000,
    "keywords": "toyota",
    "adType": "OFFER"
}

kijiji.query(prefs, params, function(err, ads) {
    //Use the ads array
    console.log(ads);
});

then when I do npm run watch I get these errors:

 ERROR  Failed to compile with 7 errors                                                                                                                                   21:54:15

These dependencies were not found:

* fs in ./~/kijiji-scraper/~/request/lib/har.js
* net in ./~/forever-agent/index.js, ./~/tough-cookie/lib/cookie.js and 1 other
* tls in ./~/forever-agent/index.js, ./~/kijiji-scraper/~/tunnel-agent/index.js

To install them, you can run: npm install --save fs net tls

This relative module was not found:

* ./package in ./~/cheerio/index.js

I did try installing all these dependencies manually although I don't think I should need to since they are dependencies of the kijiji-scraper itself

but even after adding these I still get these errors:

 ERROR  Failed to compile with 2 errors                                                                                                                                   21:49:11

This dependency was not found:

* fs in ./~/kijiji-scraper/~/request/lib/har.js

To install it, you can run: npm install --save fs

This relative module was not found:

* ./package in ./~/kijiji-scraper/~/cheerio/index.js

any help would be appreciated

mwpenny commented 7 years ago

First, thank you for taking the time to write out a detailed description with code snippets and error messages. One thing to note though is that you have your categoryId and locationId switched. Your prefs object should instead be defined as:

var prefs = {
    "locationId": 1700185,
    "categoryId": 27
}

If you have an incorrect locationId or categoryId, the ads array will be empty. However, it looks like you're having a different problem.


On to your issue:

I am not familiar with Laravel. What type of directory structure do you have (I presume you are running npm run from your project's top-level directory)? Also, which script from your package.json's scripts object are you trying to run, and could you post it here? The problem is likely with your configuration.

On my test machine, I am able to run (where app.js is the sample code you posted with my changes to the prefs object mentioned above)

$ npm install kijiji-scraper
$ node app.js

and the script prints the array of search results, as expected.
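
For reference, here is a minimal sketch of the data and callback such an app.js might use with the corrected prefs, assuming the callback-style kijiji.query(prefs, params, callback) shown in the original post. The summarize helper is hypothetical (it is not part of kijiji-scraper); it only makes the error case and the empty-array case explicit.

```javascript
// Corrected IDs from the comment above: locationId and categoryId swapped back.
var prefs = {
    "locationId": 1700185,
    "categoryId": 27
};

var params = {
    "minPrice": 0,
    "maxPrice": 100000,
    "keywords": "toyota",
    "adType": "OFFER"
};

// Hypothetical helper: turns the callback arguments into a short status line.
function summarize(err, ads) {
    if (err) {
        return "Query failed: " + (err.message || err);
    }
    if (!Array.isArray(ads) || ads.length === 0) {
        // An empty array can mean the locationId/categoryId are wrong.
        return "No ads returned; double-check locationId and categoryId";
    }
    return "Got " + ads.length + " ads";
}

// Usage under Node.js (not in a browser bundle), after `npm install kijiji-scraper`:
//
//   var kijiji = require("kijiji-scraper");
//   kijiji.query(prefs, params, function(err, ads) {
//       console.log(summarize(err, ads));
//   });
```

The query call itself is left in a comment so the snippet loads without the module installed; uncomment it in a real app.js run with node.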

vesper8 commented 7 years ago

thanks for trying to help!

I am unfortunately not all that familiar with running pure node apps. I actually tried to set it up with https://github.com/sahat/hackathon-starter but I was completely at a loss on where to even add the kijiji-scraper code

I would really like to get it to work in my laravel project because I would then be using laravel to store and analyze the scraped kijiji results and trigger notifications and other stuff from Laravel.

You are correct that I run 'npm run production' or 'npm run watch' from the root directory.

The app.js is located in /resources/assets/js/

Is it possible this is causing an issue because your package is using relative paths?

Here's my package.json and webpack.mix.js (used by Laravel Mix, similar to a Gruntfile)

{
  "private": true,
  "scripts": {
    "dev": "cross-env NODE_ENV=development node_modules/webpack/bin/webpack.js --progress --hide-modules --config=node_modules/laravel-mix/setup/webpack.config.js",
    "watch": "cross-env NODE_ENV=development node_modules/webpack/bin/webpack.js --watch --progress --hide-modules --config=node_modules/laravel-mix/setup/webpack.config.js",
    "hot": "cross-env NODE_ENV=development node_modules/webpack-dev-server/bin/webpack-dev-server.js --inline --hot --config=node_modules/laravel-mix/setup/webpack.config.js",
    "production": "cross-env NODE_ENV=production node_modules/webpack/bin/webpack.js --progress --hide-modules --config=node_modules/laravel-mix/setup/webpack.config.js"
  },
  "dependencies": {
    "axios": "^0.15.2",
    "bootstrap": "^3.0.0",
    "cross-env": "^3.2.3",
    "jquery": "^2.1.4",
    "js-cookie": "^2.1.0",
    "kijiji-scraper": "^2.0.0",
    "laravel-mix": "0.*",
    "moment": "^2.10.6",
    "promise": "^7.1.1",
    "sweetalert": "^1.1.3",
    "underscore": "^1.8.3",
    "urijs": "^1.17.0",
    "vue": "2.*"
  }
}

let mix = require('laravel-mix');
var path = require('path');

/*
 |--------------------------------------------------------------------------
 | Mix Asset Management
 |--------------------------------------------------------------------------
 |
 | Mix provides a clean, fluent API for defining some Webpack build steps
 | for your Laravel application. By default, we are compiling the Sass
 | file for the application as well as bundling up all the JS files.
 |
 */

mix.less('resources/assets/less/app.less', 'public/css')
   .copy('node_modules/sweetalert/dist/sweetalert.min.js', 'public/js/sweetalert.min.js')
   .copy('node_modules/sweetalert/dist/sweetalert.css', 'public/css/sweetalert.css')
   .js('resources/assets/js/app.js', 'public/js')
   .webpackConfig({
        resolve: {
            modules: [
                path.resolve(__dirname, 'vendor/laravel/spark/resources/assets/js'),
                'node_modules'
            ],
            alias: {
                'vue$': 'vue/dist/vue.js'
            }
        }
   });

I've added the code inside the app.js

any ideas?

mwpenny commented 7 years ago

Ah, okay, I think I see what's wrong. I've looked into Laravel a little bit and it is a server-side technology (PHP). It looks like you are attempting to run kijiji-scraper on the client-side (you are putting the code that uses it in the public/js directory). Also notice that all of the dependencies in your package.json are client-side JS libraries (i.e., for use in the browser - not on the server; except laravel-mix, which is just used to compile your assets into one bundle, if I'm not mistaken).

The scraper must run on the server-side under Node.js, unless you use something like Browserify, which Laravel happens to support through its "Elixir" (see its documentation, specifically the Browserify section). Using this on your app.js (or another script which uses the scraper), you should be able to use kijiji-scraper in the browser.

vesper8 commented 7 years ago

again huge thanks for explaining!

I don't use Elixir.. no one does anymore.. it might be possible to do what you say with Laravel Mix.. but maybe you're just not supposed to add server-side JS the way I was trying to do it.

So instead I made a blank app.js in the root and added the code there and indeed after doing node app.js it just works!

I am now having my Laravel application call the node script, grab the output and then process it. But I'm running into another issue because of invalid json which I opened another issue about

It would also be useful if there was a way of getting more or less than the default 20 entries

And what about other parameters such as radius ?

Thanks!!

mwpenny commented 7 years ago

Good to hear you found a workaround. I'll close this issue.

If you're using the same script as in your original post (or a similar one) to output the ad information, then the JSON issue is not with the library. I posted more information over in #8.
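
Issue #8 is not reproduced here, so the following is only a guess at the kind of problem meant: console.log(ads) prints objects in Node's inspect format (unquoted keys, single-quoted strings), which a JSON parser on the Laravel side would reject. A minimal sketch of emitting strict JSON instead, where toJson is a hypothetical helper and the sample ad fields are placeholders rather than the scraper's real output:

```javascript
// Serialize the ads as strict JSON so a JSON parser on the other side can read them.
function toJson(ads) {
    return JSON.stringify(ads, null, 2);
}

// Stand-in data; real ads would come from the scraper's callback.
var sampleAds = [{ "title": "Example ad", "price": 1000 }];

// console.log(sampleAds) would print something like
// [ { title: 'Example ad', price: 1000 } ], which is not valid JSON.
process.stdout.write(toJson(sampleAds) + "\n");
```

In the real script, the same call would go inside the query callback in place of console.log(ads).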

The query/search functionality uses Kijiji's RSS feed, which returns 20 entries per page. Paging is currently not implemented in the scraper. The RSS feed seems to support it (e.g., https://www.kijiji.ca/rss-srp-ottawa/cars/page-2/k0l1700185) but using it in a nice way (without using strings like "cars" and "Ottawa" in place of locationId and categoryId) was flaky when I played around with it just now. I'd need to research it more, or switch over to scraping the HTML search results page. That page has trivial paging via the bar at the bottom but the scraper is more likely to break if it is used since the HTML layout can change when Kijiji updates their UI (this recently happened with ad pages, see issue #6). In the meantime, PRs are welcome.
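
For anyone attempting a PR, a rough sketch of building a paged RSS URL from the single example above. The buildRssPageUrl helper and its argument names are hypothetical, the pattern is inferred from that one URL only, and as noted it behaved unreliably, so treat this as a starting point rather than a known-good approach.

```javascript
// Hypothetical helper mirroring the example URL above:
// https://www.kijiji.ca/rss-srp-ottawa/cars/page-2/k0l1700185
function buildRssPageUrl(locationSlug, categorySlug, page, idSuffix) {
    if (!Number.isInteger(page) || page < 1) {
        throw new RangeError("page must be a positive integer");
    }
    return "https://www.kijiji.ca/rss-srp-" + locationSlug + "/" +
        categorySlug + "/page-" + page + "/" + idSuffix;
}

// Reproduces the example URL; whether other pages follow the same
// pattern is an assumption that has not been verified.
var url = buildRssPageUrl("ottawa", "cars", 2, "k0l1700185");
console.log(url);
```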

The way I find other search parameters is by performing a search on Kijiji and using my browser's developer tools (F12 key). The "network" tab shows network requests and what parameters are sent. Playing around with different search parameters and observing the requests made to Kijiji can reveal their names. Sidenote though: it looks like Kijiji has changed the way they do parameters on their site. For example, after setting the radius to 100km, the URL for the search results page had r100.0 appended to the query string instead of as a POST parameter. The old method (used by this library) of using custom parameters should still work though. Please let me know if you experience problems with it.
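
When a request in the "network" tab does carry its parameters in the query string, a small script can list them instead of reading the URL by eye. This is a sketch using Node's built-in URL class; listQueryParams is a hypothetical helper, and the example URL and its parameter names are made up for illustration, not taken from Kijiji.

```javascript
// Lists the query-string parameters of a URL copied from the Network tab.
function listQueryParams(rawUrl) {
    var parsed = new URL(rawUrl);
    var result = {};
    parsed.searchParams.forEach(function(value, key) {
        result[key] = value;
    });
    return result;
}

// Hypothetical URL for illustration only; the parameter names are made up.
var example = "https://www.example.com/search?keywords=toyota&minPrice=0&maxPrice=100000";
console.log(listQueryParams(example));
```

Values come back as strings, and a repeated key keeps only its last value in this simple version.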