jprichardson / node-google

A Node.js module to search and scrape Google.
MIT License

Empty result (News for ...) #32

Open dogancelik opened 8 years ago

dogancelik commented 8 years ago

When you search for hello lyrics, you get this empty result.

{ title: 'News for hello lyrics',
  link: null,
  description: '',
  href: null }

genbtc commented 8 years ago

This script treats the entire "Top Card" (in this case "In the News") as the first result (links[0]). It cannot parse the card, so links[0].link is null. I have made two fixes in my own bot to handle this. The first: instead of always taking the first result, skip ahead to the first result that has a link (here, the second one). This skips the "news" section, which is what you want in your case, since the lyrics would be the first REAL result, not the news. Treat this example code as a template for your own program; it is not meant as a modification to this module's code:

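        var i = 0;
        // Skip cards (entries with no link), without running past the end.
        while (i < links.length && links[i].link == null)
            i++;
        if (i < links.length)
            console.log(links[i].link);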
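
The same idea can be wrapped in a small helper that also copes with the case where no result has a link at all. This is only a sketch: `firstUsableLink` is a made-up name, not something the module provides, and the sample array just mimics the `{ title, link, description, href }` shape reported at the top of this issue.

```javascript
// Hypothetical helper, not part of the google module.
// Returns the first result whose `link` is set, or null if there is none.
function firstUsableLink(links) {
  for (var i = 0; i < links.length; i++) {
    if (links[i].link != null) {
      return links[i];
    }
  }
  return null;
}

// Sample data in the shape reported in this issue (values are made up).
var sample = [
  { title: 'News for hello lyrics', link: null, description: '', href: null },
  { title: 'Example lyrics page', link: 'https://example.com/lyrics',
    description: '', href: 'https://example.com/lyrics' }
];

var hit = firstUsableLink(sample);
console.log(hit ? hit.link : 'no usable result');
```

Returning null instead of indexing past the end means the caller decides what to do when every entry is a card or failed to parse.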

The second fix probably deserves its own issue, but I will include it here for you anyway. It handles a YouTube video as the first result (and probably other things too). This one does require modifying the module's code, in lib/google.js: on line 66, between var qsObj and if(qsObj, add:

        //Handle YoutubeVideo as TopCard.
        if (!qsObj['/url?q'])
            qsObj = querystring.parse($(elem).find('a').attr('href'))
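
For anyone wondering about the odd-looking '/url?q' key: judging by the snippet above, the whole href is handed to querystring.parse, which only splits on & and =, so everything before the first = ends up as the first key. A quick check with Node's built-in querystring module (both href values below are made up for illustration):

```javascript
var querystring = require('querystring');

// A redirect-style href of the kind Google wraps around normal results.
var wrapped = querystring.parse('/url?q=https://example.com/page&sa=U');
console.log(wrapped['/url?q']); // https://example.com/page

// An href that is not wrapped in /url?q= has no such key,
// which is the case the extra check above is meant to catch.
var direct = querystring.parse('https://www.youtube.com/watch?v=abc123');
console.log(direct['/url?q']); // undefined
```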

BTMPL commented 7 years ago

For anyone looking for a solution to this issue: I've made a fork that changes the item selector to always skip cards.

https://github.com/BTMPL/node-google

The only affected line is:

https://github.com/BTMPL/node-google/blob/master/lib/google.js#L8

3zzy commented 7 years ago

@BTMPL Thanks, but that still doesn't fix the issue.

Test keyword: Lenovo "00AE912"

genbtc commented 7 years ago

Works fine for me. It produces this result, which matches what I get when I search Google by hand:

N2225 and N2226 12Gb SAS External HBAs > Lenovo Press
https://lenovopress.com/tips1175-n2225-and-n2226-sas-sata-hbas

The N2225 and N2226 SAS/SATA HBAs are low-cost, high-performance host bus adapters for high-performance connectivity between System x® servers and tape drives and RAID storage systems. The N2225 provides two x4 external mini-SAS HD connectors with eight lanes of 12 Gbps SAS. The N2226 provides four x4 external mini-SAS HD connectors with 16 lanes of 12 Gbps SAS.

Make sure you are calling it somewhat like this (my example code):

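function GoogleSearchPlugin () {
    this.google = require('google');
    this.google.resultsPerPage = 25;
    var nextCounter = 0; // not used below
}

GoogleSearchPlugin.prototype.respond = function (query, channel, bot) {
    this.google(query, function (err, response) {
        if (err) {
            channel.sendMessage("¯\\_(ツ)_/¯");
            return; // response is not usable after an error
        }

        // Skip cards (entries with no link), without running past the end.
        var i = 0;
        while (i < response.links.length && response.links[i].link == null)
            i++;
        if (i < response.links.length)
            channel.sendMessage(response.links[i].link);
    });
};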

module.exports = GoogleSearchPlugin;

Actually, this was adapted from v1.0. I see now that the homepage has example code that uses the callback and the next function better than I have done, since I only need one result. But since mine isn't broken I won't fix it; I'll leave that up to you.
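
If you do want more than one page, a rough sketch of that callback-plus-next pattern follows. `makePager`, `maxPages` and `onLink` are names I made up, and I am assuming (going by the README example) that `res.next` is only set when another page is available.

```javascript
// Hypothetical helper: builds a callback that reports usable links
// and asks for more pages until `maxPages` pages have been seen.
function makePager(maxPages, onLink) {
  var pagesSeen = 0;
  return function handlePage(err, res) {
    if (err) {
      console.error(err);
      return;
    }
    res.links.forEach(function (item) {
      // Cards and unparsed entries have a null link, so skip them.
      if (item.link != null) {
        onLink(item.link);
      }
    });
    pagesSeen += 1;
    // Assumption: `res.next` is only set when another page is available.
    if (pagesSeen < maxPages && res.next) {
      res.next();
    }
  };
}

// Intended use (needs the google module and network access):
//   google('hello lyrics', makePager(2, console.log));
```

Keeping the page counter inside the closure avoids a module-level counter shared between searches.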

3zzy commented 7 years ago

It's now happening for almost all searches:

{ title: 'ServeRAID M5225-2GB SAS/SATA Controller > Lenovo Press',
  href: null,
  description: '' }
{ title: 'ServeRAID M5225-2GB SAS/SATA Controller - Lenovo Press',
  href: null,
  description: '' }
{ title: 'PROVANTAGE: Lenovo 00AE938 Serveraid M5225-2GB SAS SATA ...',
  href: null,
  description: '' }
{ title: 'Related/Similar - Provantage',
  href: null,
  description: '' }
{ title: 'Lenovo Serve Raid M5225-2Gb Sas/Sata Controller 00AE938 - CPL',
  href: null,
  description: '' }
{ title: 'Lenovo ServeRAID M5225-2GB - storage controller (RAID) - SATA ...',
  href: null,
  description: '' }
{ title: 'Amazon.com: Lenovo ServeRAID M5225-2GB SAS/SATA Controller ...',
  href: null,
  description: '' }
{ title: 'LENOVO SERVERAID M5225-2GB SAS/ SATA CONTROLLER FOR ...',
  href: null,
  description: '' }
{ title: 'LENOVO 00AE938 | SERVERAID M5225-2GB SAS/SATA CONTROLLE',
  href: null,
  description: '' }
{ title: 'Lenovo 00AE938 Serveraid M5225-2gb Sas / Sata Controller For ...',
  href: null,
  description: '' }

I'm calling it like so:

var fs = require( "fs" ),
  google = require( 'google' ),
  MongoClient = require( 'mongodb' ).MongoClient,
  assert = require( 'assert' );

process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0";
process.env.UV_THREADPOOL_SIZE = 256;

process.on( 'uncaughtException', function ( err ) {
  console.log( 'Caught exception: ' + err );
} );

google.protocol = 'http';
google.tld = 'com.au';
google.resultsPerPage = 10;
google.requestOptions = {
  proxy: 'http://username:password@myproxyprovider:8000',
  timeout: 15000,
  jar: true
}

google( 'my keyword here', function ( err, res ) {
  if ( err ) {
    console.error( err );
    return;
  } else {
    for ( var i = 0; i < res.links.length; i++ ) {
      var obj = {
        title: res.links[ i ].title,
        href: res.links[ i ].href,
        description: res.links[ i ].description
      }
      console.log( obj );
    }
  }

} );

genbtc commented 7 years ago

I guess your Google is different in your country, or with those request options. I'm sure you can figure it out.
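
One way to narrow it down is to count how many results come back without an href. A single null at the top looks like the card problem from earlier in this thread, while all of them being null suggests the page markup differs from what the parser expects. A minimal sketch, where `summarizeLinks` is a made-up helper and the sample data only imitates the output pasted above:

```javascript
// Hypothetical diagnostic helper, not part of the google module.
function summarizeLinks(links) {
  var usable = links.filter(function (item) {
    return item.href != null;
  }).length;
  return {
    total: links.length,
    usable: usable,
    // True when results exist but none of them has an href.
    allNull: links.length > 0 && usable === 0
  };
}

// Made-up sample in the shape of the output pasted above.
var reported = [
  { title: 'First result', href: null, description: '' },
  { title: 'Second result', href: null, description: '' }
];

console.log(summarizeLinks(reported));
```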