lavrai / phpgsb

Automatically exported from code.google.com/p/phpgsb

Method j_parseUrl crashing #24

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Save the attached file
2. Run the code below (with the appropriate constants filled in)

require_once 'phpgsb.class.php';

$url = file_get_contents('spam_url.txt');

$phpgsb = new phpGSB(DB_NAME, DB_USER, DB_PASS, 'localhost', false);
$phpgsb->apikey = GSB_API_KEY;
$using_lists = array();
$using_lists[]='googpub-phish-shavar';

$phpgsb->usinglists = $using_lists;
$phpgsb->pingfilepath = PING_FILE_PATH;

$is_phishing = ($phpgsb->doLookup($url)?1:0);
$phpgsb->close();

What is the expected output? What do you see instead?
Segmentation fault

What version of the product are you using? On what operating system?
Running version 0.2.2 of the phpgsb library on a CentOS 4.4 server and a Red 
Hat Enterprise Linux Server release 5.6 server

Please provide any additional information below.
The spam url in the attached file came from one of several very similar actual 
spam emails received on one of our email boxes over the last few days.

It appears that the regular expression in phpgsb::j_parseURL() is what is 
crashing. As a workaround that prevents the segmentation fault, I added the 
following lines to the top of phpgsb::j_parseURL():

if(strlen($url) > 2000) {
  return parse_url($url);           
}
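
The workaround above can be written as a standalone guard. This is only a sketch under assumptions: `parse_url_guarded` and `MAX_REGEX_URL_LENGTH` are hypothetical names that do not exist in phpgsb, the 2000-character threshold is simply the value from the workaround, and the regex is a simplified stand-in, not the library's actual pattern.

```php
<?php
// Hypothetical threshold, copied from the workaround; not a phpgsb constant.
define('MAX_REGEX_URL_LENGTH', 2000);

/**
 * Hypothetical helper: avoid running a regex on very long input.
 * Returns an array of URL components, or false if nothing could be parsed.
 */
function parse_url_guarded($url)
{
    // Long inputs skip the regex entirely and go to PHP's built-in parser.
    if (strlen($url) > MAX_REGEX_URL_LENGTH) {
        return parse_url($url);
    }

    // Stand-in pattern (an assumption): captures only the scheme and host.
    $pattern = '~^(?P<scheme>[a-z][a-z0-9+.-]*)://(?P<host>[^/?#]+)~i';
    if (preg_match($pattern, $url, $m) === 1) {
        return array('scheme' => $m['scheme'], 'host' => $m['host']);
    }
    return false;
}
```

Note that the two branches return different sets of keys (parse_url() may also return path, query, and so on), so a caller would need to rely only on the components both branches provide.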

Original issue reported on code.google.com by mikegillis677 on 16 Jan 2012 at 6:35

GoogleCodeExporter commented 8 years ago
So the URL being given is exactly as you attached?
I'll look into the regular expression issue and run some local benchmarks 
with it to find out where it's looping. It should never cause a seg fault. 
A URL does have a length limit according to the RFC specs, so I'll also look 
into that (though I'm not sure how we'd handle it if a URL did exceed the 
limit). 

Original comment by cleaver....@gmail.com on 16 Jan 2012 at 9:20

GoogleCodeExporter commented 8 years ago
Yes, the URL was given exactly as attached.  The lack of any whitespace is what 
caused my preliminary regex that extracts all the URLs from a document to 
extract it all as one "URL".
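
One way to keep such oversized matches away from a heavier regex is to sort the extracted candidates by length first. The sketch below assumes a simple whitespace-delimited extraction pattern, which is not the reporter's actual regex, and `split_urls_by_length` is a hypothetical helper name; the default of 2000 is just the threshold used in the workaround earlier in this thread.

```php
<?php
/**
 * Hypothetical helper: extract URL-like strings from text and separate
 * the ones that exceed a length threshold from the rest.
 */
function split_urls_by_length($text, $maxLength = 2000)
{
    $found = array();
    // Assumed extraction pattern: "http(s)://" followed by non-whitespace.
    preg_match_all('~https?://\S+~i', $text, $found);

    $result = array('short' => array(), 'long' => array());
    foreach ($found[0] as $candidate) {
        $key = strlen($candidate) > $maxLength ? 'long' : 'short';
        $result[$key][] = $candidate;
    }
    return $result;
}
```

The 'long' bucket could then be handled with parse_url() or rejected outright, depending on what the application needs.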

Original comment by mikegillis677 on 16 Jan 2012 at 9:30

GoogleCodeExporter commented 8 years ago
Could you try this new version, please? I've completely rewritten the 
function to use a more lightweight (but just as effective) regex; it solved the 
seg fault on my test rig. 
http://phpgsb.googlecode.com/svn/trunk/phpgsb.class.php

Diff: 
http://code.google.com/p/phpgsb/source/diff?spec=svn35&r=35&format=side&path=/trunk/phpgsb.class.php&old_path=/trunk/phpgsb.class.php&old=34

Original comment by cleaver....@gmail.com on 17 Jan 2012 at 12:50

GoogleCodeExporter commented 8 years ago
Thanks for the updated version.

Unfortunately, I'm still getting the segmentation fault on both of my systems.  
In both cases, the segmentation fault is occurring on the following line in the 
revised method:

preg_match($loose, $url, $match);
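
A crash inside preg_match() on a very long subject is often attributed to PCRE exhausting the stack, although this thread does not establish that as the cause here. On PHP 5.2 or later (an assumption; the pcre.* limits and preg_last_error() are not available on older releases), the limits can be lowered so that preg_match() fails with an error code rather than running unbounded. The sketch below uses a hypothetical wrapper, `guarded_preg_match`, and illustrative limit values that are not tuned recommendations.

```php
<?php
// Assumption: PHP 5.2+, where these settings and preg_last_error() exist.
// The values below are illustrative only.
ini_set('pcre.backtrack_limit', '100000');
ini_set('pcre.recursion_limit', '10000');

/**
 * Hypothetical wrapper: run preg_match() and report PCRE errors explicitly
 * instead of treating a failed match and an internal error the same way.
 */
function guarded_preg_match($pattern, $subject)
{
    $match = array();
    $result = preg_match($pattern, $subject, $match);

    if ($result === false) {
        // preg_match() returns false on an internal error such as a limit hit.
        return array('matched' => false, 'error' => preg_last_error(), 'match' => array());
    }
    return array('matched' => $result === 1, 'error' => PREG_NO_ERROR, 'match' => $match);
}
```

A caller could fall back to parse_url() whenever 'error' is anything other than PREG_NO_ERROR.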

Original comment by mikegillis677 on 17 Jan 2012 at 5:23

GoogleCodeExporter commented 8 years ago
Before I forget... both of my systems are running PHP 5.1.6 (that's the version 
that we have support for on the hosted RHEL server).  Not sure if that's 
relevant to the debugging process.

Original comment by mikegillis677 on 17 Jan 2012 at 5:45

GoogleCodeExporter commented 8 years ago
That's a shame. Could you email me a printout of your PHP info? I want to check 
a few of your settings (such as memory per instance).
Just put the following in a text file, save it as phpinfo.php, and run it:
<?php phpinfo(); ?>

Meanwhile, I'll try to run some benchmarks, as seg faults are usually caused by 
high memory consumption. 

Original comment by cleaver....@gmail.com on 17 Jan 2012 at 6:41

GoogleCodeExporter commented 8 years ago
Oh and to get my email address just click my linked username above and it 
should be on that page.

Original comment by cleaver....@gmail.com on 17 Jan 2012 at 6:53

GoogleCodeExporter commented 8 years ago
Thanks for the info. I'm running on http://gsbtool.beaver6813.com/ulookup.php 
and the peak memory usage isn't actually very high (~3.29 MB). How are you 
triggering the seg fault, from the CLI or a browser? I'm trying to work out the 
next steps to recreate it.
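
To compare environments, a small script along these lines could report the input length, the regex result, the peak memory, and whether PHP is running from the CLI or a web server. It is a sketch under assumptions: `measure_regex` is a hypothetical helper, and memory_get_peak_usage() requires PHP 5.2 or later. As far as I know, that function reports only memory allocated through PHP's own allocator, so a low figure would not by itself rule out a problem inside the PCRE library.

```php
<?php
/**
 * Hypothetical helper: run one regex and collect basic diagnostics.
 */
function measure_regex($pattern, $subject)
{
    $result = preg_match($pattern, $subject);

    return array(
        'length'     => strlen($subject),        // size of the input in bytes
        'result'     => $result,                 // 1, 0, or false on error
        'peak_bytes' => memory_get_peak_usage(), // peak for the whole script so far
        'sapi'       => php_sapi_name(),         // e.g. "cli" when run from a shell
    );
}
```

Running it with the contents of spam_url.txt as the subject, once from the command line and once through the web server, would show whether the two environments behave differently.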

Original comment by cleaver....@gmail.com on 17 Jan 2012 at 8:36

GoogleCodeExporter commented 8 years ago
@mike I'm still investigating this. I've been really busy, so I haven't had much 
time to set up my test rig, but I haven't forgotten :)

Original comment by cleaver....@gmail.com on 22 Jan 2012 at 7:51