zclfly / collective-intelligence-framework

Automatically exported from code.google.com/p/collective-intelligence-framework

clean-mx #109

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
$ /opt/cif/bin/cif_feedparser -c /opt/cif/etc/misc.cfg -f cleanmx -T medium -F -d
running 4 threads at /opt/cif/bin/cif_feedparser line 92.
:48150: parser error : CData section not finished
http://216.245.206.181/um/dois/vemxupameupau.ph
    <url><![CDATA[http://216.245.206.181/um/dois/vemxupameupau.php?]]></url>
                                                                   ^
:48150: parser error : PCDATA invalid Char value 15
    <url><![CDATA[http://216.245.206.181/um/dois/vemxupameupau.php?]]></url>
                                                                   ^
:48150: parser error : Sequence ']]>' not allowed in content
    <url><![CDATA[http://216.245.206.181/um/dois/vemxupameupau.php?]]></url>
                                                                    ^
:48150: parser error : internal error
    <url><![CDATA[http://216.245.206.181/um/dois/vemxupameupau.php?]]></url>
                                                                    ^
:48150: parser error : Extra content at the end of the document
    <url><![CDATA[http://216.245.206.181/um/dois/vemxupameupau.php?]]></url>

Original issue reported on code.google.com by saxjazm...@gmail.com on 11 Oct 2011 at 5:13

GoogleCodeExporter commented 9 years ago
You guys mind taking a look at:

http://support.clean-mx.de/clean-mx/viruses.php?id=1041250

I haven't had a chance to really dig in, but it appears to choke some XML 
parsers on a stray control character (the ^O in the url below, which the 
parser reports as Char value 15).

<entry>
    <line>1666</line>
    <id>1041250</id>
    <first>1318147795</first>
    <last>0</last>
    <md5>0efe8f24881a63cc66725b5ece77d588</md5>
    <virustotal>http://www.virustotal.com/latest-report.html?resource=0efe8f24881a63cc66725b5ece77d588</virustotal>
    <vt_score>29/40 (72.5%)</vt_score>
    <scanner>avira</scanner>
    <virusname><![CDATA[PHP%2FPbot.A]]></virusname>
    <url><![CDATA[http://216.245.206.181/um/dois/vemxupameupau.php?^O]]></url>
    <recent>up</recent>
    <response>alive</response>
    <ip>216.245.206.181</ip>
    <as>AS46475</as>
    <review>216.245.206.181</review>
    <domain>216.245.206.181</domain>
    <country>US</country>
    <source>ARIN</source>
    <email>abuse@limestonenetworks.com</email>
    <inetnum>216.245.192.0 - 216.245.207.255</inetnum>
    <netname>LSN-DLLSTX-1</netname>
    <descr><![CDATA[Limestone Networks, Inc. LIMES-2 400 N. St. Paul Dallas TX 75201]]></descr>
    <ns1></ns1>
    <ns2></ns2>
    <ns3></ns3>
    <ns4></ns4>
    <ns5></ns5>
</entry>

I figured that since it's wrapped in a CDATA section it would be no problem, 
but that doesn't seem to be the case. I'm looking at what options I have on my 
end to work around this, but anything you can build in to UTF-8 encode that url 
data when you generate the XML output would be helpful...
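
For reference, here is a rough sketch of one possible work-around on the consuming side, written in Python purely for illustration (it is not the cif_feedparser code). It assumes the only problem is characters that XML 1.0 forbids, which CDATA does not exempt, and simply strips them before parsing. The helper name `strip_invalid_xml_chars` and the one-entry sample string are made up for this example, and surrogate code points are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 forbids anywhere in a document, CDATA included:
# C0 controls other than tab, LF and CR, plus U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with characters that are illegal in XML 1.0 removed."""
    return _INVALID_XML_CHARS.sub("", text)


# Hypothetical one-entry sample modelled on the entry above;
# \x0f is the control character displayed as ^O (char value 15).
raw = (
    "<entry><url><![CDATA[http://216.245.206.181/um/dois/"
    "vemxupameupau.php?\x0f]]></url></entry>"
)

# Parsing the raw text fails even though the character sits inside CDATA.
try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After stripping, the same entry parses and the url is intact.
entry = ET.fromstring(strip_invalid_xml_chars(raw))
url = entry.findtext("url")
```

Stripping loses the offending byte, so if the exact URL matters, percent-encoding it on the generating side would probably be the better fix.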

Original comment by saxjazm...@gmail.com on 11 Oct 2011 at 6:39

GoogleCodeExporter commented 9 years ago
http://code.google.com/p/collective-intelligence-framework/wiki/ServerBackup

Original comment by saxjazm...@gmail.com on 28 Oct 2011 at 9:18