muddy-28 / tfidf

Automatically exported from code.google.com/p/tfidf

Error with "Reads "term:frequency" from each subsequent line in the file" part of code #1

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Load a corpus file containing a term that itself contains a colon, such as the following:
<a href="http: /www.pamil-visions.net/author/laura/" title="posts by laura
spencer">

What is the expected output?
In the line
frequency = int(tokens[1].strip())
frequency should be a number.
What do you see instead?
ValueError: invalid literal for int() with base 10:
'/www.pamil-visions.net/author/laura/" title="posts by laura spencer">'

On what operating system?
Windows Vista

I think this can be corrected by splitting each line on its last colon only:
      # Reads "term:frequency" from each subsequent line in the file.
      for line in corpus_file:
        tokens = line.rpartition(":")
        term = tokens[0].strip()        
        frequency = int(tokens[2].strip())
        self.term_num_docs[term] = frequency
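
A minimal standalone sketch of why the last-colon split helps. It assumes the original code split each line on every ":" (the issue only shows the `tokens[1]` line, so that is a guess), and `parse_term_line` plus the example.com line are hypothetical names and data used only for illustration:

```python
def parse_term_line(line):
    """Parse one "term:frequency" line, splitting on the last colon only."""
    term, sep, count = line.rpartition(":")
    if not sep:
        # rpartition returns an empty separator when no ":" is present.
        raise ValueError("missing ':' separator in line: %r" % line)
    return term.strip(), int(count.strip())


# Hypothetical corpus line whose term is an HTML tag containing a URL.
line = '<a href="http://example.com/author/" title="posts">:3\n'

# Splitting on every colon puts URL text in tokens[1], so int() fails.
tokens = line.split(":")
try:
    int(tokens[1].strip())
    split_failed = False
except ValueError:
    split_failed = True

# Splitting on the last colon keeps the whole tag as the term.
term, frequency = parse_term_line(line)
print(split_failed)
print(term)
print(frequency)
```

Since rpartition splits only at the final colon, any colons inside the term are kept, provided the frequency itself is always the last field on the line.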

Original issue reported on code.google.com by jsaucedo@gmail.com on 23 Aug 2009 at 7:13

GoogleCodeExporter commented 9 years ago
Thank you for pointing this out and suggesting a fix. I've taken the fix, and it's in version 1.1. Thanks!

Original comment by nini...@gmail.com on 19 Jan 2010 at 10:25
