Closed GoogleCodeExporter closed 9 years ago
Original comment by abpil...@gmail.com
on 6 Oct 2008 at 11:25
It is 3.30 am here and I have not yet slept... keeping this for tomorrow !
Original comment by abpil...@gmail.com
on 11 Oct 2008 at 10:10
Wow. Get some sleep :) That is late.
I tried this config file but I got this error:
  File "/home/lucas/projects/harvestman-crawler/trunk/HarvestMan/harvestman/apps/spider.py", line 420, in init_config
    self.get_options()
  File "/home/lucas/projects/harvestman-crawler/trunk/HarvestMan/harvestman/apps/appbase.py", line 81, in get_options
    objects.config.get_program_options()
  File "/home/lucas/projects/harvestman-crawler/trunk/HarvestMan/harvestman/lib/config.py", line 1477, in get_program_options
    res = self.parse_arguments()
  File "/home/lucas/projects/harvestman-crawler/trunk/HarvestMan/harvestman/lib/config.py", line 1034, in parse_arguments
    if SUCCESS(self.check_value(option,value)): self.set_option_xml('cache_status', self.process_value(value))
  File "/home/lucas/projects/harvestman-crawler/trunk/HarvestMan/harvestman/lib/config.py", line 721, in set_option_xml
    self.assign_option(option_val, value)
  File "/home/lucas/projects/harvestman-crawler/trunk/HarvestMan/harvestman/lib/config.py", line 590, in assign_option
    fval = (eval(typ))(value)
ValueError: invalid literal for int() with base 10: 'tmp/config-bug20.xml'
Somehow the name of the config file is passed in as an option variable?
Also checkin 148 has one unit test failing. Not sure if these are connected.
Thanks,
Lucas
Original comment by szybal...@gmail.com
on 12 Oct 2008 at 5:05
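The ValueError in the traceback above is what Python raises whenever a non-numeric string is handed to int(). The following is only a minimal sketch of that failure mode, not HarvestMan's actual code: the CONVERTERS table and the coerce and try_coerce helpers are hypothetical names made up for illustration, and a lookup table stands in for the eval(typ) call shown in the traceback.

```python
# Hypothetical sketch: these names are NOT part of HarvestMan.
# A lookup table stands in for the eval(typ) call seen in the traceback.
CONVERTERS = {"int": int, "float": float, "str": str}


def coerce(type_name, raw_value):
    """Convert raw_value with the converter registered under type_name."""
    return CONVERTERS[type_name](raw_value)


def try_coerce(type_name, raw_value):
    """Return the converted value, or the error text if conversion fails."""
    try:
        return coerce(type_name, raw_value)
    except ValueError as exc:
        return "ValueError: %s" % exc


# A numeric string converts cleanly.
result_ok = try_coerce("int", "1")
# A file name routed to an integer option fails the same way as in the report.
result_bad = try_coerce("int", "tmp/config-bug20.xml")

print(result_ok)
print(result_bad)
```

If the config file name really is being routed to an integer-typed option, as the comment suspects, this is the kind of error one would expect to see.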
I am not seeing any error like this when trying with this config.xml. Also, no unit test is failing for me. Can you let me know the full command line you used to run the program?
Original comment by abpil...@gmail.com
on 12 Oct 2008 at 6:44
The previous comment was a reply to Lukasz's comment, not to the original bug.
Lukasz, please reply.
For the original bug, I could not reproduce it on my Ubuntu 8.04 system (i686, Python 2.5.2).
After the fix for issue #21, it looks like the encoding issues are fixed.
I could not test it on x86_64 since I don't have a 64-bit Linux system to test on.
Andrei, could you check it again on your system with the latest code from the trunk?
Marking this as "Worksforme".
Original comment by abpil...@gmail.com
on 12 Oct 2008 at 7:22
Original comment by abpil...@gmail.com
on 12 Oct 2008 at 7:24
Original comment by abpil...@gmail.com
on 12 Oct 2008 at 7:25
My feeling is that this is a "random" bug. HarvestMan uses many threads, and they can sometimes produce "chaotic" bugs that are difficult to reproduce. Let me know if this is a repeating bug for you, and I will test it further.
Original comment by abpil...@gmail.com
on 12 Oct 2008 at 7:26
Did an svn update and python setup.py install, but got the following error when running harvestman --selftest:
harvestman --selftest
Traceback (most recent call last):
  File "/usr/bin/harvestman", line 8, in <module>
    load_entry_point('HarvestMan==2.0.3dev-r156', 'console_scripts', 'harvestman')()
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 277, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 2179, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 1912, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/lib/python2.5/site-packages/HarvestMan-2.0.3dev_r156-py2.5.egg/harvestman/apps/spider.py", line 92, in <module>
    from harvestman.lib.event import HarvestManEvent
  File "/usr/lib/python2.5/site-packages/HarvestMan-2.0.3dev_r156-py2.5.egg/harvestman/apps/harvestman.py", line 90, in <module>
    from event import HarvestManEvent
ImportError: No module named event
Original comment by andrei.p...@gmail.com
on 12 Oct 2008 at 4:02
harvestman.py is no longer in the repository. It was replaced by spider.py.
Please check your installation.
You could, for example, cd into the harvestman directory (the one containing folders like lib, apps, etc.) and run:
rm -r ./*
cd ../
svn update --force
Be careful with the rm -r...
Then try again.
Lucas
Original comment by szybal...@gmail.com
on 12 Oct 2008 at 4:09
Removed my repo, did the checkout + install. Went into /harvestman-crawler/HarvestMan/harvestman/apps and typed: python spider.py -C config-sample.xml
Output is:
Loading system configuration...
Loading user configuration...
Error assigning option "proxyport_value" => Error: invalid literal for int() with base 10: ''
Pass option -h for command line usage.
Printing error traceback for debugging...
  File "/usr/lib/python2.5/site-packages/HarvestMan-2.0.3dev_r156-py2.5.egg/harvestman/lib/config.py", line 697, in set_option_xml_attr
    self.assign_option(option_val, value, attrs)
  File "/usr/lib/python2.5/site-packages/HarvestMan-2.0.3dev_r156-py2.5.egg/harvestman/lib/config.py", line 623, in assign_option
    raise HarvestManConfigError, "Error: " + str(e)
Error: invalid literal for int() with base 10: ''
Original comment by andrei.p...@gmail.com
on 12 Oct 2008 at 4:19
cd /harvestman-crawler/HarvestMan/
python setup.py install
harvestman -c ./harvestman/apps/config-sample.xml
I am getting it too. Will let you know as soon as we fix it.
Original comment by szybal...@gmail.com
on 13 Oct 2008 at 3:38
Strange, I am not getting this error. Maybe I am missing something?
Lukasz, comment out the exception-handling code in assign_option (let it raise the exception and die) and print out the variables (option_val, value, attrs). This will tell you which one is causing the problem.
Original comment by abpil...@gmail.com
on 13 Oct 2008 at 5:06
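The debugging step suggested above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real config.py: assign_option here is a hypothetical stand-in that only converts its value to int, and debug_assign is a made-up wrapper that prints the three variables before letting the exception propagate.

```python
def assign_option(option_val, value, attrs=None):
    """Hypothetical stand-in for the real method: convert value to int."""
    return int(value)


def debug_assign(option_val, value, attrs=None):
    """Call assign_option, printing its inputs if the conversion fails."""
    try:
        return assign_option(option_val, value, attrs)
    except ValueError:
        # Show exactly what was passed in, then let the error propagate.
        print("option_val=%r value=%r attrs=%r" % (option_val, value, attrs))
        raise


# An empty string, as in the reported proxyport_value error, cannot be
# converted to an integer.
try:
    debug_assign("proxyport_value", "", {})
except ValueError as exc:
    message = str(exc)
    print("Error:", message)
```

Printing with %r makes an empty string visible as '' in the output, which helps tell it apart from a missing value.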
Guys, this is the problem. The error is not coming from config-sample.xml but from loading your user configuration from ~/.harvestman/config/config.xml. This is the way to fix it:
$ rm -rf ~/.harvestman
Then run harvestman again. I think you basically have an old config.xml file that was copied there a long time ago and now conflicts with the current code.
Btw, there seems to be a problem with creating the crawl database in ~/.harvestman, at least on darwin (Mac OS X), so I fixed it in db.py in trunk. Sync the trunk, do this, and let me know.
Thanks!
Original comment by abpil...@gmail.com
on 13 Oct 2008 at 5:19
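For anyone who would rather keep the old user configuration around than delete it outright, a more cautious variant of the rm -rf step is to rename the directory. The sketch below is an assumption-laden illustration, not part of HarvestMan: backup_user_config is a hypothetical helper, and the demo runs against a temporary directory instead of the real home directory.

```python
import os
import shutil
import tempfile
import time


def backup_user_config(home=None, name=".harvestman"):
    """Rename <home>/<name> to a timestamped backup.

    Returns the backup path, or None if the directory does not exist.
    Hypothetical helper, not part of HarvestMan.
    """
    home = home or os.path.expanduser("~")
    src = os.path.join(home, name)
    if not os.path.isdir(src):
        return None
    dst = "%s.bak-%s" % (src, time.strftime("%Y%m%d-%H%M%S"))
    shutil.move(src, dst)
    return dst


# Demo against a throwaway directory, so the real home is left untouched.
demo_home = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_home, ".harvestman", "config"))
moved_to = backup_user_config(demo_home)
print(moved_to)
```

Calling backup_user_config() with no arguments would act on the real ~/.harvestman. The program should then find no user configuration there on its next run, while the old config.xml stays available in the backup directory for comparison.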
It works now: both "python spider.py --selftest" and "python spider.py -C config-sample.xml".
Original comment by andrei.p...@gmail.com
on 13 Oct 2008 at 7:57
Thanks for the quick verification, Andrei. Lukasz, I guess you don't need to investigate this any more.
Original comment by abpil...@gmail.com
on 13 Oct 2008 at 8:00
Original comment by abpil...@gmail.com
on 11 Feb 2010 at 7:13
Original issue reported on code.google.com by
andrei.p...@gmail.com
on 22 Jul 2008 at 6:06