macks22 / dblp

Parse the dblp data into a structured format for experimentation.
MIT License
73 stars, 22 forks

Failed scheduling due to utils.py 'basestring' is not defined? #17

Closed trzytematyczna closed 7 years ago

trzytematyczna commented 7 years ago

Hi, I am trying to use your parser. When I run the command pipeline admin$ python pipeline.py BuildDataset --start 2000 --end 2001 --local-scheduler I get an error that comes down to "NameError: name 'basestring' is not defined", raised in util.py. I looked at the code, but to be honest I can't work out what basestring is supposed to be. Any pointers on what I can check or how I can solve this would be much appreciated!

Error:

DEBUG: Checking if BuildDataset(start=2000, end=2001) is complete
/Users/admin/anaconda/lib/python3.5/site-packages/luigi/worker.py:328: UserWarning: Task BuildDataset(start=2000, end=2001) without outputs has no custom complete() method
  is_complete = task.complete()
DEBUG: Checking if BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2001) is complete
INFO: Informed scheduler that task   BuildDataset_2001_2000_429339e3d6   has status   PENDING
WARNING: Will not run BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2001) or any dependencies due to error in complete() method:
Traceback (most recent call last):
  File "/Users/admin/anaconda/lib/python3.5/site-packages/luigi/worker.py", line 328, in check_complete
    is_complete = task.complete()
  File "/Users/admin/anaconda/lib/python3.5/site-packages/luigi/task.py", line 533, in complete
    outputs = flatten(self.output())
  File "/Users/admin/Desktop/DBLP_parser/dblp-master/pipeline/util.py", line 39, in output
    if isinstance(self.base_paths, basestring):
NameError: name 'basestring' is not defined

INFO: Informed scheduler that task   BuildLCCAuthorRepdocCorpusTfidf_2001_2000_429339e3d6   has status   UNKNOWN
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
DEBUG: There are 1 pending tasks possibly being run by other workers
DEBUG: There are 1 pending tasks unique to this worker
DEBUG: There are 1 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=038425231, workers=1, host=Monikas-MacBook-Pro.local, username=admin, pid=75675) was stopped. Shutting down Keep-Alive thread
INFO: 
===== Luigi Execution Summary =====

Scheduled 2 tasks of which:
* 1 failed scheduling:
    - 1 BuildLCCAuthorRepdocCorpusTfidf(start=2000, end=2001)
* 1 were left pending, among these:
    * 1 had dependencies whose scheduling failed:
        - 1 BuildDataset(start=2000, end=2001)

Did not run any tasks
This progress looks :( because there were tasks whose scheduling failed

===== Luigi Execution Summary =====
macks22 commented 7 years ago

This is due to an incompatibility with Python 3. In Python 2, basestring was the common base type of str and unicode; it was removed in Python 3, where str itself became the Unicode text type. The easiest fix is to run the pipeline with Python 2.

If you'd like, you can instead replace that line with if isinstance(self.base_paths, str): and try again. Other Python 3 incompatibilities may surface after this, though. If you're up to it, I'm glad to merge a PR that gives full compatibility. The 2to3 tool can help with that.
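
For anyone who wants the check to work on both interpreters rather than hard-coding str, here is a minimal sketch of a compatibility shim. The names string_types and as_path_list are hypothetical and not part of this repo, and the thread does not show what util.py does after the isinstance check, so the list-wrapping is only an assumption for illustration.

```python
# Compatibility shim: basestring exists in Python 2 but not in Python 3.
try:
    string_types = basestring  # Python 2: covers both str and unicode
except NameError:
    string_types = str  # Python 3: str is the only text type


def as_path_list(base_paths):
    """Hypothetical helper: wrap a single path string in a list.

    Assumption: the caller accepts either one path or an iterable of
    paths, which is what the isinstance check in util.py suggests.
    """
    if isinstance(base_paths, string_types):
        return [base_paths]
    return list(base_paths)


if __name__ == "__main__":
    print(as_path_list("data/papers.csv"))
    print(as_path_list(("data/a.csv", "data/b.csv")))
```

With a shim like this, the failing line could read if isinstance(self.base_paths, string_types): and behave the same under Python 2 and Python 3.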

macks22 commented 7 years ago

Closing due to inactivity.