Unicode filenames are not treated as output files

GoogleCodeExporter commented 9 years ago

A trivial pipeline:

{{{
import os

from ruffus import *

@files("foo.1", "foo.2")
def mkfile1(a, b):
    with open(a) as src:
        with open(b, "wb") as dst:
            data = src.read()
            dst.write(data)
            dst.write("2\n")

@follows(mkfile1)
@files("foo.2", u"foo.3")
def mkfile2(a, b):
    with open(a) as src:
        with open(b, "wb") as dst:
            data = src.read()
            dst.write(data)
            dst.write("3\n")

# Touch a file.
if not os.path.exists("foo.1"):
    with open("foo.1", 'wb') as dst:
        dst.write("1\n")

pipeline_run(mkfile2, verbose=10)
}}}

For the mkfile2 stage, because foo.3 is a unicode string, it isn't treated as 
an output filename, so the job is reported as needing an update on every run. 
Running this pipeline multiple times results in output like:

{{{
  Task = mkfile1
    All jobs up to date
  Task = mkfile2
    Needing update:
      Job = [foo.2 -> foo.3]
   job_parameter_generator BEGIN
   job_parameter_generator consider task = __main__.mkfile2
   job_parameter_generator task __main__.mkfile2 not in progress
   job_parameter_generator start task __main__.mkfile2 (parents completed)
Start Task = mkfile2

    Job = [foo.2 -> foo.3] Missing output file 
    incomplete tasks = __main__.mkfile2
    Job = [foo.2 -> foo.3] completed
Completed Task = mkfile2
   job_parameter_generator END
}}}

The diff is trivial for this case, but I reckon there are other places with 
the same behavior, where isinstance checks against str instead of basestring.

{{{
Index: src/ruffus/task.py
===================================================================
--- src/ruffus/task.py  (revision 229)
+++ src/ruffus/task.py  (working copy)
@@ -958,7 +958,7 @@
             # if single file name, return that
             if (do_not_expand_single_job_tasks and 
                 len(self.output_filenames) and 
-                isinstance(self.output_filenames[0], str)):
+                isinstance(self.output_filenames[0], basestring)):
                 return self.output_filenames
             # if it is flattened, might as well sort it
             return sorted(get_strings_in_nested_sequence(self.output_filenames))
}}}
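
For anyone reading along, here is a minimal sketch of the type check at the heart of the patch. It is not ruffus code: `string_types` and `is_single_filename` are hypothetical names made up for illustration. On Python 2, `str` and `unicode` are sibling subclasses of `basestring`, so `isinstance(u"foo.3", str)` is False while `isinstance(u"foo.3", basestring)` is True. The fallback branch is only there so the snippet also runs on Python 3, where `basestring` no longer exists.

```python
# Sketch only: these names are illustrative and are not part of ruffus.
try:
    # Python 2: basestring is the common base class of str and unicode.
    string_types = (basestring,)
except NameError:
    # Python 3: basestring is gone and str already covers all text strings.
    string_types = (str,)


def is_single_filename(param):
    """Return True if param is one file name rather than a sequence of them."""
    return isinstance(param, string_types)


print(is_single_filename("foo.2"))     # plain string
print(is_single_filename(u"foo.3"))    # unicode literal, as in mkfile2
print(is_single_filename(["foo.3"]))   # a list is not a single file name
```

Checking against the common base class accepts both kinds of string without having to list each type at every call site.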

Original issue reported on code.google.com by paul.jos...@gmail.com on 11 Dec 2009 at 7:33

GoogleCodeExporter commented 9 years ago

Apparently issues don't require the triple curly braces for code listings. Feel 
free to ignore those.

Original comment by paul.jos...@gmail.com on 11 Dec 2009 at 7:34

GoogleCodeExporter commented 9 years ago

Ok, that patch didn't quite fix it, so I just did s/str/basestring/ in each 
place and that ended up fixing things. Not sure if there are any places where 
that would be bad, though. I didn't notice much in the way of unicode checking, 
so I reckon not.

Original comment by paul.jos...@gmail.com on 11 Dec 2009 at 7:48

Attachments:

ruffus-issue-9.patch

GoogleCodeExporter commented 9 years ago

Made changes to the development version, basically replacing isinstance(x, str) 
with isinstance(x, basestring) as you suggested.

Will be in version 2.08.

Thanks

Original comment by bunbu...@gmail.com on 22 Jan 2010 at 5:06

Changed state: Fixed
