python-bonobo / bonobo

Extract Transform Load for Python 3.5+
https://www.bonobo-project.org/
Apache License 2.0

CsvReader does not accept absolute path names in Windows #398

Open alaindebecker opened 3 years ago

alaindebecker commented 3 years ago

bonobo.CsvReader('C:Users/alain/Desktop/projects/pyetl/Employees.txt') produces an error, while bonobo.CsvReader('Employees.txt') does not.

The error is fs.errors.InvalidCharsInPath: path 'C\:Users/alain/Desktop/projects/pyetl/Employees.txt' contains invalid characters. It seems to be raised because the file name contains ":".
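
For illustration, a per-character path check of the kind that would reject ":" can be sketched as below. This is a minimal sketch and not PyFilesystem's actual code: the helper name find_invalid_chars and the character set WINDOWS_INVALID_CHARS are assumptions, modelled on the characters Windows does not allow inside file names.

```python
# Hypothetical helper: approximates a per-character path validation.
# The character set is an assumption, based on characters that Windows
# does not allow inside file names.
WINDOWS_INVALID_CHARS = '\\:*?"<>|'


def find_invalid_chars(path, invalid_chars=WINDOWS_INVALID_CHARS):
    """Return the sorted distinct characters of path found in invalid_chars."""
    return sorted(set(path) & set(invalid_chars))


# A bare file name contains none of the rejected characters.
print(find_invalid_chars('Employees.txt'))
# A path with a drive letter contains ':'.
print(find_invalid_chars('C:Users/alain/Desktop/projects/pyetl/Employees.txt'))
```

Under these assumptions, any path that carries a drive letter would fail such a check, whatever slashes it uses, while a bare file name passes.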

The error is raised on line 56 of bonobo\nodes\io\base.py, which is strange because that file on GitHub has no line 56: it stops after 46 lines.

Traceback (most recent call last):
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\bonobo\execution\strategies\executor.py", line 54, in _runner
│     with node:
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\bonobo\execution\contexts\base.py", line 73, in __enter__
│     self.start()
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\bonobo\execution\contexts\node.py", line 85, in start
│     self._stack.setup(self)
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\bonobo\config\processors.py", line 124, in setup
│     _append_to_context = next(_processed)
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\bonobo\nodes\io\base.py", line 52, in file
│     with self.open(fs) as file:
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\bonobo\nodes\io\base.py", line 56, in open
│     return fs.open(self.path, self.mode, encoding=self.encoding)
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\fs\osfs.py", line 631, in open
│     _path = self.validatepath(path)
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\fs\osfs.py", line 678, in validatepath
│     return super(OSFS, self).validatepath(path)
│   File "C:\Users\alain\AppData\Local\Programs\Python\Python39\lib\site-packages\fs\base.py", line 1489, in validatepath
│     raise errors.InvalidCharsInPath(path)
╰ fs.errors.InvalidCharsInPath  path 'C:Users/alain/Desktop/projects/pyetl/Employees.txt' contains invalid characters

deepu9 commented 3 years ago

@alaindebecker Try using Path from the pathlib package, with something like Path.cwd().

alaindebecker commented 3 years ago

Hi deepu9, thanks for the quick answer.

I do not see how to use Path.cwd(), which gives me the current working directory.

I did try to wrap the string in a Path, which is fine for Path itself, but it still produces the same error (on a still non-existent line 56 in bonobo/nodes/io/base.py).

My minimal reproducible example is as follows:

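import bonobo
from pathlib import Path

myFile = '<<any valid csv file of yours>>'  # placeholder: replace with the path of any valid CSV file
print(Path(myFile))  # check: the path itself is valid
graph = bonobo.Graph()
graph.add_chain(
    bonobo.CsvReader(Path(myFile))  # faulty: this call produces the error
)
bonobo.run(graph)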

FYI: following your website, I installed bonobo with pip, after which bonobo version reports bonobo v.0.6.4.

klmcwhirter commented 3 years ago

I think this may be the hint. Look at the last error message. There is a missing / after C: ...

'C:Users/alain/Desktop/projects/pyetl/Employees.txt'
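
To see why the missing separator matters: under Windows path rules, a path that starts with C: but has no slash after it is relative to the current directory on drive C, not absolute. The sketch below uses PureWindowsPath from the standard library, which applies Windows rules on any operating system. It only illustrates how the two spellings differ and does not show where bonobo or PyFilesystem drops the slash.

```python
from pathlib import PureWindowsPath

# PureWindowsPath applies Windows path rules on any operating system.
with_slash = PureWindowsPath('C:/Users/alain/Desktop/projects/pyetl/Employees.txt')
without_slash = PureWindowsPath('C:Users/alain/Desktop/projects/pyetl/Employees.txt')

# Both paths have the same drive, but only the first one has a root.
print(with_slash.drive, with_slash.is_absolute())
print(without_slash.drive, without_slash.is_absolute())
```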

deepu9 commented 3 years ago

@alaindebecker When you use Path.cwd(), it gives current working directory. Anything that comes after the working directory should be appended by using joinpath().

Say your file path is C:\Users\alain\Desktop\projects\pyetl\Employees.txt. Path.cwd() only gives you C:\Users\alain\Desktop\projects\pyetl, because that is your project root directory, where your code is being run. To add the remaining parts, use joinpath(). So the final code will be:

Path.cwd().joinpath('Employees.txt')

Say your file path is C:\Users\alain\Desktop\projects\pyetl\subfolder1\subfolder1.2\Employees.txt, then your code should be:

Path.cwd().joinpath('subfolder1', 'subfolder1.2', 'Employees.txt')
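
The two cases above can be checked with a short sketch. To keep the result identical on any operating system, it assumes a fixed PureWindowsPath standing in for Path.cwd(); the variable names are illustrative only.

```python
from pathlib import PureWindowsPath

# On the reporter's machine this would be Path.cwd(); a fixed
# PureWindowsPath is used here so the result is the same on any OS.
project_root = PureWindowsPath(r'C:\Users\alain\Desktop\projects\pyetl')

same_folder = project_root.joinpath('Employees.txt')
nested = project_root.joinpath('subfolder1', 'subfolder1.2', 'Employees.txt')
# The / operator builds the same path as joinpath().
nested_with_operator = project_root / 'subfolder1' / 'subfolder1.2' / 'Employees.txt'

print(same_folder)
print(nested)
print(nested == nested_with_operator)
```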

alaindebecker commented 3 years ago

Still not working.

import bonobo
from pathlib import Path

myFile = Path.cwd().joinpath('..', 'Employees.txt')
print('File name:', myFile)
print('File exists:', myFile.is_file())
graph = bonobo.Graph()
graph.add_chain(bonobo.CsvReader(myFile))
bonobo.run(graph)

However, this time the "/" after the "C:" was not erased, but all the "/" were converted to "\" and the file name got truncated.

File name: C:\Users\alain\Desktop\projects\pyetl\..\Employees.txt
File exists: True
fs.errors.InvalidCharsInPath  path 'C:\Users\alain\Desktop\projects\ClassicModels\datafiles' contains invalid characters

deepu9 commented 3 years ago

@alaindebecker Don't worry about forward or backward slashes, as pathlib will take care of them. Also, I noticed the .. in your code. I am not sure whether you want to hide subfolders from public view or are using .. as a relative path. If it is the latter, could you remove the .. from joinpath() and confirm the result?

I've created the same folder structure and it works for me. Thanks
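
As a small sketch of what the .. segment does: joinpath() keeps it as written, and it is only collapsed when the path is normalized. The example assumes a fixed PureWindowsPath in place of Path.cwd(), and uses ntpath.normpath, which normalizes lexically without touching the file system.

```python
import ntpath
from pathlib import PureWindowsPath

# Stand-in for Path.cwd(), so the result is the same on any OS.
cwd = PureWindowsPath(r'C:\Users\alain\Desktop\projects\pyetl')

with_dotdot = cwd.joinpath('..', 'Employees.txt')
print(with_dotdot)  # the '..' segment is kept as written

# ntpath.normpath collapses '..' purely lexically (no filesystem access),
# so the file is looked up one level above the working directory.
print(ntpath.normpath(str(with_dotdot)))

# Without '..', the file is looked up directly in the working directory.
print(cwd.joinpath('Employees.txt'))
```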

deepu9 commented 3 years ago

@alaindebecker Did you get a chance to re-check your code? Could you let me know how it went? Thanks