File paths are os dependent

A-Fisk commented 5 years ago

The file paths read in from the spreadsheets are kept in strings and we keep running into problems of getting backslashes/forwardslashes as we keep changing from unix to windows.

Worth updating to read them in as pathlib objects?

paulbrodersen commented 5 years ago

Yeah, totally agree.

Unfortunately, I don't have much time this week as I am presenting at our lab meeting on Wednesday and I am off for my holiday Thursday morning for two weeks. Hence: do you want to make a pull request? Otherwise, I will get to it when I am back Aug 2.

I have started writing some skeleton code for a decorator so the body of the relevant functions can ideally remain unchanged. If you look at the latest commit, I have added to data I/O functions in example_pipeline/data_io.py the following decorator:

def _handle_file_path(func):
    def func_wrapper(file_path, *args, **kwargs):
        pathlib_object = _get_pathlib_object(file_path)
        output = func(pathlib_object, *args, **kwargs)
        return output
    func_wrapper.__name__ = func.__name__
    func_wrapper.__doc__ = func.__doc__
    return func_wrapper

def _get_pathlib_object(file_path):
    return file_path

_get_pathlib_object is obviously a dud (it just returns the string without doing anything to it). I would propose that _get_pathlib_object

converts the string to a pathlib Path,
checks that the path exist,
returns the sanitized path based on OS or, alternatively, the Path object (if that plays nice with all the import functions).

A-Fisk commented 5 years ago

I can have a look at putting something together later this week. Hopefully it won't be beyond me, I'll follow your steps and see how that goes.

This doesn't have any tests yet to see if it's working does it?

paulbrodersen commented 5 years ago

I have a couple of test scripts locally, but they are still in a mess and in no form to be published, and still need a lot of work to be made into a proper test suite. Most I/O functions are still missing, mostly because I have only been getting files from your lab and didn't really know what exceptions to test against.

In the meantime, example_pipeline/02_test_state_annotation.py is an ok poor man's integration test as the coverage is actually pretty good (apart from the other scripts in the example_pipeline). If 01_preprocess_signals.py passes and 02_test_state_annotation.py still gives the same accuracy values, everything is probably ok.

paulbrodersen commented 5 years ago

On a different note, scratch point 2 in my objectives for _get_pathlib_object. It will be counter-productive for functions that write data (as the path probably shouldn't exist) and for functions that read data, it would be better if those functions threw the error and not the generic wrapper.

So all that needs to go into the _get_pathlib_object is:

Convert the string to a pathlib Path.
Determine the OS (Windows/other is probably enough).
Convert pathlib object to a more appropriate Path class (e.g WindowsPath or whatever it is called).
Convert back to string in case some of the downstream functions don't play nice with a file handle.

The whole function should probably be renamed to _sanitize_file_path, coming to think of it.

A-Fisk commented 5 years ago

Re: testing. I've come across a few problems running my scripts because I've processed them slightly differently to other people. Not sure where the best place to record them is, open another issue for them?

Re: objectives. checking if path exists. How about checking if the path exists, if it doesn't at least check the parent exists so it can still write there?

paulbrodersen commented 5 years ago

Not sure where the best place to record them is, open another issue for them?

Yes, please.

How about checking if the path exists, if it doesn't at least check the parent exists so it can still write there?

Great idea. I should have thought of that.

paulbrodersen / somnotate

File paths are os dependent #1