pr-omethe-us / PyKED

Python interface to the ChemKED database format
https://pr-omethe-us.github.io/PyKED/
BSD 3-Clause "New" or "Revised" License
15 stars 15 forks source link

csv file time history #115

Open jsantner opened 6 years ago

jsantner commented 6 years ago

Changes proposed in this pull request:

@pr-omethe-us/chemked

codecov[bot] commented 6 years ago

Codecov Report

Merging #115 into master will not change coverage. The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #115   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           4      4           
  Lines         966    987   +21     
  Branches      226    231    +5     
=====================================
+ Hits          966    987   +21
Impacted Files Coverage Δ
pyked/validation.py 100% <100%> (ø) :arrow_up:
pyked/chemked.py 100% <100%> (ø) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 57c936c...af3145c. Read the comment docs.

bryanwweber commented 6 years ago

@jsantner Why are these changes needed?

jsantner commented 6 years ago

@bryanwweber Previously, when loading a yaml file with a time history defined in a csv file, the validator gives this error message:

  File "c:\users\jsantne\documents\github\pyked\pyked\validation.py", line 255, in _validate_isvalid_history
    n_cols = len(value['values'][0])

KeyError: 0

You can see this on the Travis report after my second commit on this branch, where I had only added a test yaml file with time history defined in a csv file.

bryanwweber commented 6 years ago

Yes, that's exactly what I'm saying. I was using a script to read multiple yaml files in a complex directory structure within a database directory, and python was looking for the csv file in database, not in the folder with the yaml file.

OK, that's a case that we missed for sure. What if, rather than passing the directory name around, we turn the yaml_file into an instance of a Path object? Then we can find the path of the CSV file by doing yaml_file.stem/csv_filename. As of Python 3.6, the built-in open functions (which NumPy relies on) all accept instances of Path (see https://docs.python.org/3/library/functions.html#open), so we only really have to handle the Python 3.5 case, for which we can write a custom open function that just converts the filename to a string, something like

if sys.version < (3,6):
    oldopen = open
    def open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None):
        return oldopen(str(file), mode, buffering, encoding, errors, newline, closefd, opener)

I should note that this is just off the top of my head, and there may be a better way to handle this backwards dependency. Also, this doesn't help in the case of using a dictionary as the input. I'm not sure there's a good way to handle that, though.

Since this PR is related to time histories, I have a somewhat related question for you that I just stumbled on. Let's say somebody has a csv file with three columns - time, pressure, and volume. Right now, these must be implemented as two separate time-histories and the csv will be loaded twice, right? Is there interest in allowing the user to specify multiple time-histories using a single file?

The only problem I see with loading twice is that it might take some time to load the file from disk. I think it makes more sense to keep the specifications of the time histories separate, and try to cache the data in the csv file somehow, rather than loading it from disk twice.

jsantner commented 6 years ago

I've never used Path objects before, that's an interesting idea. How would the Path object be sent from ChemKED to DataPoint where it's used to read the csv file, though? It still must be passed as an argument, right? I can add that functionality in if you think it's better than passing the directory.

Using a dictionary input, I think the filename would have to be specified as an absolute path. If it were a relative path, what would it be relative to? There's no yaml file.

bryanwweber commented 6 years ago

What if we do the path munging in the loop that creates the DataPoints? Something like

for point in self._properties['datapoints']:
    if 'time-histories' in point:
        for th in point['time-histories']:
            try:
                filename = Path(th['values']['filename'])
            except TypeError:
                pass
            else:
                if yaml_file is not None:
                    th['values']['filename'] = (yaml_file.stem/filename).resolve()
                else:
                    th['values']['filename'] = filename.resolve()
     self.datapoints.append(DataPoint(point))

(please correct any indentation errors, writing code with proportional fonts is hard...) Then we don't have to pass anything around. This will resolve the path into an absolute path relative to the yaml file (if given) or relative to the CWD of the Python process, which should handle the dictionary and the yaml file cases gracefully.

BTW, one of the reasons I'm pushing back here is that I don't want to change how DataPoint is called, if at all possible.

Can you please make sure to add tests for all these code branches? The diff coverage should be 100%, you can see the lines that haven't been run here: https://codecov.io/gh/pr-omethe-us/PyKED/pull/115/diff where the lines that are in the brighter red in the left column weren't executed during testing.

jsantner commented 6 years ago

That's a smart way to do it, I'll add it in and test it.
I wasn't sure how to make a test for an error message (line 256 in validation.py), that's why the coverage isn't 100%. Can you point me to an example in the tests where an error message is tested?

bryanwweber commented 6 years ago

https://github.com/pr-omethe-us/PyKED/blob/master/pyked/tests/test_chemked.py#L59

By the way, you'll also have to turn the yaml_file into a Path when that gets processed in the __init__ method.

if yaml_file is not None:
    yaml_file = Path(yaml_file)
    with open...