mattbernst / polyhartree

Tools to automate routine computational chemistry
GNU General Public License v3.0

Recast data extraction as pattern matching between sentinel values #7

Open mattbernst opened 9 years ago

mattbernst commented 9 years ago

Investigate this: can we further simplify data extraction with a single extract_matches method that takes the parameters line_scanner, start=None, and end=None? Implement it as a generator.

When start and end are both None, line_scanner would try to match patterns on every line, as the line_to_geometry method does inside extract_geometry. When start is None and end is a string, the scanner would run until it encounters the end-string or runs out of data. When start is a string and end is None, the scanner would only try to match lines after it encounters the start-string in the input. When start and end are both strings, the scanner would be active only between start and end pairs, e.g. to extract blocks of data.
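
The four cases above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code taken from polyhartree: sentinels are matched as substrings, the scanner returns None for lines it cannot parse, and the `lines` parameter, the `line_to_atom` scanner, and its regular expression are all hypothetical.

```python
import re

# Hypothetical pattern for lines such as "O  0.000  0.000  0.117".
ATOM = re.compile(
    r"^\s*([A-Z][a-z]?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$"
)


def line_to_atom(line):
    """Hypothetical line scanner: return (symbol, x, y, z) or None."""
    m = ATOM.match(line)
    if m is None:
        return None
    symbol, x, y, z = m.groups()
    return symbol, float(x), float(y), float(z)


def extract_matches(lines, line_scanner, start=None, end=None):
    """Yield every non-None result of line_scanner on the active lines.

    Assumption: start and end are matched as substrings, and the
    sentinel lines themselves are never passed to line_scanner.
    """
    # With no start sentinel the scanner is active from the first line.
    active = start is None
    for line in lines:
        if not active:
            # Only reachable when a start sentinel was given.
            if start in line:
                active = True
            continue
        if end is not None and end in line:
            if start is None:
                return  # no start sentinel: the first end-string stops the scan
            active = False  # wait for the next start-string
            continue
        match = line_scanner(line)
        if match is not None:
            yield match
```

For example, `list(extract_matches(log_lines, line_to_atom, "BEGIN GEOMETRY", "END GEOMETRY"))` would collect the atoms from every delimited block in `log_lines`, where both sentinel strings are made up for this example.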

A higher-level generator on top of this generator can manage transitions to group data into blocks. This simplifies structure extraction for geometry.
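
One way to layer the grouping could look like the sketch below. It assumes the lower-level generator emits a marker whenever a block closes, which the proposal above does not specify; `BLOCK_END`, `scan_events`, and `group_blocks` are hypothetical names, and the snippet is self-contained rather than built on any existing polyhartree code.

```python
# Hypothetical marker object emitted when an end sentinel closes a block.
BLOCK_END = object()


def scan_events(lines, line_scanner, start, end):
    """Lower-level generator: yield matches, plus BLOCK_END at each end sentinel.

    Assumption: start and end are plain strings matched as substrings.
    """
    active = False
    for line in lines:
        if not active:
            active = start in line
        elif end in line:
            active = False
            yield BLOCK_END
        else:
            match = line_scanner(line)
            if match is not None:
                yield match


def group_blocks(events):
    """Higher-level generator: collect matches into one list per block."""
    block = []
    for event in events:
        if event is BLOCK_END:
            yield block
            block = []
        else:
            block.append(event)
    if block:
        # Input ended inside a block; yield what was collected so far.
        yield block
```

With this split, the lower-level generator only tracks whether it is inside a block, while the higher-level one decides how matches are grouped, so a geometry extractor would receive one list of atoms per structure.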

mattbernst commented 9 years ago

Didn't do it quite this way, but in any case made geometry extraction code a lot more reusable across adapters. May add more generality when it comes time for extracting additional data.