jacebrowning / yorm

Automatic object-YAML mapping for Python.
https://yorm.readthedocs.io
MIT License
27 stars 6 forks source link

Handle sub-paths when matching #147

Closed jacebrowning closed 5 years ago

jacebrowning commented 7 years ago

@astronouth7303 Here's a test that fails in the matching logic. Any thoughts on how to handle this?

AstraLuma commented 7 years ago

The simplest way would be to make use of ** recursive globs (See the glob() docs for details.) But this is only available in Python 3.5+, and the docs say:

Note: Using the “**” pattern in large directory trees may consume an inordinate amount of time.

AstraLuma commented 7 years ago

The two changes to be made to implement this:

  1. Update GlobFormatter.format_field() to return '**'
  2. Update match() to call glob.iglob() with recursive=True

PSF is no longer releasing 3.4, and the distros shipping 3.4 are Debian oldstable (Jessie) and Ubuntu 14.4 (Trusty). I'm not sure what your personal feelings are on old Pythons.

jacebrowning commented 7 years ago

I'm OK with requiring Python 3.5+.

AstraLuma commented 7 years ago

~So, Umm... I tried this, and it's doing something weird. Investigating.~

AstraLuma commented 7 years ago

There's a problem. The regex produced by the parse module (which I pulled in to reverse file paths based on the path_format) produces the regex data/(?P<self_kind>.+?)/(?P<self_key>.+?)\.yml, which is ambiguous.

Basically, we don't have knowledge about which keys could contain delimiters used, so it's difficult to extract fields from the path.

jacebrowning commented 7 years ago

Yeah, obviously only one of the attributes can be allowed to contain arbitrary paths.

Could there be some way to indicate with attribute can be expanded? Or assume it's the first attribute?

AstraLuma commented 7 years ago

parse assumes non-greedy everything, so all the delimiters will end up in the last field.

The complete match() to data sequence is:

  1. match() attempts to synthesize constructor arguments based on comparing the path_format to the found file plus the given keyword arguments
  2. The user's factory and __init__() are called
  3. sync_instances() generates a filename from the attributes
  4. sync_object() does it's thing and loads data

There's a few problems with this:

So that's the long-way around the conclusion that I think we just need a version of the parser (what parse is doing now) that can handle this ambiguity and select a version that will load. Hopefully. (Step 2 above kinda leaves a whole lot of possibility for poor selection of arguments to hose everything.)

AstraLuma commented 7 years ago

The other alternative is to refactor a bunch of things to support either:

Neither of these are options I'm really comfortable with handling.

jacebrowning commented 5 years ago

I'm going to close this. My focus is now on the spiritual successor to this project: https://github.com/jacebrowning/datafiles

With https://github.com/jacebrowning/datafiles/issues/5, I plan to add similar functionality.