Handle sub-paths when matching

jacebrowning commented 7 years ago

@astronouth7303 Here's a test that fails in the matching logic. Any thoughts on how to handle this?

AstraLuma commented 7 years ago

The simplest way would be to make use of ** recursive globs (See the glob() docs for details.) But this is only available in Python 3.5+, and the docs say:

Note: Using the “**” pattern in large directory trees may consume an inordinate amount of time.

AstraLuma commented 7 years ago

The two changes to be made to implement this:

Update GlobFormatter.format_field() to return '**'
Update match() to call glob.iglob() with recursive=True

PSF is no longer releasing 3.4, and the distros shipping 3.4 are Debian oldstable (Jessie) and Ubuntu 14.4 (Trusty). I'm not sure what your personal feelings are on old Pythons.

jacebrowning commented 7 years ago

I'm OK with requiring Python 3.5+.

AstraLuma commented 7 years ago

~So, Umm... I tried this, and it's doing something weird. Investigating.~

AstraLuma commented 7 years ago

There's a problem. The regex produced by the parse module (which I pulled in to reverse file paths based on the path_format) produces the regex data/(?P<self_kind>.+?)/(?P<self_key>.+?)\.yml, which is ambiguous.

Basically, we don't have knowledge about which keys could contain delimiters used, so it's difficult to extract fields from the path.

jacebrowning commented 7 years ago

Yeah, obviously only one of the attributes can be allowed to contain arbitrary paths.

Could there be some way to indicate with attribute can be expanded? Or assume it's the first attribute?

AstraLuma commented 7 years ago

parse assumes non-greedy everything, so all the delimiters will end up in the last field.

The complete match() to data sequence is:

match() attempts to synthesize constructor arguments based on comparing the path_format to the found file plus the given keyword arguments
The user's factory and __init__() are called
sync_instances() generates a filename from the attributes
sync_object() does it's thing and loads data

There's a few problems with this:

Because of the ambiguity of path_format, there's multiple ways to parse a path, a subset of which will match the given kwargs (it currently doesn't even try to find this subset)
There are code paths in sync_object() that will completely fubar data if match() doesn't get the arguments exactly right. However, if match() selects a set of arguments that produce the same filename, that codepath will be skipped (the mapper.missing path) and the mapper will .load() the data (which I think will correct any errors made by match()).

So that's the long-way around the conclusion that I think we just need a version of the parser (what parse is doing now) that can handle this ambiguity and select a version that will load. Hopefully. (Step 2 above kinda leaves a whole lot of possibility for poor selection of arguments to hose everything.)

AstraLuma commented 7 years ago

The other alternative is to refactor a bunch of things to support either:

Pre-loading the file name so that only the user's code is affected by poor selection of arguments, but the mapper will still pick the right file
Reading the yaml before object construction and using that data to feed the factory, and then the mapper uses that data

Neither of these are options I'm really comfortable with handling.

jacebrowning commented 5 years ago

I'm going to close this. My focus is now on the spiritual successor to this project: https://github.com/jacebrowning/datafiles

With https://github.com/jacebrowning/datafiles/issues/5, I plan to add similar functionality.

jacebrowning / yorm

Handle sub-paths when matching #147