Open kurtbrose opened 6 years ago
This idea has evolved a bit -- call it PathEnumerate now, and its job is to dissect a target out into a list of (T, object) pairs.
e.g.
glom([ {'hello': 'world'}, {'goodbye': 'world'}], PathEnumerate())
would result in
[ (T[0], {'hello': 'world'}),
(T[0]['hello'], 'world'),
(T[1], {'goodbye': 'world'}),
(T[1]['goodbye'], 'world')
]
again, the goal is to make glom-specs that mutate glom-specs possible by allowing reasonable specs that operate on an arbitrarily nested structure
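The enumeration described above can be sketched in plain Python. This is only an illustration of the idea, not glom's implementation: `enumerate_paths` is a hypothetical name, and a path is represented as a tuple of keys and indexes rather than a `T` object.

```python
def enumerate_paths(target, path=()):
    """Yield (path, value) pairs for every node below target, depth-first.

    Hypothetical helper: a path here is a tuple of keys/indexes,
    standing in for the T[...] objects proposed in the thread.
    """
    if isinstance(target, dict):
        children = target.items()
    elif isinstance(target, (list, tuple)):
        children = enumerate(target)
    else:
        return  # leaf value (str, int, None, ...): nothing to descend into
    for key, value in children:
        child_path = path + (key,)
        yield child_path, value
        yield from enumerate_paths(value, child_path)


pairs = list(enumerate_paths([{'hello': 'world'}, {'goodbye': 'world'}]))
```

Here `pairs` has the same shape as the example output, with `(0, 'hello')` in place of `T[0]['hello']`.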
I needed this feature for my GDPR use case. Anywhere I find the key email
I gotta remove it -- and it could be moving around and hiding!
paths = catalog(target)
# filter paths with regex like '.*email'
for path in paths:
glom.assign(target, path, None)
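For reference, here is a minimal self-contained sketch of that loop in plain Python. `catalog` and `assign` below are hypothetical stand-ins written for this example (they are not glom functions), paths are tuples of keys and indexes, and the sample data is made up.

```python
import re


def catalog(target, path=()):
    """Hypothetical helper: list the path of every node, parents before children."""
    if isinstance(target, dict):
        children = target.items()
    elif isinstance(target, list):
        children = enumerate(target)
    else:
        return []  # leaf value
    paths = []
    for key, value in children:
        paths.append(path + (key,))
        paths.extend(catalog(value, path + (key,)))
    return paths


def assign(target, path, value):
    """Walk to the parent of the last segment and overwrite that entry in place."""
    for key in path[:-1]:
        target = target[key]
    target[path[-1]] = value


# Made-up sample data for illustration.
target = {
    'user': {
        'email': 'a@example.com',
        'contacts': [{'work_email': 'b@example.com', 'name': 'B'}],
    },
}

email_key = re.compile(r'.*email')
# Go deepest-first so that blanking a parent never invalidates a child path.
for path in reversed(catalog(target)):
    last = path[-1]
    if isinstance(last, str) and email_key.fullmatch(last):
        assign(target, path, None)
```

The regex is applied to the last path segment only. Matching against the whole path would be a different design choice.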
Hey @roryhr! This feature is still coming to glom, but in the meantime you can do what Kurt and I do and use an earlier design, called remap: http://sedimental.org/remap.html#drop-empty-values
It's a bit trickier to use, but it's perfect for cases like yours (similar to the one linked above). Hope this helps!
https://www.w3schools.com/xml/xpath_syntax.asp
traverse should also be able to support an XPath-like syntax to filter output (or, if not traverse, something that can be used with traverse very easily)
if the output of traverse is [(path, element)], then the output could be filtered with Match(path) -- however, wildcard is a bit trickier
in XPath, . is "current node" and * is "any number of nodes" -- I'd propose switching these to * and ... for glom, since I think these are more familiar to glom's audience from file system globbing and use of ... in python [] syntax
another thing that XPath syntax makes a great deal of is "attributes" vs "path"
here's a good acid test for capability I think:
one way this could be expressed is
('0.bookstore.book', And(('price', M > 35), 'title'))
a bit more of a mouthful than
/bookstore/book[price>35.00]/title
come to think of it... maybe what we want here is kind of a multi-fetch rather than a pure traverse
what if path supported a '*' syntax which switched it from returning a single result to an iterable of results?
outside of XML land, every node doesn't implicitly have multiple children that you can only refer to by type...
what if this
Path('bookstore.books.*')
was a short-hand for an iterable of results
('bookstore.books.*', [And(('price', M > 35), 'title') | SKIP])
maybe something like that?
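A rough plain-Python sketch of the '*' multi-fetch idea, assuming dotted string paths. `fetch` is a hypothetical helper, not part of glom, and the bookstore data is invented for the example.

```python
def fetch(target, path):
    """Follow a dotted path; a '*' segment fans out over every child.

    Hypothetical helper. Returns a single value for a plain path and a
    list of values as soon as the path contains at least one '*'.
    """
    results = [target]
    multi = False
    for segment in path.split('.'):
        expanded = []
        for node in results:
            if segment == '*':
                multi = True
                expanded.extend(node.values() if isinstance(node, dict) else node)
            elif isinstance(node, dict):
                expanded.append(node[segment])
            else:
                expanded.append(node[int(segment)])  # list index
        results = expanded
    return results if multi else results[0]


# Invented sample data for illustration.
data = {'bookstore': {'books': [
    {'title': 'A', 'price': 30.0},
    {'title': 'B', 'price': 49.99},
    {'title': 'C', 'price': 39.95},
]}}

# Rough equivalent of /bookstore/book[price>35.00]/title
titles = [book['title'] for book in fetch(data, 'bookstore.books.*')
          if book['price'] > 35]
first_title = fetch(data, 'bookstore.books.0.title')
```

The filtering step is an ordinary list comprehension here. In the proposal it would be the `And(...) | SKIP` sub-spec applied to each result.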
then, a '...' path segment would trigger a recursive walk
Path('a...b') # return all 'b's at any level from 'a'
one challenge here is that now the path is unknown if e.g. you want to emit that; we could cover that by making S[Path] contain the actual path
then, the "plain" Traverse above would translate to Path('...')
I guess "get all paths and values" would be ['...', Fill( (S[Path], T) )]
another helper that would be super useful in case of e.g. the GDPR email thing would be Replace() -- assuming we get the invariant on S[Path] right, this would be equivalent to
Assign(Path(S[PARENT][T]) + S[Path], newval)
or something like that -- on parent of current target, replace current target with new value
I guess the problem with leaning on S[Path] here is that it makes the resulting spec extremely context sensitive
maybe if there was a way to back out instead?
Path('...email..')
this would express, find any paths that go through an attribute named "email", then "back up" one level to the parent
(Path('...email..'), Assign('email', newval))
this would be, go to everywhere with email, then replace with newval
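The "back up one level" behaviour can be approximated in plain Python by yielding the parent containers instead of the matched values. This is a sketch under that assumption only: `parents_of` is a hypothetical name, and the sample data and the 'REDACTED' value are made up.

```python
def parents_of(target, key):
    """Yield every dict, at any depth, that has `key` as a direct child."""
    if isinstance(target, dict):
        if key in target:
            yield target
        children = target.values()
    elif isinstance(target, list):
        children = target
    else:
        return  # leaf value
    for child in children:
        yield from parents_of(child, key)


# Made-up sample data for illustration.
target = {
    'email': 'a@example.com',
    'friends': [{'email': 'b@example.com', 'name': 'B'}],
}

newval = 'REDACTED'
# Collect the parents first, then mutate, so the walk never sees edited data.
for parent in list(parents_of(target, 'email')):
    parent['email'] = newval
```

Because each step works on the parent, the assignment needs no knowledge of the full path, which is the context-sensitivity concern raised above.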
...if we allowed a mechanism for embedding regex...
(Path('...{.*email}..'), Assign(S[Path][-1], newval))
so I really like that syntax as a top-level; but probably also want to make sure it decomposes into nice bits and Path doesn't just become super complicated and magical
per discussion: * and ** are probably better than * and ... (avoids colliding with . path demarcation)
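To make the intended `*` / `**` semantics concrete, here is a small matcher over path tuples. It is a sketch, not glom's behaviour: `path_matches` is a hypothetical name, and it assumes `*` matches exactly one segment while `**` matches zero or more, following file-system globbing.

```python
def path_matches(pattern, path):
    """Return True if path (a tuple of keys) matches pattern (a tuple of segments).

    Assumed semantics: '*' matches exactly one segment,
    '**' matches zero or more segments.
    """
    if not pattern:
        return not path  # both exhausted means a full match
    head, rest = pattern[0], pattern[1:]
    if head == '**':
        # Try letting '**' swallow 0, 1, 2, ... leading segments.
        return any(path_matches(rest, path[i:]) for i in range(len(path) + 1))
    if not path:
        return False
    if head == '*' or head == path[0]:
        return path_matches(rest, path[1:])
    return False
```

For example, `path_matches(('a', '**', 'b'), ('a', 'x', 'y', 'b'))` is true, while `path_matches(('a', '*', 'b'), ('a', 'b'))` is false because `*` must consume one segment.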
some related:
https://github.com/mahmoud/glom/issues/89 -- solved by **
https://github.com/mahmoud/glom/issues/40 -- similar to GDPR use case above
https://github.com/mahmoud/glom/issues/39 -- not sure if this would address that, but there's a similar solution proposed of walk-with-path
what would the stand-alone names for * and ** be?
Glob() and RGlob() (recursive-glob)?
Traverse() and Reverse() (recursive traverse)
maybe Tread() and Retread()?
Iter() and DeepIter()?
Every() and REvery()? All() and RAll()? Each() and Reach()?
I kind of like Each() and Reach()
Hello. Is the plan still to implement traverse at some point? Any help required for this?
the job of a Traverse is to walk its target recursively and return an iterator over all of the bits (as in depth-first or breadth-first traversal) -- this could perhaps share some bits with TargetRegistry
this is very useful when combined with Check and Assign for a kind of pattern-matching strategy:
if there was an un-traverse glom possible, that would be even more powerful; but in the absence of that, being able to do something to the items being traversed is still useful
the ultimate goal of this kind of approach is a useful meta-glom -- you can imagine transformations like "set all defaults to a unique marker object that stores the path" to debug why an output is coming as None
the ultimate, ultimate goal being useful glom-macros (glomacro?) and glom-compilation (glompilation?)