mahmoud / glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
https://glom.readthedocs.io

SKIP all PathAccessErrors Recursively #228

Closed satish1180 closed 2 years ago

satish1180 commented 2 years ago

How can I skip all path access errors?

input= {
       "firstname": "satish",
       "lastname": "reddy",    # may or may not be present
       "details" : {
                   "phoneno": "987654321",
                    "address": "xxxxx",   # may or may not be present
                    "pincode": "xxxxx"   # may or may not be present
       },
       "familydetails": [
              {
                   "name": "PersonA",
                   "address": "Adress-A",
                   "phoneno": "999999999"
               },
              {
                   "name": "PersonB",
                   "phoneno": "999999999"
               },
              {
                   "name": "PersonC"
               }
        ]
}
output_needed = {
            "Captain": {
                     "FirstName": "satish",
                     "LastName": "reddy",   # include only if it is present in input
                     "PersonalDetails": {
                                         "MobileNo": "987654321",
                                         "Address": "xxxxx",   # include only if it is present in input
                                         "Pincode": "xxxxx"   # include only if it is present in input
                      } ,
                     "CaptainFamilyDetails": [
                                    {
                                         "Name": "PersonA", 
                                         "Address": "Adress-A",   # include only if it is present in input
                                         "MobileNo": "999999999" # include only if it is present in input
                                     },
                                    {
                                         "Name": "PersonB",
                                         "MobileNo": "999999999"
                                     },
                                    {
                                         "Name": "PersonC"
                                     }
                    ]
                }
}
spec = {
    "Captain": {
        "FirstName": "firstname",
        "LastName": "lastname",
        "PersonalDetails": {
            "MobileNo": "details.phoneno",
            "Address": "details.address",
            "Pincode": "details.pincode"
        },
        "CaptainFamilyDetails": ("familydetails", [
            {
                "Name": "name",
                "MobileNo": "phoneno",
                "Address": "address"
            }
        ])
    }
}

This spec works only when all of the parameters are present in the input; if any are missing, I get a PathAccessError.

I tried Coalesce for the optional fields, but that is not viable for me: I have 25-30 more parameters to map, and we do not know in advance which parameters are optional.

So how can I skip all PathAccessErrors recursively?

satish1180 commented 2 years ago

@kurtbrose can you help me with this?

kurtbrose commented 2 years ago

If I understand correctly, you want to extract values from a path in the input, but in case the input is missing, you want to skip it?

from glom import glom, Or, SKIP, Val
input = {'present': {'a': {'b': 1}}, 'missing1': {'a': {}}, 'missing2': {}}
or_skip = lambda spec: Or(spec, Val(SKIP))
spec = {'P': or_skip('present.a.b'), 'M1': or_skip('missing1.a.b'), 'M2': or_skip('missing2.a.b'), 'M3': or_skip('missing3.a.b')}
glom(input, spec)
# {'P': 1}

You can use that or_skip() idiom at any level of a nested glom spec, as long as it is embedded in a dict or list.

Having to do Val(SKIP) is a bit annoying; I've got some ideas on how to clean that up :-)

We can also use a meta-glom, so you don't have to junk up the spec with a bunch of or_skip() calls:

spec = {
    "Captain": {
        "FirstName": "firstname",
        "LastName": "lastname",
        "PersonalDetails": {
            "MobileNo": "details.phoneno",
            "Address": "details.address",
            "Pincode": "details.pincode"
        },
        "CaptainFamilyDetails": ("familydetails", [
            {
                "Name": "name",
                "MobileNo": "phoneno",
                "Address": "address"
            }
        ])
    }
}

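# additional imports needed by the meta-spec; or_skip is the helper defined above
from glom import Auto, Match, Pipe, Ref, Switch, T

meta_spec = Ref('spec',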
   Match(Switch({
       dict: {T: Pipe(Ref('spec'), Auto(or_skip))},  # add / remove or_skip here if you want to drop dict keys that fail
       list: [Pipe(Ref('spec'), Auto(or_skip))],  # add / remove or_skip here if you want to drop list items that fail
       tuple: Auto(([Ref('spec')], tuple)),
       object: T,
   }))
)

compiled_spec = glom(spec, meta_spec)

The compiled spec comes out something like this:

>>> glom(spec, meta_spec)
{'Captain': Or({'CaptainFamilyDetails': Or(('familydetails', [Or({'Address': Or('address', Val(Sentinel('SKIP'))), 'MobileNo': Or('phoneno', Val(Sentinel('SKIP'))), 'Name': Or('name', Val(Sentinel('SKIP')))}, Val(Sentinel('SKIP')))]), Val(Sentinel('SKIP'))), 'FirstName': Or('firstname', Val(Sentinel('SKIP'))), 'LastName': Or('lastname', Val(Sentinel('SKIP'))), 'PersonalDetails': Or({'Address': Or('details.address', Val(Sentinel('SKIP'))), 'MobileNo': Or('details.phoneno', Val(Sentinel('SKIP'))), 'Pincode': Or('details.pincode', Val(Sentinel('SKIP')))}, Val(Sentinel('SKIP')))}, Val(Sentinel('SKIP')))}

That way you can keep your main spec clean :-)

kurtbrose commented 2 years ago

Here's what the compiled spec does on an empty input:

>>> glom({}, compiled_spec)
{'Captain': {'PersonalDetails': {}}}

If you only want certain fields to be skipped when missing, then it's probably better to explicitly mark them with or_skip() rather than use this meta-spec approach.

satish1180 commented 2 years ago

Thank you @kurtbrose, I will try the meta_spec approach; if it works, I will close the issue.

satish1180 commented 2 years ago

@kurtbrose can you please explain this part: Auto(([Ref('spec')], tuple))?

kurtbrose commented 2 years ago

Sure! Maybe I can get a new snippet out of this :-)

To unpack:

Auto -- this says switch mode (https://glom.readthedocs.io/en/latest/modes.html) back to the default from Match

[...] -- we know T is a tuple here, so we want to iterate over each element of the tuple

Ref('spec') -- recurse downwards into the tuple

( ..., tuple) -- [] by default will return a list, convert it back to a tuple so that tuple in = tuple out

The reason I added this is so that ("familydetails", [ will work properly -- the recursion can "pass through" the tuple and get to the dict inside that tuple.

satish1180 commented 2 years ago

Thank you so much!!!