moj-analytical-services / etl_manager

A python package to create a database on the platform using our moj data warehousing framework

string provided must be lowercase #106

Open samtazzyman opened 4 years ago

samtazzyman commented 4 years ago

I've got JSON files taken from the GitHub API as my data. I want to define a schema to test the raw data against on its arrival. When I try to follow the steps in https://github.com/moj-analytical-services/etl_pipeline_example it doesn't really tell me how to create the schema in the first place.

When I try to use DatabaseMeta and TableMeta to define the schema, for example with

tab.add_column(
    name='authoredByCommitter',
    type='boolean',
    description='whether the author and committer are the same'
)

I get ValueError: string provided must be lowercase

The issue is that my data comes out of GitHub's API with the field names in camel case. Obviously I could work through and rename them, but ought I to have to do this? If the data were in CSV I could probably drop all of the headers and redefine them (?), but for JSON with its nested structure this is potentially even more of a pain in the arse.
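For reference, the renaming workaround mentioned above can be sketched in a few lines for nested JSON. This is only a rough illustration using the standard library: `camel_to_snake` and `rename_keys` are hypothetical helper names, not part of etl_manager, and the sample record is made up apart from the `authoredByCommitter` field.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase or PascalCase name to lowercase snake_case."""
    # Insert "_" between a lowercase letter or digit and an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Insert "_" between an acronym and a following capitalised word,
    # e.g. "HTMLParser" becomes "HTML_Parser".
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    return s.lower()


def rename_keys(obj):
    """Recursively rename every dict key in a parsed JSON structure."""
    if isinstance(obj, dict):
        return {camel_to_snake(k): rename_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(item) for item in obj]
    return obj  # scalars (str, int, bool, None) are returned unchanged


# Made-up sample record for illustration only.
record = {"authoredByCommitter": True, "commitInfo": [{"committedDate": "x"}]}
renamed = rename_keys(record)
```

After this, `renamed` has keys such as `authored_by_committer`, which would pass a lowercase-only check. It still means rewriting the raw data on arrival, which is the extra step being questioned here.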

isichei commented 4 years ago

Yeah, this was a self-imposed rule, but I'm finding the same issue with dashes in JSON names. We will likely start pulling more JSON files whose names are web-specific and more likely to contain - than _, which will cause the same issue. I think we should loosen the rules.
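As a rough sketch of what a loosened rule could look like: the patterns and the `check_name` function below are hypothetical and are not etl_manager's actual validation code. The loose pattern is only an assumption about which characters might be allowed (mixed case, digits, `_` and `-`).

```python
import re

# Hypothetical patterns, NOT etl_manager's real rules.
STRICT_NAME = re.compile(r"[a-z0-9_]+")      # lowercase, digits, underscore
LOOSE_NAME = re.compile(r"[A-Za-z0-9_-]+")   # also allows capitals and dashes


def check_name(name: str, strict: bool = False) -> str:
    """Return the name unchanged if it is acceptable, else raise ValueError."""
    pattern = STRICT_NAME if strict else LOOSE_NAME
    # fullmatch requires the whole string to match the pattern.
    if not pattern.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")
    return name


check_name("authoredByCommitter")  # accepted under the loose rule
check_name("content-type")         # dashes accepted under the loose rule
```

Passing `strict=True` would keep the current lowercase-only behaviour available for anyone who relies on it.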

RobinL commented 4 years ago

+1, just encountered it with an S3 path that had capitals in it