moj-analytical-services / etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework

Add module for extracting metadata #131

Closed: MrAlecJohnson closed this 3 years ago

MrAlecJohnson commented 3 years ago

The extract_metadata module is designed to extract metadata from Delius and turn it into JSON files in our shared metadata format. This should let you produce a metadata folder that you can pass to etl_manager's read_database_folder() function.
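For illustration, here is a minimal sketch of the "extracted columns in, metadata folder out" step, assuming column descriptions have already been pulled from Delius as (name, Oracle type, nullable) tuples. The function names, the Oracle-to-metadata type mapping and the exact JSON fields are all hypothetical: they are loosely modelled on an etl_manager-style layout (a `database.json` plus one JSON file per table) and are not guaranteed to match the module in this PR or etl_manager's actual schema.

```python
import json
from pathlib import Path

# HYPOTHETICAL mapping from Oracle column types to metadata type names.
# This is an assumption for illustration, not the PR's real mapping.
ORACLE_TO_META_TYPE = {
    "VARCHAR2": "character",
    "CHAR": "character",
    "CLOB": "character",
    "NUMBER": "double",
    "DATE": "datetime",
    "TIMESTAMP": "datetime",
}


def column_to_metadata(name, oracle_type, nullable=True):
    """Convert one extracted column description into a metadata dict.

    Unknown Oracle types fall back to "character".
    """
    return {
        "name": name.lower(),
        "type": ORACLE_TO_META_TYPE.get(oracle_type.upper(), "character"),
        "description": "",
        "nullable": nullable,
    }


def write_metadata_folder(folder, database, tables):
    """Write database.json plus one <table>.json per table into folder.

    `database` is a dict of database-level fields; `tables` maps each table
    name to a list of (column_name, oracle_type, nullable) tuples.
    Returns the paths written, in the order they were written.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    written = []

    db_path = folder / "database.json"
    # End every file with a newline, as requested in review.
    db_path.write_text(json.dumps(database, indent=4) + "\n")
    written.append(db_path)

    for table_name, columns in tables.items():
        table_meta = {
            "name": table_name.lower(),
            "description": "",
            "data_format": "parquet",  # assumed storage format
            "location": f"{table_name.lower()}/",
            "columns": [column_to_metadata(*col) for col in columns],
        }
        table_path = folder / f"{table_name.lower()}.json"
        table_path.write_text(json.dumps(table_meta, indent=4) + "\n")
        written.append(table_path)

    return written
```

A call such as `write_metadata_folder("meta/delius", {"name": "delius"}, {"OFFENDER": [("OFFENDER_ID", "NUMBER", False)]})` would produce `database.json` and `offender.json`, and that folder is what you would then hand to read_database_folder(). Keeping the type mapping in one dictionary is a deliberate choice: it is the piece most likely to change if the approach is later generalised to other source databases.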

It includes:

MrAlecJohnson commented 3 years ago

A few minor changes and questions. A couple of other things:

  • Some files need a newline at the end
  • I think I may have raised this question before. This package is meant to be agnostic about which system it is used for, whereas these changes are very specific to one system. Might they be inappropriate for etl_manager, or should they be more generalised than they currently are?

@isichei Fab, thank you. I've fixed those missing newlines in the JSON files and made changes in response to your comments. I've now pushed again with the changes.

On the generalisation point, it felt most useful to start by solving the problem we have now. We can then make it more generalised as new situations come up (in other words, as we add new databases). Otherwise we would be spending time building something we don't yet need: we know we will need something else in the future, but we don't yet know exactly what that will be.

Despite that, I still think it's worth keeping it in this repo. That way it can be a starting point that becomes more generalised as we use it in more situations. If we put it in a database-specific repo, we risk repeating work in lots of places rather than reusing something central.

MrAlecJohnson commented 3 years ago

Closing and withdrawing this pull request, as discussed here and offline. Look out for an exciting new repo, coming soon!