nansencenter / metanorm

Metadata normalizing tool
GNU General Public License v3.0

Simplification of metanorm #81

Closed aperrin66 closed 3 years ago

aperrin66 commented 3 years ago

Metanorm was originally designed this way: each normalizer takes care of one metadata convention, then passes responsibility for the attributes it could not fill to the next normalizer.
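To make the original design concrete, here is a minimal sketch of that "fill what you can, pass on the rest" chain. It is not metanorm's actual code: the class names, the extract() method and the a_/b_ key prefixes are all made up for illustration.

```python
from typing import Optional


class ChainedNormalizer:
    """Hypothetical base class: fills what it can, delegates the rest."""

    def __init__(self, next_normalizer: Optional["ChainedNormalizer"] = None):
        self.next = next_normalizer

    def extract(self, name, raw):
        """Return the value of attribute `name`, or None if it cannot be found."""
        return None

    def normalize(self, names, raw):
        result = {}
        missing = []
        for name in names:
            value = self.extract(name, raw)
            if value is None:
                missing.append(name)
            else:
                result[name] = value
        # Only the attributes this normalizer could not fill are passed on
        if missing and self.next is not None:
            result.update(self.next.normalize(missing, raw))
        return result


class ConventionA(ChainedNormalizer):
    """Made-up convention whose keys are prefixed with 'a_'."""

    def extract(self, name, raw):
        return raw.get('a_' + name)


class ConventionB(ChainedNormalizer):
    """Made-up convention whose keys are prefixed with 'b_'."""

    def extract(self, name, raw):
        return raw.get('b_' + name)


chain = ConventionA(ConventionB())
normalized = chain.normalize(
    ['title', 'platform'],
    {'a_title': 'T', 'b_title': 'ignored', 'b_platform': 'P'},
)
```

In this sketch, ConventionA fills title, so ConventionB is only asked for platform. This works well only when each provider sticks to one convention, which is the assumption that does not hold in practice.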

Looking at the state of the code now, it appears that this is only applicable in some rare cases. The metadata conventions are followed so loosely and vary so much from one metadata provider to the next that most normalizers end up being specific to a provider.

This results in weird and/or inefficient code that has to reconcile real-world cases with the original design of metanorm.

We could probably make the code both simpler and more efficient by having a structure like this:

UPDATE: The base structure is in place, and I migrated the Creodias normalizer to have a simple example.

Here are the remaining normalizers to migrate/create (hopefully I did not forget any):

For each of these, please create a branch from issue81_simplification_refactoring, make the changes, and open a pull request with issue81_simplification_refactoring as the target branch.

The new normalizers will be put in the metanorm/normalizers/geospaas/ folder.

The new Creodias normalizer can be used as an example.

Once all the normalizers have been migrated, we can remove the old base classes and move on to adapting geospaas_harvesting.

aperrin66 commented 3 years ago

If an existing normalizer does not have a particular get_...() method, remember to check url.py. Some hard-coded values are defined in there even for providers which have their own normalizer.
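As an illustration of that check, here is a rough sketch of a provider-specific normalizer in which a value that used to come from a URL-keyed table is moved into its own get_...() method. Everything in it is hypothetical: the URL_DEFAULTS table only stands in for the kind of hard-coded values found in url.py, and the class name, check(), normalize() and the attribute names are assumptions, not metanorm's actual API.

```python
# Hypothetical stand-in for hard-coded values keyed by URL prefix.
# The contents are made up for this example.
URL_DEFAULTS = {
    'https://example.org/data': {'provider': 'Example Provider'},
}


class ExampleProviderNormalizer:
    """Hypothetical normalizer dedicated to a single provider."""

    URL_PREFIX = 'https://example.org/data'

    def check(self, raw):
        """Tell whether this normalizer applies to the raw metadata."""
        return raw.get('url', '').startswith(self.URL_PREFIX)

    def get_entry_title(self, raw):
        # Already existed in the (imaginary) old normalizer
        return raw['title']

    def get_provider(self, raw):
        # The old normalizer had no such method: the value lived in the
        # URL-keyed table, so it is brought over during the migration.
        return 'Example Provider'

    def normalize(self, raw):
        return {
            'entry_title': self.get_entry_title(raw),
            'provider': self.get_provider(raw),
        }


raw_metadata = {'url': 'https://example.org/data/file.nc', 'title': 'Some dataset'}
normalizer = ExampleProviderNormalizer()
if normalizer.check(raw_metadata):
    normalized = normalizer.normalize(raw_metadata)
```

With this shape, everything known about a provider sits in one class, and the URL-keyed table no longer has to be consulted for it.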

aperrin66 commented 3 years ago

There is some repetition across the normalizers, but I would like to wait until all of them have been migrated to the new format before factoring out the common code.