molgenis / molgenis

MOLGENIS - for scientific data: management, exploration, integration and analysis.
https://molgenis.org
GNU Lesser General Public License v3.0

Version controlling metadata in Molgenis #7566

Open cpavanrun opened 6 years ago

cpavanrun commented 6 years ago

As a dev team we are working on deploying MolGenis. This includes updating the metadata of a running instance: entities, attributes, etc.

Currently we maintain comma-separated value files with metadata within a git repository (e.g. attributes.csv, entities.csv, etc.).

However, we've run into problems with version controlling this aspect of our MolGenis instance. The most obvious update route is to provide the EMX importer plugin with the latest version of metadata.

Expected behavior

MolGenis updates the related entities with the new metadata.

Observed behavior

We often run into the following problem: You are trying to upload entities that are not compatible with the already existing entities [..]

Workarounds

We do either of the following:

  1. Download data. Reset the instance. Upload new metadata version. Upload data (pray to a deity).
  2. Manually update the .csv's and update the metadata via the GUI (metadata manager) on the Molgenis instance.

The first one is iffy on production systems and has a high pray-that-nothing-gets-lost vibe.

The second one is bound to lead to divergence between the metadata on the actual instance and what we remember to add to git.
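To illustrate the divergence problem, here is a minimal sketch of how drift between the attributes.csv in git and one exported from the instance could be detected. It assumes both files have `entity` and `name` columns that together identify an attribute; the function names and the sample data are made up for illustration and are not part of Molgenis.

```python
import csv
import io


def load_attributes(csv_text):
    """Index attribute rows by (entity, name). Column names are an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["entity"], row["name"]): row for row in reader}


def diff_attributes(git_csv, instance_csv):
    """Return human-readable differences between two attribute sheets."""
    wanted = load_attributes(git_csv)
    actual = load_attributes(instance_csv)
    report = []
    for key in sorted(wanted.keys() | actual.keys()):
        label = "{}.{}".format(*key)
        if key not in actual:
            report.append("only in git: " + label)
        elif key not in wanted:
            report.append("only on instance: " + label)
        else:
            # Compare every column that appears in either version of the row.
            for column in sorted(wanted[key].keys() | actual[key].keys()):
                old = actual[key].get(column, "")
                new = wanted[key].get(column, "")
                if old != new:
                    report.append(f"changed: {label} {column}: {old!r} -> {new!r}")
    return report


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real instance.
    git_version = "entity,name,dataType\nperson,id,string\nperson,age,int\n"
    instance_version = (
        "entity,name,dataType\n"
        "person,id,string\nperson,age,string\nperson,nickname,string\n"
    )
    for line in diff_attributes(git_version, instance_version):
        print(line)
```

Running something like this in CI against a fresh export would at least make the drift visible, even if it does not fix the import itself.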

Our question: is there a better way?

fdlk commented 6 years ago

Hi @cpavanrun, thanks for reporting! I think you're running into a bug where the compatibility check of the EMX importer is way too simplistic. (See https://github.com/molgenis/molgenis/blob/master/molgenis-data/src/main/java/org/molgenis/data/meta/MetaDataServiceImpl.java#L557 )

fdlk commented 6 years ago

I can think of a third workaround: do the metadata editing in molgenis using the metadata editor plugin, then use the EMX download tool to download the changes in EMX format.

cpavanrun commented 6 years ago

Hi @fdlk, thanks for your reply!

Your suggested third workaround is an improvement on the second workaround but seems backwards to me. That is, the flow is from the instance to version control. On top of that, the method requires you to unzip the download, throw out the irrelevant package metadata (e.g. sys) and integrate the rest into version control.

A fourth workaround I could come up with is building a custom updater that tries to update the relevant system entities via the API (e.g. sys_md_Attribute). It doesn't feel right, touching the system entities in such a way.
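For what it's worth, a rough sketch of what the core of such an updater might look like is below. The URL layout (`/api/v2/sys_md_Attribute`), the `{"entities": [...]}` body shape and the `x-molgenis-token` header are assumptions on our side and would need to be checked against the REST API documentation of the version in use; `build_update_request` is a hypothetical helper, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_update_request(base_url, token, rows):
    """Build (but do not send) a batch update for attribute metadata rows.

    ASSUMPTIONS: the endpoint path, the {"entities": [...]} body and the
    token header name are not confirmed here; verify them before use.
    """
    url = base_url.rstrip("/") + "/api/v2/sys_md_Attribute"
    body = json.dumps({"entities": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "x-molgenis-token": token,
        },
    )


if __name__ == "__main__":
    # Hypothetical row; the field names are illustrative only.
    request = build_update_request(
        "https://molgenis.example.org/",
        "secret-token",
        [{"id": "attr-123", "label": "Age in years"}],
    )
    print(request.get_method(), request.full_url)
    # Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping the request construction separate from the sending makes it possible to review or log the planned changes before anything touches the system entities.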

Fixing the simplistic checks of the EMX updater is something that we would really like to have, to keep things tidy and future-proof. Is there anything we can do to help fix it or push it onto the roadmap for the 7 release?

(I forgot to mention it, but you are correct in assuming we're running version 6.1 =) )

LuukDijkhuis commented 6 years ago

Thanks for your input. Ideally, one would be able to express metadata changes as migrations, which we hope to design and implement one day. Given the current huge backlog of todos, that may unfortunately take a while.
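To make the idea of migrations a bit more concrete, here is a small sketch of the general pattern: numbered steps applied in order, with the last applied version recorded so that a rerun is a no-op. It works on a plain in-memory dictionary and is not a MOLGENIS design; all names (`add_attribute`, `rename_attribute`, `migrate`) are hypothetical.

```python
def add_attribute(entity, name, data_type):
    """Return a migration step that adds an attribute to an entity."""
    def apply(metadata):
        metadata.setdefault(entity, {})[name] = {"dataType": data_type}
    return apply


def rename_attribute(entity, old, new):
    """Return a migration step that renames an attribute, keeping its settings."""
    def apply(metadata):
        attributes = metadata[entity]
        attributes[new] = attributes.pop(old)
    return apply


# Ordered list of (version, step); in practice this would live in git.
MIGRATIONS = [
    (1, add_attribute("person", "id", "string")),
    (2, add_attribute("person", "age", "int")),
    (3, rename_attribute("person", "age", "age_years")),
]


def migrate(metadata, applied_version, migrations=MIGRATIONS):
    """Apply every migration newer than applied_version, in version order.

    Returns the new version, which the caller should store with the instance.
    """
    for version, step in sorted(migrations, key=lambda item: item[0]):
        if version > applied_version:
            step(metadata)
            applied_version = version
    return applied_version


if __name__ == "__main__":
    metadata = {}
    version = migrate(metadata, 0)
    print(version, metadata)
```

The point of the pattern is that the repository describes how to get from one metadata version to the next, instead of only describing the end state and leaving the importer to guess whether the change is compatible.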

cpavanrun commented 6 years ago

Thanks for the info @LuukDijkhuis. We painfully understand the backlog problem. We'll keep tabs on the changelog and try to keep things clear.