qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS

Overhaul Metadata Management In QGIS #91

Open timlinux opened 7 years ago

timlinux commented 7 years ago

QGIS Enhancement: Overhaul Metadata Management In QGIS

Date: 2017/02/26

Author: Tim Sutton (@timlinux)

Contact: tim@kartoza.com

Maintainer: @timlinux

Version: QGIS 3.0

Summary

Preceding work

QGIS Metadata Strategy: https://gist.github.com/tomkralidis/33f781e361f6d855c2f4

Keep in mind the existing QEPs: https://github.com/qgis/QGIS-Enhancement-Proposals/issues/33 https://github.com/qgis/QGIS-Enhancement-Proposals/issues/50

And keep in mind existing metadata editors:

Our intent is to build on these previous works and ideas by building a number of components that provide a comprehensive metadata strategy for QGIS.

Proposed Solution

The big picture of what we plan to produce is here:

image07

We propose to first implement these components (WP = Work Package):

We will make a separate QEP for the other work packages once the ones above are taken care of. More details on these work packages can be found below.

Work package 1 - Schema Definition/Selection:

Input schema selection: In this phase we will identify input schemas to be used for validation. We propose initially to support Dublin Core and validate against the following CSW Record schemas:

Although we start with Dublin Core to keep things simple, we expect future development to move towards supporting ISO.

Internal Schema: In this phase we will specify a schema for internal representation of metadata within QGIS (‘the QGIS Metadata Schema’). This schema would be independent of any existing standards and would be the basic structure in which all incoming metadata would be stored. When we add support for additional formats in the future, the expectation would be that these formats are also transitioned to the QGIS internal format on import so that we can deal with a single common metadata structure internally.

Since the QGIS internal schema most likely won’t be a superset of all existing schemas, conversions between it and any other schema may result in a loss of information, which means we won’t support metadata “round trips”. One proposed solution to the loss of round tripping is to keep the original metadata document (if provided) and then interpolate new values into it when the metadata is updated.
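The "keep the original document and interpolate new values" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the internal field names and the `FIELD_TO_DC` mapping are hypothetical (the actual schema is the one proposed in the pull request), and the `record` wrapper element is made up for the example. Elements the internal schema does not know about are left untouched, which is what preserves the round trip.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from internal field names to Dublin Core elements.
FIELD_TO_DC = {"title": "title", "abstract": "description", "rights": "rights"}


def interpolate(original_xml: str, internal: dict) -> str:
    """Write internal values back into the original document.

    Elements that have no counterpart in the internal record are kept as-is.
    """
    root = ET.fromstring(original_xml)
    for field, dc_name in FIELD_TO_DC.items():
        if field not in internal:
            continue  # nothing to update for this field
        tag = f"{{{DC_NS}}}{dc_name}"
        element = root.find(tag)
        if element is None:
            element = ET.SubElement(root, tag)  # field is new to the document
        element.text = internal[field]
    return ET.tostring(root, encoding="unicode")


# Made-up original document with one element the internal schema ignores.
original = (
    f'<record xmlns:dc="{DC_NS}">'
    "<dc:title>Old title</dc:title>"
    "<dc:coverage>Western Cape</dc:coverage>"
    "</record>"
)

if __name__ == "__main__":
    print(interpolate(original, {"title": "Roads 2017", "rights": "CC BY 4.0"}))
```

In this sketch `dc:coverage` survives the update even though the internal record never carried it.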

We will also identify which fields should be mandatory within the QGIS Metadata Schema. These should mostly be fields that we can extract automatically from the dataset, without requiring any intervention from the user. Only in this way can we guarantee the automatic generation of internal metadata for every dataset.

Other things to mention:

Status: A proposed schema has been written here qgis/qgis/#4330

Work package 2 - QGIS Metadata API:

In this work package we will build the basic C++ framework for parsing metadata from a schema - initially Dublin Core and QGIS Metadata Schema. This includes implementing an internal model for representing metadata, based on the metadata schema created in WP1.

Additional deliverables:

Work package 3 - Implement QGIS Metadata Storage support

In this WP we will introduce an external physical format for storing metadata internally, the “metadata store”. The goal is to support portability, enabling users to share their metadata, even in offline scenarios. This WP will build directly on the outputs of WP1, which will define an "internal metadata schema" and WP2, “QGIS metadata API”, which will encode/decode from the internal schema to the supported schemas (right now, only Dublin core).

screen shot 2017-04-04 at 11 57 36 am

QGIS will support two types of metadata stores: remote and local. In this WP we will focus on local stores only. In the diagram below we depict the inheritance model for metadata stores, where an abstract metadata store will behave polymorphically, according to the particular data format. For instance, in the case of a PostgreSQL DB, the method “save” will create a table in the database, whereas in the case of a Shapefile, it would create an XML file.

screen shot 2017-04-04 at 11 57 46 am
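As a rough illustration of the inheritance model, here is a minimal Python sketch (the real implementation is planned in C++). All class names, the table layout and the XML layout are assumptions made for the example, not the proposed design. The point is only that callers talk to the abstract store and each format decides what "save" means.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from pathlib import Path


class MetadataStore(ABC):
    """Abstract store: each concrete format decides how to persist metadata."""

    @abstractmethod
    def save(self, layer_id: str, metadata: dict) -> None: ...

    @abstractmethod
    def load(self, layer_id: str) -> dict: ...


class XmlSidecarStore(MetadataStore):
    """Writes one XML file per layer (keys must be valid XML element names)."""

    def __init__(self, directory: Path):
        self.directory = directory

    def _path(self, layer_id: str) -> str:
        return str(self.directory / f"{layer_id}.xml")

    def save(self, layer_id, metadata):
        root = ET.Element("metadata")
        for key, value in metadata.items():
            ET.SubElement(root, key).text = value
        ET.ElementTree(root).write(
            self._path(layer_id), encoding="utf-8", xml_declaration=True
        )

    def load(self, layer_id):
        root = ET.parse(self._path(layer_id)).getroot()
        return {child.tag: child.text for child in root}


class SqliteStore(MetadataStore):
    """Keeps key/value pairs for all layers in a single table."""

    def __init__(self, database: str):
        self.connection = sqlite3.connect(database)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            "layer_id TEXT, key TEXT, value TEXT, PRIMARY KEY (layer_id, key))"
        )

    def save(self, layer_id, metadata):
        with self.connection:  # commits on success
            self.connection.executemany(
                "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
                [(layer_id, key, value) for key, value in metadata.items()],
            )

    def load(self, layer_id):
        rows = self.connection.execute(
            "SELECT key, value FROM metadata WHERE layer_id = ?", (layer_id,)
        )
        return dict(rows)


if __name__ == "__main__":
    record = {"title": "Roads", "abstract": "Road centrelines"}
    with tempfile.TemporaryDirectory() as tmp:
        for store in (XmlSidecarStore(Path(tmp)), SqliteStore(":memory:")):
            store.save("roads", record)  # same call, format-specific behaviour
            print(type(store).__name__, store.load("roads"))
```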

Some formats, such as text files, can be more limited than others. For that reason, we will create a “prime” format, the “QGIS metadata store”, which can accompany more restrictive formats. The prime format will be an SQLite database, because it is lightweight and well known within the QGIS community.

As the goal is to support all these different formats in the future, we will design an infrastructure to accommodate that, but in this first iteration we will focus on the simple use case of creating an xml file, and an SQLite data store. The metadata contents will be passed by the metadata API. In this WP we will implement format translation, but not schema translation.

We will implement a user interface to allow the user to configure serialization/deserialization behavior, e.g. in which format we should write metadata, and where. In WP5, we will add metadata detection (which perhaps we can turn on and off in the project settings). For instance, if there is an XML file with the same name and path as a Shapefile, QGIS would attempt to automatically import metadata.
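The detection rule in the Shapefile example could be sketched as follows. This is an assumption-laden illustration: the function name `find_sidecar_metadata` and the `enabled` flag (standing in for the project setting) are hypothetical, and only the "same name and path, `.xml` extension" rule from the text is checked.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_sidecar_metadata(dataset_path: Path, enabled: bool = True) -> Optional[Path]:
    """Return the XML file sitting next to the dataset, if detection is on."""
    if not enabled:
        return None
    candidate = dataset_path.with_suffix(".xml")  # roads.shp -> roads.xml
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "roads.shp").touch()
        (base / "roads.xml").write_text("<metadata/>")
        (base / "rivers.shp").touch()  # no sidecar for this one
        for name in ("roads.shp", "rivers.shp"):
            print(name, "->", find_sidecar_metadata(base / name))
```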

The QGIS metadata store will be synced with any changes that we apply to the metadata. When we export metadata to XML, those changes will be written to the XML file.

Metadata search will also be polymorphic, according to the data format. In this iteration we will implement some text search for SQLite, and will use that rather than searching in text files which tends to be slower.
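A minimal sketch of text search against an SQLite store, assuming a hypothetical key/value table named `metadata`; the actual table layout is not defined in this QEP. It uses a plain `LIKE` query, which SQLite matches case-insensitively for ASCII text; a full-text index would be a possible refinement but is not assumed here. Note that `%` and `_` in the search term act as wildcards in this simple form.

```python
import sqlite3


def build_demo_store() -> sqlite3.Connection:
    """Create an in-memory store with two made-up layers."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE metadata (layer_id TEXT, key TEXT, value TEXT)")
    connection.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [
            ("roads", "title", "Roads"),
            ("roads", "abstract", "Road centrelines for the Western Cape"),
            ("rivers", "title", "Rivers"),
            ("rivers", "abstract", "Perennial rivers, Western Cape"),
        ],
    )
    return connection


def search(connection: sqlite3.Connection, term: str) -> list:
    """Return the ids of layers whose metadata values contain the term."""
    rows = connection.execute(
        "SELECT DISTINCT layer_id FROM metadata WHERE value LIKE ? ORDER BY layer_id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    store = build_demo_store()
    print(search(store, "road"))
    print(search(store, "cape"))
```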

Activities:

Deliverables:

Dependencies: WP1, WP2

Work package 4 - Implement QGIS metadata viewer:

Metadata is only useful if it is visible to the users of the dataset that the metadata is associated with. For this reason we should have provision for presenting the metadata in an eye-pleasing and informative manner and with minimal work required on behalf of the user. We also aim to implement this early in the project workflow, so that we can start outputting the data stored in QgsMetadata.

Some thoughts:

The idea is to replace this:

image00

With something like this (taken from GeoNode):

image12

Work package 8 - Implement QGIS metadata editor for layers

In wizard mode:

screencapture

In form mode:

screencapture

Example(s)

(optional)

Affected Files

(required if applicable)

Performance Implications

(required if known at design time)

Further Considerations/Improvements

We have some funding to make these work packages happen (for around 80%) - if anyone is interested in co funding the shortfall, please let us know.

There is a discussion group at: https://gitter.im/qgis/metadata for those who wish to collaborate in making QGIS metadata better.

The following people have already joined the effort and will be doing implementation work, planning, offering advice etc.

Backwards Compatibility

This will be new code and will replace any existing metadata implementation work (including what is currently in layer properties dialog). We will try to make sure that server and other parts that rely on metadata do not break - we would welcome support and input from those working on QGIS Server.

Issue Tracking ID(s)

(optional)

Votes

(required)

samperd commented 7 years ago

@timlinux I was just notified of this thread. Recently I have been thinking of MD management within QGIS as well. What follows is a brain dump I shared via e-mail. So I release these ideas and thoughts to the wild as is.

Problem statement: although there are great tools for working with geospatial data, techs still spend a huge amount of time searching, collecting, storing and managing data.

Solution: A toolkit to facilitate a common work flow or tool set for GIS techs. Thinking of a Git Flow model for GIS.

User story: As a qgis user I want a tool or catalogue to keep track of all the data I download or create throughout all my projects. So that I can better manage my local data inventory, keep track of data sources, their age and any MD I create or import.

Audience: the solo data wrangler managing data and resources on the desktop. Not a large team environment but maybe a small team accessing the same data?

Develop a "catalogue for everyone". This is not an enterprise catalogue but a personal one.

It will use PYCSW as the back end. A simple qgis interface to create and manage MD. You can add local data to the catalogue or import one or more MD records from any CSW resource added to projects. Metasearch will be the search interface.

The Metadata editing interface will be created dynamically from the schema. Or perhaps a generic model with XSLT to transform into other schemas?

Records can be exported or pushed to other catalogues through CSW.

It will use OGR and GDAL tools to populate the known data.

ISO 19115 as the profile (HNAP?).

Manage both geo and non geo data. Tabular data and images.

Did I just describe metatools by NextGIS (inactive for a year or more) or meta edit (dead since 2011)? They both seem to have parts. NextGIS was the company involved in metasearch at one point I think.

I also wonder if it should come with a template for directory structures to store and access local data. Similar to how GRASS works but not using that data format.

A tool that can manage downloaded data. For instance monitoring an FTP or HTTP directory. When data is updated it can either notify the user or download and update local dataset.

A tool to create data management plans.

A tool to create data quality reports and data dictionaries

just some random thoughts.

timlinux commented 7 years ago

Hi @samperd. Thanks very much for your brain dump. You will be pleased to know that a lot of your ideas are already incorporated into our thinking / planning. Take a look at our google doc for more details. I only included the 4 work packages above in this QEP because I don't want to muddy the waters by tabling too many features and sub proposals all at the same time. If metadata is something you care about, I encourage you to join our little subgroup in the gitter channel mentioned above.

timlinux commented 7 years ago

Update: We have updated the QEP to include Work Package 3 since on reflection we believe it needs to be implemented in the first round of development.

CC: @doublebyte1 @tomkralidis @nyalldawson @gustry @kalxas

archaeogeek commented 7 years ago

Sorry for the late response to this but I have only just been made aware of this document. Is this the best way to make comments or would you prefer I did that somewhere else?

One thing that immediately springs to mind is that the definition of a service is a bit fuzzy. Here you define it as a qgis project, but for INSPIRE etc service means a view or download service, eg a WMS/WFS server. I can see a scenario where a user might want to store WMS/WFS service metadata, and metadata about the layers those services provide, and metadata about the project. There's also the concept of how these things link together- eg WMS/WFS services have related child datasets, but then the relation between the datasets and some parent qgis project would also need to be considered.

timlinux commented 7 years ago

Note: Updated work package 1 to include a reference to the pull request for the proposed schema:

Status: A proposed schema has been written here qgis/qgis/#4330

timlinux commented 7 years ago

@archaeogeek comments here are welcome, we also have a live chat channel on gitter at https://gitter.im/qgis/metadata

Currently we have only two concepts:

It would be great to get your input about what design changes you would envisage. At the schema level these comments are probably best directed to the PR at qgis/qgis/#4330. At the higher level, it would be great to hear your ideas on when, where and how the concepts you propose would in practice be captured and displayed within the QGIS desktop application.

Bear in mind we have further work packages planned - this QEP covers only the first pass implementation addressing layer level metadata. Perhaps take a look at our scratch document to see what other things we plan for the short term.

mhugo commented 7 years ago

Hi, about the "QGIS Metadata storage", could it be implemented with our proposed "auxiliary storage" (https://github.com/qgis/QGIS-Enhancement-Proposals/issues/27 - which we will hopefully start soon) ? Or do you need something more / different ?

doublebyte1 commented 7 years ago

Hi @mhugo, I think the objective here is slightly different, since AFAIU you want to store auxiliary data for a layer (row based) and we want to store the schema described in WP1 (layer based). We will use the same backend (SpatiaLite) for now, but the idea is to support polymorphic storage. You can read more details in this blog post. Beyond the scope of this enhancement, I really like your idea of changing the project storage format. In this plugin, we thought of actually packaging the whole project as a SpatiaLite database.

mj10777 commented 7 years ago

What thoughts are being made to add a 'Copyright' field to the Layers Panel Description area?

This is often a requirement for the use of OpenData sources, such as:

https://www.ordnancesurvey.co.uk/opendata/licensing.html Acknowledge the copyright and the source of the data by including the following attribution statement: ‘Contains Ordnance Survey data © Crown copyright and database right 2013’.

When geo-referencing an image from such a source, I add a

which is then taken over when geo-referencing the image with gdal, being one of the tags supported by Geo-Tiffs:

http://www.gdal.org/frmt_gtiff.html


For the RasterLite2 project, Alessandro Furieri (Sandro) and I discussed this matter and the final decision was to add copyright and licence fields, together with the existing title and abstract fields for both

to the metadata tables.

The next spatialite version will create a data_licenses table and fill it with a list of common licenses for all new Databases.

The idea being, in this way, to avoid any legal hassle by offering a means to store and for views to display the information as needed or required.

So when a Geo-Tiff is being imported, when the following TIFFTAGs are found:

they will be taken over.


After reading this and the other concepts, I was surprised to see that this aspect was not included. It seems that even 'Dublin Core' and 'ISO 19115' do not deal with this.

In the case of 'Dublin Core', it seems to even have been removed: http://dublincore.org/workshops/dc1/report.shtml As a result of the restricted focus of the workshop, certain issues required for a complete description of DLOs, such as cost, archival status and copyright information, were eliminated from the scope of the discussion.

But for QGIS, I would say it would be wise to avoid any legal hassle by adding a copyright field to the Layers Panel Description area and the metadata concept.

timlinux commented 7 years ago

@tomkralidis @kalxas @archaeogeek What do you guys think of @mj10777's proposal? I'm OK to add it, or at least something that indicates ownership and usage terms of the data.

archaeogeek commented 7 years ago

I'm in full agreement. Dublin Core is pretty limited, so adding something for ownership/access constraints would be helpful. I wouldn't describe it as copyright though, as that's just one form of access constraint.

nyalldawson commented 7 years ago

@tomkralidis @kalxas @archaeogeek What do you guys think of @mj10777's proposal? I'm OK to add it, or at least something that indicates ownership and usage terms of the data.

I'm +1, but would prefer "attribution" over "copyright". Many geospatial datasets are now under copyleft, so naming the field "copyright" just seems wrong to me.

tomkralidis commented 7 years ago

Can the rm:rights element cover attribution as well?

nyalldawson commented 7 years ago

Can the rm:rights element cover attribution as well?

Yes - ignore my comment - I think it's fine without a dedicated attribution/copyright tag

mj10777 commented 7 years ago

The only important thing is that a user does not get into trouble through any infringement of copyright laws. It should be clear where to place this information.

Can the rm:rights element cover attribution as well?

Sounds good to me, together with some sort of licence tag.

ninsbl commented 6 years ago

I have seen https://github.com/qgis/QGIS/pull/5467, which includes storage of aliases and comments for fields / attributes. Those are important aspects of metadata from my point of view. Just to make sure they are considered here as well...

mj10777 commented 6 years ago

Question to the use of QgsLayerMetadata

My assumption is that a Provider would be a major source for gathering the needed information and setting QgsLayerMetadata. I am doing this now for the new QgsSpatiaLiteProvider.

QgsDataProvider does not contain (at present) a mMetadata; member as QgsMapLayer does.

So when setDataProvider runs in QgsVectorLayer and QgsRasterLayer, any Metadata gathered by a provider (in the form of QgsLayerMetadata) cannot be set.

So adding metadata() and setMetadata(..) to QgsDataProvider would be needed to make QgsLayerMetadata truly useful.
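To make the suggestion concrete, here is a sketch using plain Python stand-ins. These are not the QGIS classes or their real API: `LayerMetadata`, `DataProvider`, `SpatiaLiteLikeProvider` and `MapLayer` are simplified, hypothetical models that only borrow the method names from the comment above. The idea shown is that the provider gathers metadata and the layer picks it up when the provider is attached.

```python
from dataclasses import dataclass, field


@dataclass
class LayerMetadata:
    """Stand-in for a layer metadata record (not the real QgsLayerMetadata)."""
    title: str = ""
    rights: list = field(default_factory=list)


class DataProvider:
    """Stand-in provider carrying the proposed metadata()/setMetadata() pair."""

    def __init__(self):
        self._metadata = LayerMetadata()

    def metadata(self) -> LayerMetadata:
        return self._metadata

    def setMetadata(self, metadata: LayerMetadata) -> None:
        self._metadata = metadata


class SpatiaLiteLikeProvider(DataProvider):
    """Hypothetical provider that fills metadata from what it knows about the source."""

    def __init__(self, title: str, licence: str):
        super().__init__()
        self.setMetadata(LayerMetadata(title=title, rights=[licence]))


class MapLayer:
    """Stand-in layer that takes over provider metadata when a provider is set."""

    def __init__(self):
        self._metadata = LayerMetadata()
        self._provider = None

    def setDataProvider(self, provider: DataProvider) -> None:
        self._provider = provider
        self._metadata = provider.metadata()  # only possible if the provider exposes it

    def metadata(self) -> LayerMetadata:
        return self._metadata


if __name__ == "__main__":
    layer = MapLayer()
    layer.setDataProvider(SpatiaLiteLikeProvider("Roads", "CC BY 4.0"))
    print(layer.metadata())
```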

Mangoesmapping-GeorgeCorea commented 5 years ago

Is it possible to also include in the metadata a list of all the fields in the file, plus the values and the number of unique values for some key fields that the user can select?

Additionally, is anyone creating a stylesheet that can be used to format the qmd into HTML?
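A minimal sketch of the kind of field summary asked about here, assuming the attribute rows are already available as plain dictionaries (reading them from a layer is out of scope). The function name `summarise_fields` and the shape of the result are hypothetical.

```python
from collections import Counter


def summarise_fields(rows: list, key_fields: list) -> dict:
    """List all field names and count the values of the selected key fields."""
    value_counts = {
        name: dict(Counter(row.get(name) for row in rows)) for name in key_fields
    }
    return {
        "fields": sorted({name for row in rows for name in row}),
        "unique_values": value_counts,  # value -> number of occurrences
        "unique_count": {name: len(counts) for name, counts in value_counts.items()},
    }


if __name__ == "__main__":
    rows = [
        {"name": "N1", "surface": "tar"},
        {"name": "N2", "surface": "tar"},
        {"name": "R44", "surface": "gravel"},
    ]
    print(summarise_fields(rows, ["surface"]))
```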