openkfw / open-geodata-model

Open Geodata Model for Mapping Project Sites in ODA
https://openkfw.github.io/open-geodata-model/

Proposition to add an extra sheet to collect individual project information to the Excel Template #74

Open Jo-Schie opened 1 week ago

Jo-Schie commented 1 week ago

We are frequently asked by projects whether it is possible to extend the location model and add additional information for specific use cases, sectors, projects, etc. So far we have been hesitant to do (and allow) that, because we do not want to bloat the model and make it unusable, or overload it with information that people might interpret as obligatory for specific sectors.

One easy and convenient way to circumvent this problem could be to create a new sheet, called e.g. "project specific information", in the Excel Template. This sheet could duplicate some of the information from the "fill-me" sheet, such as project number, project name, location name, location type, etc., and then projects or sectors could add their own columns with information they need to collect regarding this location anyway.

This would help us to better deal with the Excel file, because we could simply protect the "fill-me" sheet so that people who fill it out cannot mess it up (which will facilitate processing as well) and understand that these are the "minimum requirements". In addition, our projects and sectors would be free to add extra information, which can be linked again with the locations if they want to (voluntarily). The extra columns would not be published here, at least as long as there is no agreement that this information should be collected for all projects from a specific sector anyway (one exception is, e.g., the protected areas use case, where the reporting currently needs to be extended). This way we will not overwhelm people with additional requests, but give projects the flexibility to add information.
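To illustrate how extra columns could be "linked again with the locations", here is a minimal sketch in plain Python. It assumes both sheets have already been read into lists of dictionaries, and that project number plus location name together identify a location. The column names (`project_number`, `location_name`, `beneficiaries`) and the function `link_extra_info` are hypothetical and not part of the template.

```python
# Hypothetical rows standing in for the two sheets; all column names are assumptions.
fill_me_rows = [
    {"project_number": "P-001", "location_name": "Site A", "location_type": "point"},
    {"project_number": "P-001", "location_name": "Site B", "location_type": "polygon"},
]
project_specific_rows = [
    {"project_number": "P-001", "location_name": "Site A", "beneficiaries": 1200},
]

# Assumed composite key shared by both sheets.
KEY = ("project_number", "location_name")


def link_extra_info(core_rows, extra_rows, key=KEY):
    """Left-join extra columns onto the core rows.

    Every core row is kept, and core values are never overwritten
    by values from the extra sheet.
    """
    extra_by_key = {tuple(r[k] for k in key): r for r in extra_rows}
    linked = []
    for row in core_rows:
        extra = extra_by_key.get(tuple(row[k] for k in key), {})
        merged = dict(row)
        for column, value in extra.items():
            if column not in merged:
                merged[column] = value
        linked.append(merged)
    return linked


linked_rows = link_extra_info(fill_me_rows, project_specific_rows)
```

Because the core rows always win, a duplicated column that drifts out of sync in the extra sheet would not silently change the protected data.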

I already discussed this via phone with @Maja4Dev, but I would also like to hear your opinions, @fretchen, @karpfen and @goergen95. Btw., let's not discuss Excel vs. other options here but rather focus on how we can improve the template and the situation in the short term. Other formats etc. should be developed as well, of course.

fretchen commented 1 week ago

I think that I get where the request comes from, but we should be super careful with adding such options, as I fear that this can easily become a complete mess. Before I can form an opinion, I wonder about the following points:

1.) There are already a number of voluntary columns in the fill-me sheet. What makes those columns different from the voluntary columns in the other sheet?
2.) What kind of data would you repeat in the second sheet? Keeping this consistent sounds quite hard, no?
3.) Would a simple "open" text field not already cover quite a number of the problems that people have?

So I am a bit undecided between three options:

Option A: Fill-me only

Keeping a single fill-me sheet where it is clearly marked that you can add additional columns after column Y.

Nice thing is that it keeps everything in one place. Not so nice is that people might mess around with this.

Option B: Add a "DIY" sheet

This is what you described. However, I feel that it mixes obligatory and non-obligatory and your own fields. So I think that it can be really confusing.

Option C: Add an optional sheet

This would separate the obligatory fields in the fill-me from the non-obligatory ones. And in this optional sheet you could then also add your own favorite column.

This might be a nice compromise as it "secures" the core data (which is not given in option A) and it is more clearly structured than option B. Even nicer: It makes the fill-me easier to handle as you see fewer columns when you first encounter it... And as I write this I wonder if this might even be a step towards a solution for #10

Summary

I see the point and think that we might try something like this together with users. My (slight) current favorite is option C as it separates the core data from the "rest".

goergen95 commented 1 week ago

To me this sounds very much like an extension to the base standard, which is definitely possible in general terms. But since we are not yet able to automatically validate whether incoming data adheres to (any version of!) the specification, allowing this in the current situation only adds more complexity, sources of error and, frankly, confusion, if we do not start to take care of the basics.

To make it clear: I am not arguing about the formats data is delivered in, but about how the tools around the specification are built. At the risk of repeating myself, I think that we require the data specification in a form that is both machine- and human-readable (say JSON or YAML), checked into version control, and distributed with tooling to validate incoming data from various formats (e.g. Excel, CSV, GeoJSON). Again, there is prior art for this (see the fiboa project), and we also have to take into account that we are specifically aiming at the collection of geo-spatial data, which only adds to the complexity.
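As a rough illustration of what "machine-readable specification plus validation tooling" could mean, here is a minimal sketch using only the Python standard library. It is not the fiboa tooling and not the project's actual schema: the field names, the rule keys (`type`, `required`, `minimum`, `maximum`) and the function `validate_record` are all assumptions made for this example. The idea is that records from Excel, CSV or GeoJSON would first be converted to plain dictionaries and then checked against the same spec.

```python
import json

# Hypothetical, heavily simplified spec. In practice this would live in its own
# JSON/YAML file under version control. All field names and rules are assumptions.
SPEC_JSON = """
{
  "fields": {
    "project_number": {"type": "string", "required": true},
    "location_name": {"type": "string", "required": true},
    "latitude": {"type": "number", "required": true, "minimum": -90, "maximum": 90},
    "longitude": {"type": "number", "required": true, "minimum": -180, "maximum": 180},
    "comment": {"type": "string", "required": false}
  }
}
"""
SPEC = json.loads(SPEC_JSON)

# Maps the spec's type names to Python checks (bool is excluded from "number").
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate_record(record, spec=SPEC):
    """Return a list of human-readable errors; an empty list means the record is valid."""
    errors = []
    for name, rule in spec["fields"].items():
        if name not in record or record[name] is None:
            if rule.get("required"):
                errors.append(f"{name}: missing required field")
            continue
        value = record[name]
        if not TYPE_CHECKS[rule["type"]](value):
            errors.append(f"{name}: expected {rule['type']}")
            continue
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{name}: above maximum {rule['maximum']}")
    return errors


good = {"project_number": "P-001", "location_name": "Site A",
        "latitude": 50.1, "longitude": 8.7}
bad = {"project_number": "P-001", "latitude": 120, "longitude": "8.7"}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

A real implementation would more likely rely on an established schema language and validator rather than hand-written checks; the sketch only shows that the spec is data and the validator is generic code that reads it.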

In summary, my point on this issue is that we need to focus on the basics here before letting this "grow naturally". There are technical solutions to these problems, but we have to actively seek and adopt them. How to represent such extensions in Excel (or other formats) then becomes an "implementation detail", but we really have to take care of the basics first.

Jo-Schie commented 1 week ago

thanks for your comments @goergen95 and @fretchen .

For clarification regarding validation and use, @goergen95: I do not foresee that we will validate and import this kind of data in an automated way anytime soon, as this would require, from my point of view, that we manage and handle "extension" wishes on a project level. This is way out of scope of what we can do. The idea is that an automated import and validation is only possible for the base data model in the near and medium future.

Also, I do not see how Excel would be an appropriate tool for handling such complex cases in the future. I rather see it as the responsibility of operations to use the additional data that they collect on their own. This will also be clearly communicated. They might seek help from GIS-trained staff, and we can also create a tutorial on how to convert and append this data in QGIS.

Nevertheless, people frequently ask us to enable them to do that, so I think we should not block this.

Edit: Don't get me wrong. Seeking a technical solution for this should be the way to go, but currently we have so many things to do first, i.e. getting the basic functionality up and running, that I fear we simply do not have the capacity for that. It is unfortunate, but at least in the short term we have to live with the legacy of having decided to use Excel as the favoured option. This does not, however, affect our work to create a JSON schema for the "base data" that we manage. I think this is something that is already work in progress by @fretchen. This may then also be the basis for eventually extending the standard, at least for cases that are sector-specific rather than project-specific and that people can agree upon.

As to @fretchen comments:

goergen95 commented 1 week ago

> This is way out of scope of what we can do. The idea is that an automated import and validation is only possible for the base data model in the near and medium future.

I am not 100% sure this is true. It is definitely out of scope for what we currently have, yes, but isn't that what a data specification should deliver? Extensibility should be built in by default - there will always be sectors/people wishing to adapt it to their needs. Also, the world is constantly changing and we cannot anticipate future requirements. That is why we should embrace open standards and modularization. To get started, however, (and I totally agree with you on that) we have to focus on the core specification. We have to get that right, and growing the specification from there should then "feel easy", as opposed to now, where every minor change leads to substantial friction.
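To make "built-in extensibility" a bit more concrete, here is a minimal sketch of one possible design, assuming the specification is expressed as a mapping of field names to rules. The core fields, the protected-areas extension field `protected_area_id` and the function `extend_spec` are hypothetical names chosen for this example, not an existing part of the model.

```python
# Hypothetical core specification; field names and rule keys are assumptions.
CORE_FIELDS = {
    "project_number": {"type": "string", "required": True},
    "location_name": {"type": "string", "required": True},
}

# Hypothetical sector extension, kept in a separate module or file.
PROTECTED_AREAS_EXTENSION = {
    "protected_area_id": {"type": "string", "required": False},
}


def extend_spec(core, *extensions):
    """Combine the core spec with extensions without modifying the core.

    An extension may only add new fields. Redefining a field that already
    exists raises an error, so the core stays stable for all users.
    """
    combined = dict(core)
    for extension in extensions:
        clash = set(extension) & set(combined)
        if clash:
            raise ValueError(f"extension redefines existing fields: {sorted(clash)}")
        combined.update(extension)
    return combined


protected_areas_spec = extend_spec(CORE_FIELDS, PROTECTED_AREAS_EXTENSION)
```

The design choice here is that extensions are additive only: data that is valid against an extended spec still carries every core field unchanged, so tooling that only knows the core keeps working.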