hmpf / easydmp

MIT License

Feedback on easyDMP #36

Closed philippconzett closed 2 years ago

philippconzett commented 6 years ago

Please feel free to split these remarks up into multiple (GitHub) issues.

GENERAL REMARKS (For more specific feedback, see section SPECIFIC REMARKS below.)

SPECIFIC REMARKS

Q1.4 Will your project reuse data produced by another project?

Or rather "... existing data"?

Q1.5 Please specify where the data originates from. Derived covers data from simulations, models, etc. Experimental covers data arising from experiments made on objects (such as particle physics, chemical reactions etc). Observational covers data from observation of objects (such as earthquakes, sunspot activity, etc).

What about research on human beings? This explanation is somewhat natural science biased.

Q1.6 Please estimate the total amount of data your project will produce.

Missing option "Not sure / do not know."

Q2.1.1 Will you use metadata to describe the data?

The answer depends on funder requirements. Many funders demand documentation of data.

2.1 Making data findable, including provisions for metadata Please select all metadata standards that will be used

Researchers may be aware of subject-specific metadata standards, but hardly know Dublin Core citation metadata. Such information will be available automatically if one first specifies the archive one is going to use.

2.1 Making data findable, including provisions for metadata Where will the metadata be stored? Please provide the URL for the registry that you intend to host your metadata.

Is "registry" the same as archive? Do researchers understand this?

2.1 Making data findable, including provisions for metadata Please select all relevant standardized vocabulary

Where are the ISO standards?

Q2.1.1b Will you make the metadata available free of charge?

Depending on choice of archive.

Q2.1.1c Will your metadata be harvestable?

Does a "common" researcher know what harvesting is?

Making your metadata harvestable by providing, for example, an OAI-PMH interface makes it easier for other metadata registries to store your metadata increasing the exposure of your work.

I doubt that a "common" researcher knows what OAI-PMH is.
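For readers unfamiliar with the term: harvesting over OAI-PMH amounts to an ordinary HTTP request carrying a `verb` parameter, answered with XML records. A minimal sketch of building such a request follows. The base URL and the function name `list_records_url` are hypothetical and used only for illustration; `ListRecords` and the `oai_dc` (Dublin Core) metadata prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint.

    base_url is whatever OAI-PMH endpoint the archive publishes; the one
    used in the example below is a placeholder, not a real service.
    """
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical endpoint, for illustration only.
print(list_records_url("https://repository.example.org/oai"))
```

In practice the archive exposes this endpoint, so the researcher would not normally have to build such requests by hand.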

Q2.1.3 Will you provide clear version numbers for your data?

Is this referring to active or archived data?

Q2.1.4 Will you provide persistent identifiers for your data? Persistent identifiers are independent of the physical location of the data. Using persistent identifiers means you have the freedom to move the data to new storage without impacting your users.

  1. "Common" researchers do not know what PIDs are.
  2. The answer to this question is based on the choice of archive.

The data will be issued with DOI identifiers once the data has reached an approved level of maturity for consumption by interested parties.

Clumsy wording. Usually, data get a PID when they are deposited into an archive.

Q2.1.5 Will you provide searchable metadata for your data?

The answer depends on the choice of archive.

Q2.1.5a What services will you use to provide searchable metadata?

Not understandable for a "common" researcher. Depends on choice of archive. Researchers choose archives, not "service[s] [...] to provide searchable metadata".

Q2.1.6 Will you use standardized formats for some or all of your data?

How does this question relate to the earlier question(s) on file format?

Q2.1.7 Are the file formats you will use open?

  1. Do researchers know what this is, and if the formats they use are open?
  2. The question is rather whether they archive data in open formats?

Q2.1.9 Will you provide metadata describing the quality of the data?

Shouldn't this come right after the other questions about metadata / documentation?

Q2.2.1 Are there ethical or legal issues that can impact sharing your data? You should consult your research office to understand if your data is subject to these constraints.

Here, one could have integrated a tool like DataTags.

Q2.2.2 Will all your data be openly accessible?

Depends on funder requirements.

Q2.2.3 How will the data be made available?

Shouldn't this be related to the question of where metadata will be made available?

2.2 Making data openly accessible

Q2.2.4 Is the storage sufficiently secure for your data? Sufficiently secure storage will provide access control for non-public data and will provide backup and recovery procedures. Ensuring your data is secure will ensure your data remain accessible during your project and afterwards.

  1. The data will be stored in a repository that is sufficiently secure, but does not provide backup or recovery of the data.
  2. The data will be stored in a repository that is sufficiently secure and that will ensure the data is backed up and can be recovered.
  3. The data will be stored in a repository that is not sufficiently secure and that does not provide backup or recovery procedures.
  4. The data will be stored in a repository that is not sufficiently secure, but that provides backup and recovery procedures.

The section is about accessibility. But safe storage is mostly about active data? Differentiate between storage of active data, and archiving of data at the end of the project. Are those two mixed up here?

Q2.2.5a Please provide links describing the documentation for accessing your data.

Not sure if I understand this question.

  1. Links do not describe, but link / refer to documentation.
  2. Such documentation is available in the archive where data is archived? Can we combine this question with the question(s) about where metadata and data will be archived?

Q2.2.5b Please provide links describing the tools for accessing the data.

See previous remark.

Q2.3.1 Will you use a standard vocabulary for your data types?

  1. In what way does this question differ from earlier questions about metadata vocabulary?
  2. Does a "common" researcher know what this means?

Q2.3.1.a Will you provide a mapping to more commonly used ontologies?

See previous remark.

Q2.4.1 What internationally recognised licence will you use for your data?

Depends on the choice of archive and/or funder requirements. Alternatives "None", "Do not know" are missing. Crucial licenses like CC0 are missing.

Q2.4.2 When do you plan to make your data available for reuse? Even after your project completes your data may still have value to fellow researchers. You should consider providing access to other researchers once the data is no longer of primary value to you.

  1. Depends on funder requirements.
  2. Strange wording "Even after ...". Sharing data at the end of a project will for many researchers be one of the most important points in following good practice for open science. The way this explanation is formulated, one might get the impression that the previous questions are dealing with something quite different.

Q2.4.3a Please input url (s) to your documented QA procedures.

Shouldn't this question be related to the questions about where data and metadata will be made available?

Q2.4.4 Will you provide any support for data reuse? Providing documentation and a contact person can help researchers make better use of your data and increase its usefulness.

Belongs to the question(s) about documentation.

Q3.1a Will the project make use of national infrastructure? Using national infrastructure for managing your data and metadata can often help to offset the cost of managing your data. The infrastructure will be familiar with the requirements for managing data which should reduce the burden on your project.

This question + choice of archive + info on funder should be included earlier. This is information that in many cases determines the answer to many of the other questions that are asked in this form.

Q3.1b Will the project make use of institutional infrastructure? Institutions increasingly provide data management resources (for example, storage with backup, metadata databases) for research projects that may help reduce the burden of data management for your project.

See previous remark.

Q3.3 How do you intend to ensure data reuse after your project finishes? Storing your data in an archive where it will be safely managed and findable will ensure researchers can use the data once your project completes. It can offset the burden of managing your data over the long-term.

Haven't we answered this previously?

Q4.1 What do you plan to do with research data of limited use? Some of the data you collect will not be of interest to researchers intending to use your data. Typically these data are raw data (such as raw signals from an instrument) that need to be transformed with software into more meaningful objects.

  1. I'm not sure about how well founded the claim is that raw data often are uninteresting for other researchers.
  2. Strange wording "of limited use". Do you mean "usefulness"?

Q5.1 Are there other ethical aspects that your data are subject to that have not been covered by previous questions? You should consult your institution and funding agency guidelines on ethical aspects and document whether your data are subject to those guidelines.

Similar questions popped up a little earlier in the form. Why not group the questions regarding ethics and privacy together?

Q6.1.1 Do you make use of other procedures for data management? Your project may also be subject to further guidelines defined by your institution or funding agency.

That's precisely why one should ask for this kind of information early in the form.

hmpf commented 6 years ago

Thank you very much for the detailed remarks! Here are some more details on some of them.

There are several templates, but for political reasons a branching H2020 one is the only one visible at the moment. A linear template for H2020 is being designed by @adilhasan right now. We are aware of some of the pain points in the existing one. We wound up writing our own form generator for this, so how questions can be designed and asked is limited by what we can auto-generate.

It is possible to make new templates, but it is not very user-friendly (I think), and it is only available to superusers at the moment. Making a better UI is a high-priority TODO scheduled for this year (we have a long list of non-public TODO items).

Nobody has brought up institutional access before this, thank you very much again!

We have worked on speeding up the codebase. The big remaining problem now is the calculation of the next step. When you press "Next", the next page is calculated from what you just answered, which in the worst case involves running a finite-state automaton (FSA) in real time. We're working on a simplified track for branch-free (fully linear) templates, but that is not in production yet.
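The next-step calculation described above can be pictured as a lookup in a transition table keyed on the current question and the answer just given. The sketch below is only an illustration under stated assumptions, not easyDMP's actual implementation: the `TRANSITIONS` table, the branching it encodes, and the functions `next_question` and `walk` are all hypothetical, with question ids borrowed from the form for readability.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical transition table: (current question, answer) -> next question.
# An entry whose answer is None is the default edge taken for any answer.
TRANSITIONS: Dict[Tuple[str, Optional[str]], str] = {
    ("Q2.1.1", "yes"): "Q2.1.1b",
    ("Q2.1.1", "no"): "Q2.1.3",
    ("Q2.1.1b", None): "Q2.1.1c",
    ("Q2.1.1c", None): "Q2.1.3",
}


def next_question(current: str, answer: str) -> Optional[str]:
    """Return the id of the next question, or None when no edge exists."""
    specific = TRANSITIONS.get((current, answer))
    if specific is not None:
        return specific
    # Fall back to the default edge, if the question has one.
    return TRANSITIONS.get((current, None))


def walk(start: str, answers: Dict[str, str]) -> List[str]:
    """Follow edges from start until a question has no outgoing edge."""
    path = [start]
    current = start
    while True:
        following = next_question(current, answers.get(current, ""))
        if following is None:
            return path
        path.append(following)
        current = following


print(walk("Q2.1.1", {"Q2.1.1": "yes"}))
print(walk("Q2.1.1", {"Q2.1.1": "no"}))
```

In a fully linear template every question has only a default edge, so the lookup reduces to stepping through a fixed list, which is presumably what makes a simplified branch-free track cheaper.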

Drop-downs and multiple-choice fields are roughly of two kinds: made by hand, or auto-generated via APIs for various repositories, like Re3Data. The former are very easy to adjust, and we are always looking for additional (or better) repositories with APIs for the latter.