nbeliy / bidsme

GNU General Public License v2.0

default readme in BIDS dataset after conversion #8

Open Remi-Gau opened 2 years ago

Remi-Gau commented 2 years ago

hey

We (the BIDS maintainers) are trying to improve the quality of the average README in BIDS datasets. These tend to be pretty thin and are missing some information we would like to see there, even if it cannot be formalized in the BIDS specification yet.

We hope to leverage the BIDS converters to help with this.

So I am going around the different BIDS converter repos to see if we can start improving the default README they put in a BIDS dataset.

There is a template provided in the BIDS starter kit: https://github.com/bids-standard/bids-starter-kit/blob/main/templates/README.MD

Some possible options:

  1. use this template as is
  2. use a subset of it
  3. add a link to the template README
  4. something else (a mix of 2 and 3?)

What do you think? I can try to open a PR if you want.

Also let us know if you think that the template README in the starter kit could or should be improved.

Feel free to disregard if you are already using some version of the template README.

nbeliy commented 2 years ago

The README is indeed a big problem. In my experience, it is the last thing users do, and they treat it as optional (usually just putting the name of the dataset and nothing else). Putting warnings doesn't help much.

The template (not in its current state) could help, but I'm not very optimistic: I already have trouble getting users to fill the sidecar JSON files beyond the bare minimum. The best solution would be to become stricter when accepting datasets for publication (including in bids-examples). For internal use it would be unrealistic to enforce a perfect and complete README, but for publication people are ready to put in a lot of effort. Maybe write a guideline for reviewers?

As for the existing template, it is confusing: it starts with a general description of what a README is. It should be either a description of the information that belongs in the file (and then it should be in the BIDS standard), or a pure template with text to fill in.

I would propose a list of sections with text to fill in, in an easy-to-read format, where, for example, the quoted and italic text must be replaced by the user. Something like:

My dataset

General description of the dataset, its intended goal, and the associated analysis papers

Data information and contacts

Conditions of data usage

Contact person

Indicate the name and contact details (email and ORCID) of the person responsible for additional information.

Practical information to access the data

If there is any special information related to access rights or how to download the data make sure to include it. For example, if the dataset was curated using datalad, make sure to include the relevant section from the datalad handbook: http://handbook.datalad.org/en/latest/basics/101-180-FAQ.html#how-can-i-help-others-get-started-with-a-shared-dataset

Remi-Gau commented 2 years ago

> The README is indeed a big problem. In my experience, it is the last thing users do, and they treat it as optional (usually just putting the name of the dataset and nothing else). Putting warnings doesn't help much.
>
> The template (not in its current state) could help, but I'm not very optimistic: I already have trouble getting users to fill the sidecar JSON files beyond the bare minimum. The best solution would be to become stricter when accepting datasets for publication (including in bids-examples). For internal use it would be unrealistic to enforce a perfect and complete README,

Agreed, though the optimist in me thinks that some people would actually take the time to add information to the README if we told them what kind of information typically belongs there.

> Maybe write a guideline for reviewers?

That is a good point actually

> As for the existing template, it is confusing: it starts with a general description of what a README is. It should be either a description of the information that belongs in the file (and then it should be in the BIDS standard), or a pure template with text to fill in.
>
> I would propose a list of sections with text to fill in, in an easy-to-read format, where, for example, the quoted and italic text must be replaced by the user. Something like:
>
> My dataset
>
> General description of the dataset, its intended goal, and the associated analysis papers
>
> Data information and contacts
>
> Conditions of data usage
>
> Contact person
>
> Indicate the name and contact details (email and ORCID) of the person responsible for additional information.
>
> Practical information to access the data
>
> If there is any special information related to access rights or how to download the data make sure to include it. For example, if the dataset was curated using datalad, make sure to include the relevant section from the datalad handbook: http://handbook.datalad.org/en/latest/basics/101-180-FAQ.html#how-can-i-help-others-get-started-with-a-shared-dataset

Yeah, I am not super happy with the current template, but that's also why I am going around asking people for feedback.

Your suggestion makes me think that I could try to add a page to the starter kit notebook that acts as a form to help create a good README: it would help draw a clearer distinction between the actual README content and our explanation of why a README is important and what it should contain.

For bidsme, we could start with the lightweight template you suggest and a Markdown comment in it pointing to the starter kit?

nbeliy commented 2 years ago

I think it's dangerous to have bidsifiers write the template directly as the README. So the solution is to check whether a README is already present (already implemented) and, if not, just copy a template under the name README.template.md.

With a different name, it would be more difficult to just ignore the template.
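
The check-then-copy behaviour described above could look roughly like the following. This is only a minimal sketch, not bidsme's actual code: the function name `install_readme_template`, the list of README file names, and the decision to leave an existing `README.template.md` untouched are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical names: bidsme's real implementation may differ.
README_CANDIDATES = ("README", "README.md", "README.rst", "README.txt")
TEMPLATE_NAME = "README.template.md"


def install_readme_template(bids_root: Path, template: Path) -> bool:
    """Copy the template into bids_root unless a README already exists.

    Returns True when the template was copied, False otherwise.
    """
    # Never touch a dataset that already has a user-written README.
    if any((bids_root / name).is_file() for name in README_CANDIDATES):
        return False
    target = bids_root / TEMPLATE_NAME
    # Assumption: do not overwrite a template copied by an earlier run.
    if target.exists():
        return False
    shutil.copyfile(template, target)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "bids"
        root.mkdir()
        template = Path(tmp) / "template.md"
        template.write_text("# My dataset\n")
        print(install_readme_template(root, template))
```

Because the copy lands under `README.template.md` rather than `README`, the dataset still has no README until the user fills in the template and renames it, so a missing README can still be reported as such.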