research-software-ecosystem / content

A metadata commons to store research software metadata
Creative Commons Attribution 4.0 International

Updated qiime2.biotools.json manually #638

Open matuskalas opened 12 months ago

matuskalas commented 12 months ago

There are multiple reasons for this:

❗❗❗ This also points to the fact that the diff is completely useless here, which makes the whole GitHub workflow with PR reviews impossible 😥😥 (at least when the JSON is edited outside of Bio.tools) @bgruening @hmenager

bgruening commented 12 months ago

I would not be too worried about the diff. This only tells us that we need a unified formatting of the JSON files. Then the diffs will look much better. So we need to autoformat the JSON with a common schema.

matuskalas commented 12 months ago

> I would not be too worried about the diff. This only tells us that we need a unified formatting of the JSON files. Then the diffs will look much better. So we need to autoformat the JSON with a common schema.

Exactly! But how to do that? Do you have experience with it / working examples?

E.g.: How does it work with the XMLs in Galaxy and the Conda recipes? If they are only edited manually, it's not an issue. But the combination of graphical editing in a tool and manual editing as text causes trouble. This problem is of course enormous in EDAM 😟

hmenager commented 11 months ago

Hi @matuskalas @bgruening,

Thanks a lot for reporting this. There are three different matters here:

  1. The pull request cannot be usefully merged here: the modifications would be automatically erased by the next weekly sync. The only logical way to solve this is to have a bidirectional sync between bio.tools and RSEc.
  2. The bug with the validation on bio.tools can be investigated; thanks for reporting.
  3. As for the diff, I recommend (and use myself) jq, a small utility that can do lots of things with JSON. One example is completely re-sorting JSON files. We use it in RSEc to automatically reformat all JSON data from bio.tools, in order to minimize the diffs, make them more readable, and reduce (a tiny bit) the growth of the git repo. The command for this is: jq --indent 4 'walk( if type == "array" then sort else . end )' -S [FILE] This sorts all arrays and object keys and normalizes all whitespace, so that for the same contents the format is predictable (a bit like the black tool for Python formatting). After reformatting the JSON this way, the diff goes from 944 lines to 126 lines in this case, which is a lot easier to read.
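
For anyone who cannot install jq, the same idea (sort every array and every object's keys, then print with a fixed 4-space indent) can be roughly sketched in Python. This is only an illustration, not what RSEc runs: the function names `jq_like_key`, `normalize` and `format_json` are made up here, and the array ordering only approximates jq's `sort` (null, false, true, numbers, strings, arrays, objects), so output may not be byte-identical to jq in every corner case.

```python
import json


def jq_like_key(value):
    """Sort key approximating jq's ordering across JSON types (an assumption,
    not a guaranteed match): null < false < true < numbers < strings < arrays < objects."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value)
    if isinstance(value, list):
        return (5, [jq_like_key(item) for item in value])
    # Objects: compare the sorted key sets first, then the values key by key.
    keys = sorted(value)
    return (6, keys, [jq_like_key(value[k]) for k in keys])


def normalize(value):
    """Recursively sort all arrays, children first (bottom-up, like jq's walk)."""
    if isinstance(value, list):
        return sorted((normalize(item) for item in value), key=jq_like_key)
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    return value


def format_json(text):
    """Return a predictable rendering: sorted arrays, sorted keys, 4-space indent."""
    data = normalize(json.loads(text))
    return json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)


if __name__ == "__main__":
    import sys

    # Usage: python format_json.py FILE  (the script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(format_json(handle.read()))
```

Two files with the same contents but different key or array order come out identical after this step, which is what keeps the diff small. As with the jq command, sorting arrays discards their original order, so this only suits data where array order carries no meaning.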

As for EDAM, this should probably be discussed in an issue on the repository itself. One thing I would advocate is adopting the same strategy, but for an RDF-XML file. This could be harder, because there are "rules" for the formatting of EDAM in order to maintain its readability in a raw text format.