linkml / linkml

Linked Open Data Modeling Language
https://linkml.io/linkml

Add a how-to guide on successful collaborative data model development #1744

cmungall closed 9 months ago

cmungall commented 1 year ago

The LinkML ecosystem attempts to encourage best practice for collaborative schema development - for example, the schema cookiecutter sets you up with standard CI workflows to encourage contributions via PRs, together with default CONTRIBUTING.md, CODE_OF_CONDUCT.md, etc.

However, we would benefit from a more direct narrative guide. The best place may be the howto section (FAQ entries are also welcome).

There would be no need to write this de novo. It would heavily reference and link out to other guides, in particular:

Although we would heavily reference rather than duplicate, we could include some concrete examples of docs in existing schema or data modeling projects:

However, we should recognize that different kinds of projects call for different kinds of processes. A small schema designed primarily to support a single data portal does not need processes for creating working groups. A project that needs direct input from a large number of non-technical SMEs may need to insulate them from technical GitHub interfaces.

O3 principles

Here's a summary of each point in the provided text as bullet points (from ChatGPT):

Obook Open Science Engineer guide

The document "Maximising impact as an open science engineer - OBO Semantic Engineering Training" outlines principles and practices for effective collaboration and impact in the field of open science engineering. Here's a summary:

The document also includes a TL;DR summary with key takeaways:

These principles and practices are aimed at fostering a more collaborative, efficient, and impactful open science community.

TisLab guide

The document outlines several common pitfalls encountered in transdisciplinary and geographically distributed research teams. Here's a bullet list summarizing each pitfall:

These pitfalls highlight the complexities and challenges of managing large, diverse, and distributed research teams, underscoring the need for effective management and communication strategies.

Best practices:

The document outlines several best practices for managing transdisciplinary and geographically distributed research teams effectively. Here's a summary of these practices:

These best practices are designed to foster a cohesive, efficient, and respectful working environment within diverse and distributed research teams, thereby enhancing productivity and team satisfaction.

Bioschemas governance

The document titled "Bioschemas Governance" outlines the governance structure and guidelines for the Bioschemas community, a project aimed at improving data interoperability in life sciences. Here's a summary:

This document provides a comprehensive guide to the governance structure, roles, processes, and best practices within the Bioschemas community, emphasizing open collaboration, transparency, and adherence to established standards.

schema.org how we work doc

The document titled "How We Work - Schema.org" provides an overview of the processes and practices employed by Schema.org in developing and updating its schemas. Here's a summary:

This document serves as a comprehensive guide to the operational framework, versioning system, and collaborative nature of Schema.org's efforts in structuring and standardizing web data.

cmungall commented 1 year ago

Current docs live (but not linked): https://linkml.io/linkml/howtos/collaborative-development

Pull requests welcome on:

cthoyt commented 1 year ago

@cmungall FYI we're going to rename from O3 Principles to O3 Guidelines. I can send a PR later or feel free to update

nlharris commented 11 months ago

Permissively License Your Code and Data

(adapted from O3 guidelines)

Using recognizable, permissive licenses (e.g., CC0, CC BY) encourages contribution and ensures content longevity. Non-permissive licenses or custom terms can hinder reuse and engagement. Permissive licensing doesn't typically lead to a lack of credit for the original resource.

Given our recent discussion, should we point out that CC0 and CC BY are good for non-code resources (or maybe also for projects that include code and non-code resources) but that code should be licensed with a software license like Apache 2.0 or MIT?