intel / dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.
https://intel.github.io/dffml/main/
MIT License

docs: examples: innersource: Org health and issue prioritization #1287

Open pdxjohnny opened 2 years ago

pdxjohnny commented 2 years ago

@yashlamba @sakshamarora1 and I talked about this last night in relation to @builtree.

As a reminder, we are interested in InnerSource and the OpenSSF identifying security threats work (https://github.com/ietf-scitt/use-cases/issues/14) being done in the open source community because it ties back to our initial mission with open source dependency security / maintainability analysis. This work also helps our project (dffml) as we scale out to understand how we can work more effectively across our plugins and the main package. This work also relates to coming up with ways to help the builtree organization scale and be more effective.

We talked about the existing InnerSource demos and how the goal of InnerSource is to enable more frequent changes of higher quality, with increased reuse and contribution happening across an organization (such as builtree). We talked about how one could use the CII best practices badging program as an example source of metrics.

Metrics can also help projects get visibility, so that others are aware of their feature sets when they need to reuse a project or aspects of a project. Feature set collection is also useful for analyzing which project efforts should be joined or consolidated due to similar feature sets (for example, the same set of functions appearing in two Python projects' utils.py files).
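A minimal sketch of the utils.py example, assuming "same set of functions" is approximated by matching top-level function names with Python's `ast` module. Name matching is only a crude first signal (two `retry` helpers may differ in behavior), and the helper names and sample sources below are hypothetical:

```python
import ast
from typing import Set


def function_names(source: str) -> Set[str]:
    """Return the names of top-level functions defined in a module's source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def shared_functions(source_a: str, source_b: str) -> Set[str]:
    """Names defined at top level in both modules: candidates for consolidation."""
    return function_names(source_a) & function_names(source_b)


def overlap_ratio(source_a: str, source_b: str) -> float:
    """Jaccard similarity of the two modules' top-level function names (0.0 to 1.0)."""
    a, b = function_names(source_a), function_names(source_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two hypothetical utils.py files from different projects
UTILS_A = """
def slugify(text): ...
def retry(fn, attempts=3): ...
def load_yaml(path): ...
"""
UTILS_B = """
def slugify(text): ...
def retry(fn, attempts=5): ...
async def fetch(url): ...
"""

print(sorted(shared_functions(UTILS_A, UTILS_B)))  # ['retry', 'slugify']
print(overlap_ratio(UTILS_A, UTILS_B))  # 0.5
```

A high overlap ratio between two projects' helper modules would flag them for a human look at whether the code should live in one shared place.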

Metrics can also be about people. For example, if a person is rated on a 0-5 skill scale for JavaScript, our model could tell us that investing a mentor's time in helping them learn JavaScript would accelerate development across projects more than that mentor taking on another issue would (an increase in overall production output, in civilization terminology).

builtree has several projects with potential for cross-project collaboration (we talked about dffml and the secret vault for the generic CI/CD work). Identifying feature sets helps determine where more opportunities for cross-project collaboration exist. It also helps contributors who want to propose a new project find out whether a similar project already exists. This ties into the inventory.

We talked about how investing in driving up metrics now (before GSoC season) will pay off big by reducing the manual review needed and by making contributors more autonomous and better enabled to do great work (quality docs, tests, etc.).

We should consider creating some kind of org-level health metrics, so that we can see the reuse and contribution happening from maintainers and contributors across projects (cross-pollination: a metric which describes the health of a set of projects, where the set of projects is potentially the whole org).
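One possible shape for a cross-pollination metric, as a minimal sketch: assume we already have, per project, the set of people who contributed to it, and define the metric (hypothetically) as the fraction of the org's contributors who contributed to two or more projects. The project and contributor names are made up:

```python
from typing import Dict, Set


def cross_pollination(contributors_by_project: Dict[str, Set[str]]) -> float:
    """Fraction of the org's contributors who contributed to two or more projects.

    Hypothetical org-level health metric: 0.0 means every contributor stays
    within a single project, 1.0 means everyone works across projects.
    """
    # Count how many projects each person contributed to
    project_count: Dict[str, int] = {}
    for contributors in contributors_by_project.values():
        for person in contributors:
            project_count[person] = project_count.get(person, 0) + 1
    if not project_count:
        return 0.0
    multi_project = sum(1 for count in project_count.values() if count >= 2)
    return multi_project / len(project_count)


# Hypothetical org: 5 contributors, 2 of whom (ben, cy) span projects
org = {
    "dffml": {"ana", "ben", "cy"},
    "secret-vault": {"ben", "dee"},
    "ci-cd": {"cy", "eli"},
}
print(cross_pollination(org))  # 0.4
```

Tracked over time, a rising value would suggest reuse and contribution are spreading across the org rather than staying siloed; it says nothing about contribution size, so it would likely be one input among several.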

We also talked about how implementing InnerSource for an open source org could help senior folks in the org prioritize what to work on when their role is to help all the projects across the org as needed. Having this traceability and data on the potential impact of work (this ties in with the issue time-to-completion estimation work from #1279) allows org admins / maintainers to make data-driven decisions about what to work on. They will have a model showing that closing issue X increases throughput, quality, or both across all projects in the org by some amount Y (the time to close other issues shrinks), as estimated by the model / data flow.
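A toy sketch of that prioritization, assuming the model / data flow has already produced two estimates per issue: the effort to close it (e.g. from a time-to-close estimate like the #1279 work) and the hours it would save on other issues in each project. Ranking by hours saved per hour invested is one hypothetical choice of objective; the issue names and numbers are invented:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Issue:
    name: str
    # Estimated hours to close this issue (assumed to come from a time-to-close model)
    effort_hours: float
    # Estimated hours saved on other issues, per project, once this issue is closed
    savings_by_project: Dict[str, float] = field(default_factory=dict)

    def total_savings(self) -> float:
        return sum(self.savings_by_project.values())

    def impact_per_hour(self) -> float:
        """Org-wide hours saved per hour invested (assumes effort_hours > 0)."""
        return self.total_savings() / self.effort_hours


def prioritize(issues: List[Issue]) -> List[Issue]:
    """Sort issues by estimated org-wide return on time invested, highest first."""
    return sorted(issues, key=lambda issue: issue.impact_per_hour(), reverse=True)


backlog = [
    Issue("flaky-ci", effort_hours=8, savings_by_project={"dffml": 20, "vault": 12}),
    Issue("docs-template", effort_hours=4, savings_by_project={"dffml": 6}),
    Issue("plugin-api", effort_hours=40, savings_by_project={"dffml": 30, "vault": 10}),
]
print([issue.name for issue in prioritize(backlog)])
# ['flaky-ci', 'docs-template', 'plugin-api']
```

Note that `plugin-api` has the largest absolute savings but ranks last once effort is factored in; which objective to optimize is itself a decision the org would need to make.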


The presentation we did at BSides PDX 2019 shows how we developed a methodology for an event-driven CI/CD pipeline which yielded insights into open source project maintenance and security posture. The work in this issue is a potential avenue for another presentation, one where we show how we continued work on our CI/CD + ML project to gather further insights on open source projects, including our own. Then, through application of InnerSource principles within our orgs, we use those insights to build more effective organizations.

pdxjohnny commented 2 years ago

GSoC project idea: Intuitive documentation practices as a vehicle for modular development

pdxjohnny commented 2 years ago
    graph TD

      subgraph system_context[System Context for Ideation]

        requirements_management[OWASP common-requirement-enumeration<br>&#91Software System&#93]
        data_storage[oras.land<br>&#91Software System&#93]
        source_control[Source Code Management Forgejo<br>&#91Software System&#93]
        engineer[Software Engineer<br>&#91Person&#93]
        manager[Project Manager<br>&#91Person&#93]
        project_idea[Project Idea THREATS.md<br>&#91Document&#93]
        ci_software[Continuous Integration<br>&#91Software System&#93]
        cd_software[Continuous Deployment<br>&#91Software System&#93]
        iaas[Infrastructure as a Service<br>&#91Software System&#93]

        project_idea -->|Understand requirements| requirements_management
        requirements_management --> manager
        manager -->|Communicate priority of tasks| engineer
        engineer --> source_control
        source_control --> ci_software
        data_storage -->|Pull dependencies from| ci_software
        iaas -->|Provide compute to| ci_software
        ci_software -->|Validated version| cd_software
        cd_software -->|Store copy| data_storage
        cd_software -->|Measure alignment to| project_idea

        class manager,engineer,project_idea Person;
        class innersource NewSystem;
        class ci_software,cd_software,requirements_management,source_control,data_storage,iaas ExistingSystem;

      end

      class system_context background;
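The diagram above contains a feedback loop: continuous deployment measures alignment back to the project idea that started the flow. As a small sketch, assuming the diagram's edges are transcribed by hand into a Python adjacency list (node names taken from the diagram; `feedback_loop` is a hypothetical helper), breadth-first search can surface that loop:

```python
from collections import deque
from typing import Dict, List, Optional

# Edges transcribed from the diagram (source -> targets)
EDGES: Dict[str, List[str]] = {
    "project_idea": ["requirements_management"],
    "requirements_management": ["manager"],
    "manager": ["engineer"],
    "engineer": ["source_control"],
    "source_control": ["ci_software"],
    "data_storage": ["ci_software"],
    "iaas": ["ci_software"],
    "ci_software": ["cd_software"],
    "cd_software": ["data_storage", "project_idea"],
}


def feedback_loop(start: str, edges: Dict[str, List[str]]) -> Optional[List[str]]:
    """Shortest path that leaves `start` and returns to it (BFS), or None if no loop."""
    queue = deque([[start]])
    seen = set()
    while queue:
        path = queue.popleft()
        for nxt in edges.get(path[-1], []):
            if nxt == start:
                return path + [start]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(" -> ".join(feedback_loop("project_idea", EDGES)))
```

Running it from `ci_software` finds the smaller CI -> CD -> storage loop instead, while `iaas` (which only provides compute) sits on no loop at all.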
pdxjohnny commented 2 years ago

Map tags in GitHub/GitLab/etc. via clustering models to understand which repos are which types of software. Then map those to developers' skills to identify workstreams for strategic efforts developers want to be involved in. Also: diff complexity.
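A minimal sketch of the tags-to-workstreams idea, assuming each repo's topic tags have already been fetched into a dict. A greedy Jaccard-similarity grouping stands in for a real clustering model here; the repo tags, the 0.5 threshold, and the `best_workstream` helper are all hypothetical:

```python
from typing import Dict, List, Set, Tuple

Cluster = Tuple[Set[str], Set[str]]  # (repo names, combined tags)


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_repos(tags: Dict[str, Set[str]], threshold: float = 0.5) -> List[Cluster]:
    """Greedy single pass: a repo joins the first cluster whose combined tags
    are at least `threshold` similar to its own, else it starts a new cluster."""
    clusters: List[Cluster] = []
    for repo, repo_tags in tags.items():
        for repos, combined in clusters:
            if jaccard(repo_tags, combined) >= threshold:
                repos.add(repo)
                combined |= repo_tags  # in-place update of the cluster's tag set
                break
        else:
            clusters.append(({repo}, set(repo_tags)))
    return clusters


def best_workstream(skills: Set[str], clusters: List[Cluster]) -> Set[str]:
    """Repos in the cluster whose tags best overlap a developer's skills."""
    repos, _ = max(clusters, key=lambda cluster: jaccard(skills, cluster[1]))
    return repos


repo_tags = {
    "dffml": {"python", "machine-learning", "data-flow"},
    "dffml-model-tensorflow": {"python", "machine-learning"},
    "vault": {"go", "secrets", "security"},
    "ci-templates": {"ci-cd", "secrets", "security"},
}
clusters = cluster_repos(repo_tags)
print([sorted(repos) for repos, _ in clusters])
print(sorted(best_workstream({"go", "security"}, clusters)))
```

A real version would likely use richer features (README text, dependency graphs, diff complexity) and a proper clustering algorithm; the greedy pass just keeps the sketch dependency-free.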

pdxjohnny commented 2 years ago

Idea: model projects as k8s resources via kcp: https://github.com/kcp-dev

pdxjohnny commented 2 years ago

https://cloud.google.com/architecture/devops/devops-process-team-experimentation

With regard to forcing a single way of doing things on teams: "This approach only helps poorly run teams - to some extent. Well-run teams typically play along with the process theatre, but try to keep their old way of working. If they can. If they cannot: their output + motivation drops. [...] The same way 'no one is fired for buying IBM', it's true that 'no one is fired for introducing Scrum/SAFe'. But better engineers/managers sure do leave because of it: to places where team autonomy is a given. Take away autonomy: you'll never have a world-class engineering team." - https://mobile.twitter.com/GergelyOrosz/status/1519948934457831431

How does InnerSource relate to this? InnerSource facilitates discovery of team experimentation success stories and provides best known methods (BKMs) which make it easy to leverage wins across teams. InnerSource is our methodology for actively seeking out and proliferating team successes. It is also the way we measure the impact of this methodology itself.

pdxjohnny commented 2 years ago
from typing import Dict

from dffml import config, field


@config
class BadgeURLConfig:
    # Maps maintenance state (True: active, False: inactive) to a shields.io .svg badge URL
    tiers: Dict[bool, str] = field(
        "Mapping of state to URLs for .svg badge",
        default_factory=lambda: {
            True: "https://img.shields.io/badge/Maintainance-Active-green",
            False: "https://img.shields.io/badge/Maintainance-Inactive-red",
        },
    )
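A sketch of how a state-to-badge mapping like the `BadgeURLConfig` default might be consumed, assuming (hypothetically) that "active" means the repo saw a commit within the last 90 days. It uses a plain dict with the same URLs so it runs without dffml installed; `badge_url` and `ACTIVE_WINDOW` are illustrative names, not dffml APIs:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# Same state -> badge URL mapping as the BadgeURLConfig default
TIERS: Dict[bool, str] = {
    True: "https://img.shields.io/badge/Maintainance-Active-green",
    False: "https://img.shields.io/badge/Maintainance-Inactive-red",
}

# Hypothetical threshold: a repo counts as actively maintained if it had a
# commit within this window
ACTIVE_WINDOW = timedelta(days=90)


def badge_url(last_commit: datetime, now: datetime, tiers: Dict[bool, str] = TIERS) -> str:
    """Pick the badge URL for a repo based on the age of its most recent commit."""
    return tiers[now - last_commit <= ACTIVE_WINDOW]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(badge_url(datetime(2024, 5, 1, tzinfo=timezone.utc), now))  # ...Active-green
print(badge_url(datetime(2023, 1, 1, tzinfo=timezone.utc), now))  # ...Inactive-red
```

Keying the mapping on the boolean result keeps the "is it maintained?" decision separate from presentation, so the heuristic could later be swapped for a model's output without touching the badge URLs.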
pdxjohnny commented 2 years ago

Explore the definition of an ALIGNMENT.md. A search (https://duckduckgo.com/?q=inpath%3AALIGNMENT.md%E2%80%8B&t=canonical&ia=web) shows that someone has already written one: https://github.com/confidential-containers/documentation/blob/main/ALIGNMENT.md

This is a very text-based lens into what we want to do. It would probably link to repos, and might even be built off of the strategic direction which THREATS.md shows.



The above look closely related to Alice, aka the system context, aka the upstream, the overlay, and the orchestrator (the past, the present, and the future).

pdxjohnny commented 4 weeks ago

Upstream First Development in Federated Software Forges

prompt(model:gpt-4o): Explain what upstream first is: Explain how federation of software forges (such as gitea and gitlab and github, similar to how threads and mastodon federate via activitypub) which form ad-hoc organizations in pursuit of shared strategic principles and plans can practice upstream first development using github pull request style workflow. Return your response as a markdown file which will be included in the documentation for the forgejo project. Please go into detail. https://chatgpt.com/share/bc6fc7c3-aa76-4518-9d41-0d5107b92f52

Introduction

In a federated environment where various software forges like Gitea, GitLab, and GitHub interact similarly to social platforms like Threads and Mastodon via ActivityPub, the practice of "upstream first" development can be efficiently managed. This document outlines how these federated forges can form ad-hoc organizations to pursue shared strategic principles and plans while adhering to the upstream first development model using a GitHub pull request style workflow.

Key Concepts

Federation of Software Forges

Federation allows independent software forges to communicate and collaborate seamlessly. Each forge retains its autonomy while contributing to a larger ecosystem. This is similar to how social networks federate, enabling users from different platforms to interact. Federation in the context of software development means:

Upstream First Development

This development practice involves contributing changes, enhancements, or bug fixes to the original project (upstream) before integrating them into downstream projects or forks. It promotes:

Workflow for Upstream First Development

1. Identify and Plan the Change

Shared Strategic Principles and Plans

Federated forges often form ad-hoc organizations to pursue common goals. These organizations should identify strategic principles and plans guiding their development efforts. These principles might include:

Issue Tracking

Use a federated issue tracking system to identify and discuss the change needed. Each forge can contribute to the discussion regardless of where the issue was originally reported. This can be managed through:
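As a rough illustration of what a federated issue could look like on the wire, here is a sketch that builds an ActivityStreams 2.0 `Create` activity wrapping a ForgeFed `Ticket`. Which vocabulary and properties a given forge (Forgejo included) actually federates is an assumption here, as are the example actor and tracker URLs and the `create_ticket_activity` helper:

```python
import json
from typing import Any, Dict


def create_ticket_activity(actor: str, tracker: str, title: str, body: str) -> Dict[str, Any]:
    """Build an ActivityStreams 2.0 `Create` activity announcing a new issue.

    Assumption: the issue is expressed as a ForgeFed `Ticket` whose `context`
    is the issue tracker; real forges may federate different properties.
    """
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://forgefed.org/ns",
        ],
        "type": "Create",
        "actor": actor,
        "to": [tracker],
        "object": {
            "type": "Ticket",
            "attributedTo": actor,
            "context": tracker,
            "summary": title,
            "content": body,
        },
    }


activity = create_ticket_activity(
    actor="https://forge-a.example/users/dev",
    tracker="https://forge-b.example/org/project/issues",
    title="Shared helper duplicated across projects",
    body="Proposal: upstream the shared helper before patching downstream forks.",
)
print(json.dumps(activity, indent=2))
```

The point of the sketch is that the reporter's forge (`forge-a`) and the tracker's forge (`forge-b`) can differ, which is what lets each forge join the discussion regardless of where the issue was reported.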

2. Develop the Change

Local Development

Developers work on their local forks or branches of the project. Ensure the changes align with the upstream project's guidelines and standards. Best practices include:

Collaborative Effort

Utilize federation capabilities to collaborate with developers across different forges. Tools like federated code review and discussion can be beneficial here. Techniques include:

3. Submit a Pull Request (PR)

Upstream Repository

Once the change is ready, submit a pull request to the upstream repository. This is done in the forge where the original project is hosted. Key steps include:

Federated PRs

Federated forges should support cross-forge pull requests, allowing a pull request from a repository on one forge to be reviewed and merged into a repository on another forge. This involves:
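A minimal sketch of the data a cross-forge pull request has to carry, assuming a PR is identified by a head repo/branch on one forge and a base repo/branch on another, and that "different forge" means a different URL host. `CrossForgePullRequest` is a hypothetical type for illustration, not a Forgejo API:

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrossForgePullRequest:
    """Hypothetical record of a PR whose head and base may live on different forges."""

    head_repo: str    # contributor's fork, on their home forge
    head_branch: str
    base_repo: str    # upstream repository, on the forge hosting the original project
    base_branch: str = "main"

    def is_cross_forge(self) -> bool:
        """True when head and base repositories are hosted on different forges (hosts)."""
        return urlparse(self.head_repo).netloc != urlparse(self.base_repo).netloc

    def merge_target(self) -> str:
        """Upstream first: the merge always lands in the base (upstream) repository."""
        return f"{self.base_repo}@{self.base_branch}"


pr = CrossForgePullRequest(
    head_repo="https://forge-a.example/dev/project",
    head_branch="fix-issue",
    base_repo="https://forge-b.example/upstream/project",
)
print(pr.is_cross_forge())  # True
print(pr.merge_target())  # https://forge-b.example/upstream/project@main
```

Review and merge then happen on the base repository's forge, while the head repository stays on the contributor's forge.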

4. Review and Collaborate

Code Review

Engage in a thorough code review process. Federated forges can allow reviewers from different platforms to comment and approve changes. Best practices include:

Continuous Integration

Utilize CI/CD pipelines that work across federated environments to ensure changes meet quality standards. This involves:

5. Merge and Integrate

Upstream Merge

Once the pull request is approved, it is merged into the upstream project. This ensures the change benefits the entire community. Steps include:

Downstream Synchronization

Downstream projects can then pull the latest changes from the upstream project. This keeps all forks and derivatives up-to-date with the latest improvements. This involves:
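One way a downstream project might check that it is actually practicing upstream first, as a sketch: assume every patch carried downstream records the upstream PR it corresponds to, and that the set of merged upstream PRs can be fetched. Any patch with no upstream PR, or whose PR is not merged yet, is flagged. The record type, identifiers, and commit hashes are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class DownstreamPatch:
    commit: str
    # Upstream PR this patch corresponds to; None if it was never proposed upstream
    upstream_pr: Optional[str]


def violates_upstream_first(patches: List[DownstreamPatch], merged_upstream: Set[str]) -> List[str]:
    """Return downstream commits whose change is not yet merged upstream."""
    return [
        patch.commit
        for patch in patches
        if patch.upstream_pr is None or patch.upstream_pr not in merged_upstream
    ]


patches = [
    DownstreamPatch("a1b2c3", upstream_pr="forge-b.example/upstream/project/pulls/12"),
    DownstreamPatch("d4e5f6", upstream_pr="forge-b.example/upstream/project/pulls/15"),
    DownstreamPatch("0a9b8c", upstream_pr=None),
]
merged = {"forge-b.example/upstream/project/pulls/12"}
print(violates_upstream_first(patches, merged))  # ['d4e5f6', '0a9b8c']
```

Run during synchronization, an empty result means the downstream carries nothing beyond what upstream has accepted; a non-empty one lists patches to either upstream or drop.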

6. Communication and Documentation

Federated Communication

Use federated communication tools to keep all stakeholders informed about the progress and integration of changes. This includes:

Documentation Updates

Ensure that any changes are well-documented. Documentation should be accessible across all federated forges. This involves:

Benefits of This Workflow

Conclusion

By adopting an upstream first development practice in a federated environment, software forges can collaborate more effectively and ensure that improvements benefit the broader community. This workflow, modeled on the GitHub pull request style, leverages the strengths of federation to create a more cohesive and high-quality software development ecosystem.

For more details on federation and how to get involved, visit the Forgejo Project.