samtools / htsjdk

A Java API for high-throughput sequencing data (HTS) formats.
http://samtools.github.io/htsjdk/

HTSJDK requires defined processes and governance #871

Open tfenne opened 7 years ago

tfenne commented 7 years ago

HTSJDK is an open source project within the samtools organization that grew out of work performed by many individuals presently or previously at the Broad Institute. The project has been operating informally since its move to GitHub, and has no defined processes, no agreed-upon ways to make decisions or resolve conflicts, etc. While the vast majority of work was being performed by individuals employed at the Broad, this more or less worked. However, over the last few years things have changed. While the Broad still makes significant contributions to HTSJDK, several key contributors have left Broad (myself, @nh13), the CRAM team at EBI have become core contributors, and at least two people with no Broad affiliation have become significant contributors (@magicDGS & @lindenb).

As one of the largest contributors to the project over time, I find the lack of a clear decision making process makes me less and less willing to put in any effort. I would personally like this resolved, and I believe it would be greatly beneficial to the project to have clearly defined leadership and processes.

The project currently has five admins: @alecw, @nh13, @tfenne (me), @ktibbett and @bradtaylor. In addition there are 32 members with write access, though many are not active contributors.

I propose two possible paths forward:

  1. HTSJDK - A Community Project: on this path we accept that HTSJDK was founded at Broad but that the contributorship is wide enough that it should no longer be controlled by the Broad. I would suggest that the existing project admins engage the remaining contributors, and are then responsible for developing or adopting a process for deciding who can be an admin, a member with write access, and how changes get approved and merged into the project.
  2. HTSJDK - A Broad Project: on this path the Broad takes more explicit ownership of HTSJDK. I would suggest that @nh13 and myself relinquish admin rights, and perhaps even that all non-Broad employees lose write privileges (i.e. have to fork and make PRs from forks). It would then be incumbent on Broad to select admins, and provide some guidance on how to contribute to the project (such as they do for GATK 4 development).

While I would personally prefer option 1, I absolutely believe that either option is better than where we are now.

Since I do not believe that the current set of admins is a good representation of those with stakes in the project, it's not even clear how to begin this. Perhaps the first question is, to those of you at Broad, what is your reaction to the two paths laid out above?

alecw commented 7 years ago

Hi @tfenne ,

I agree that some clarity about the process would be good. I have hardly touched HTSJDK in > 2 years, so I think I should no longer be an admin. Also, I think it's not up to me how to proceed. I don't know what the GATK/Picard team has in mind.

-Alec

d-cameron commented 7 years ago

I couldn't agree more. It would be nice to know exactly what hoops one needs to jump through to contribute. The issues I had getting https://github.com/samtools/htsjdk/pull/576 accepted are a good example of the problems with the current state of affairs. Yes, I can contribute to a nominally public open source project, but my contributions are conditional on a whole bunch of tests passing in non-public projects I have no visibility of.

yfarjoun commented 7 years ago

👍

magicDGS commented 7 years ago

I vote for the first one too, and even if I'm starting to be repetitive with my suggestion, a brand-new htsjdk3 API and repository may be useful to clean up the library and to solve, from the very beginning, the issues and discussions between the teams...

tfenne commented 7 years ago

@yfarjoun (and maybe also @ktibbett ?) would you be able to raise this for discussion internally at Broad and figure out who wants to be involved, and who can represent the Broad position? I'm happy for anyone who wants to be involved to do so, but it would be great if there were 1-2 people from Broad who could officially represent Broad for this discussion?

I'd like to drive this process to completion in such a way that most or all parties see the result as an improvement, and nobody feels the need to create their own fork of htsjdk and depart long term from the official project. One fear I have is a lack of engagement from the necessary parties, resulting in either a lack of decision or a de facto decision to go with option 1 that then leads to a bifurcation of the project or a later attempt to reverse direction.

I understand that we can't resolve this in a day, but I would like to put a timeframe on this. Is it reasonable to think that by 5/19 (two weeks from today) we could have agreement on:

  1. General direction (option 1 vs. option 2 vs. something totally different)
  2. A small set of people who will take that direction and define more detailed roles, policies, etc. (with community input) and then implement them?

droazen commented 7 years ago

@tfenne If your fear is "lack of engagement from the necessary parties", then you're going to have to give us at the Broad, at least, a bit more than the proposed 2 weeks to engage in this topic. As you know, we're currently too swamped to deal even with a major deprecation in the codebase due to an internal conference deadline, so you can easily believe that we're too swamped to engage in a discussion like this! However, since this discussion affects our work intimately, we'd naturally like to be involved.

I'd suggest proposing a timeline that is actually realistic given our current time constraints here at the Broad, such as 1 month. In the meantime, @yfarjoun has proposed what seems to me like a reasonable way forward on your PR https://github.com/samtools/htsjdk/pull/868.

tfenne commented 7 years ago

@droazen, a month is reasonable - that's why I asked if two weeks was reasonable. I figured I was more likely to get a response by proposing a timeline than just asking "what's a reasonable timeline".

I'd like to ask a few questions of you in order to help calibrate my own, and others', expectations. For context, while I would love to resolve this quickly, realistically I would prefer us to have a clear timeframe that we can stick to rather than an ambitious timeframe that we fail to keep. With that in mind:

  1. Are you able to represent/speak on behalf of Broad interests in HTSJDK, or are there others who also need to be directly involved? I ask because I don't want for us to agree on a timeline only to be surprised later by someone else coming along and objecting.
  2. Is one month realistic either? I don't want to extend the timeline more than necessary, but on the other hand I'd rather set a timeline once and keep to it.
  3. Understanding that you are busy, could you give some insight into your availability to participate in the discussion? Is it effectively zero until the start of Bio IT on May 23rd, then ramping up? Or are you able to engage minimally before then? Something else?
  4. Do you have any objections to myself and others continuing the discussion in your absence and perhaps even drafting policies etc.? E.g. I think it would be really useful for everyone who's interested to share and discuss goals for HTSJDK governance, preferences for how it might work etc., such that we are not starting from zero in 2-3 weeks.

If it seems like I'm impatient for this, it's because I am. I'm not sure how much of this you're aware of, but this has been a topic of discussion between myself and some Broad folks for more than two years. I also made an attempt about 18 months ago to engage more formally with folks at Broad to try and resolve this. That last attempt dead-ended with the ball in the Broad's court. I'm not trying to lay blame; certainly I could have kept pushing more consistently. And it's hard to know, from the outside, why the last attempt fizzled. But I think it's also fair to interpret that outcome as a lack of prioritization, at Broad, of solving this particular problem. That's why I say I fear a lack of engagement, because it's how the last attempt failed. And that's why I'm trying to build some momentum and commit to a timeline that's not many months long.

I'm happy to slow things down by a couple of weeks if that's what it takes to get you engaged. I would also genuinely appreciate either or both of a thin thread of involvement sooner, or openness to letting others move the conversation forward (but not to completion) and to play catch-up when you're ready to join us.

vdauwera commented 7 years ago

I'd like to be involved on the Broad side. I have zero bandwidth before Bio-IT, then probably minimal bandwidth until July, but if you and others want to take the lead I'm happy to play catch up when my availability improves.

If there are any major changes in status that result, we'll need to run them past our IP office, but I don't anticipate any difficulties.

-- Geraldine A. Van der Auwera, PhD Associate Director of Outreach and Communications Data Sciences and Data Engineering Broad Institute

nh13 commented 7 years ago

@vdauwera I think one of the things that is unclear is the difference between individuals who contribute to this discussion and happen to work at the Broad, and an explicit person (or 1-2 people) who represents the Broad as an organization and can make decisions and the like on its behalf. I think the latter is what @tfenne is looking for, and what question (1) to @droazen is about. Would you be able to help clarify?

droazen commented 7 years ago

@tfenne To answer your question, unfortunately neither myself nor anyone else on our side has bandwidth to participate in this discussion until June. To be on the safe side, I'd propose a start date of June 10.

I know that you're impatient to resolve this (or at least understandably nervous that the topic will get dropped again), but I feel strongly that starting the discussion without all the stakeholders having an active seat at the table and fully available to participate from the beginning would be a pretty terrible way to start out, and would not bode well for the future of the project. I'd ask that you hold off until the requested date of June 10, and in return we'll promise to stick to that date and ensure that the topic doesn't get forgotten a second time. Sound reasonable?

yfarjoun commented 7 years ago

In my opinion, the current status of non-active (@kt, @bradtaylor, @alecw) and non-empowered (@nh13, @tfenne) maintainers is unworkable. I suspect that the conflicts we have been seeing recently are due to this problem.

To me this is a "drop-everything" issue. We cannot afford to have htsjdk without active and empowered maintainers any longer. There are folks at the Broad who do have time to engage, and waiting until "all the parties" have time is impossible unless folks are willing to change their schedules. Given that the maintainers have to be able to make time, I do not share the opinion that this should be postponed any longer.

droazen commented 7 years ago

@yfarjoun If we can't even agree on how to begin this discussion in a way that's fair to all parties, then that does not bode well for our ability to make collective decisions in the future! It seems to me that our request to hold this important discussion at a time that works for all major stakeholders is not an unreasonable one. Starting this discussion without a representative of the GATK project as an active participant would do great harm to our willingness to participate in htsjdk as a community project going forward.

yfarjoun commented 7 years ago

(following some internal discussion at Broad) I move that there will be a new set of maintainers:

Who will start with equal veto power and set the "processes and governance" going forward.

This motion comes with the implicit understanding that conflicts need to be discussed (perhaps offline) until resolved.

Given that there are currently 5 official maintainers and no clear governance model, I am not sure what the official rules are for such a change. So, in lieu of any guidance, I will ask that someone second this motion, and then the (current) maintainers will vote; I hope there will be no objections. (I already spoke in private with the 3 nominees and have their personal agreement to the arrangement.)

I hope I'm not making too much of a fool of myself here....

cc:

@tfenne @ktibbett @alecw @bradtaylor @nh13

bradtaylor commented 7 years ago

I second this proposal. I am no longer in a position to offer administrative support to this very important software project, and I wish to be removed as an admin.

The proposed set of three maintainers are deeply knowledgeable about this repository and its goals / design standards. I trust them to engage with the developer community and propagate a governance model that ensures the improved health of this project and its role in the broader genomics software ecosystem.

Thank you for the suggestion @yfarjoun. This proposal has my vote.

nh13 commented 7 years ago

I agree.

vdauwera commented 7 years ago

As an advocate for the GATK user community, which depends hugely on the health of the htsjdk project, I agree as well.

magicDGS commented 7 years ago

I agree to speed-up this issue. As a contributor and user of the library, I would like to have this solved soon because I have several projects depending on this....

ktibbett commented 7 years ago

:+1:

tfenne commented 7 years ago

👍

lbergelson commented 7 years ago

👍

jacarey commented 7 years ago

adding my 👍

alecw commented 7 years ago

+1

droazen commented 7 years ago

:+1:

tfenne commented 7 years ago

@yfarjoun I think I see 👍 from all the existing maintainers, and proposed future maintainers, and most of the folks who joined the conversation too. Are we ready to move forward, or were there any last people you wanted to make sure we heard from?

yfarjoun commented 7 years ago

As I understand it, the motion has passed. Not sure what official things need to happen now....

tfenne commented 7 years ago

Thanks @yfarjoun. I think the only major thing that needs to happen that I cannot make happen myself is that all three of @jacarey, @lbergelson and myself should have access to edit who has access to the project. This is currently administered through a pair of groups in the samtools org, Java admins and Java developers. It looks like @jacarey is already an "Owner" within samtools, but that @lbergelson and I are only "Member"s. Can you or @jacarey upgrade @lbergelson and me, please?

After that I would suggest that:

  1. The three new maintainers have a couple of brief offline conversations about bootstrapping the governance process.
  2. We then put in place a very minimal process for the project, and use that process to evolve and flesh out the full set of governance processes for the project reasonably rapidly.

jacarey commented 7 years ago

@tfenne and @lbergelson are now "Owner" as well.

d-cameron commented 6 years ago

@tfenne has there been any progress on defining the project governance? I'm particularly interested in how the project roadmap (if any) is defined, and in the processes around accepting PRs from community members who do not have any historical links to the Broad (such as myself).

magicDGS commented 6 years ago

I am actually quite interested in this too, because I am in the same position as @d-cameron. Something like what is happening at the Hadoop-BAM project would be nice, including an online meeting to make some decisions.

I would definitely like to move HTSJDK forward to an interface-based library with SemVer (from v3 onwards), but this does not look likely to happen unless the governance process is defined and a roadmap written. For example, long-standing PRs that might be useful for the community are not being reviewed (some of my own examples: tribble writing support in #822 or compressed reference FASTA support in #1014), even though they do not break compatibility with previous versions (and obviously are not part of v3).

From my point of view, this project needs the same number of Broad and non-Broad maintainers, plus designated community reviewers, to move some developments forward (although ultimate decisions might be taken by the maintainers).

droazen commented 6 years ago

Speaking on behalf of the Broad, we've been waiting for some people to free up on our end (in particular, @lbergelson and @cmnbroad) before moving forward with this and starting the process of drafting a roadmap for HTSJDK 3.0 with input from the community. Obviously our participation in this effort has been delayed considerably by the recent GATK 4.0 release and its aftermath. The good news, though, is that @lbergelson is expected to free up sometime in May, and will be able to devote full-time effort to this project for the foreseeable future.

magicDGS commented 6 years ago

@droazen - thanks for the update. It will be great if htsjdk 3 moves forward!