teamtomo / membrain-seg

membrane segmentation in 3D for cryo-ET

moving to package independent of membrain #58

Open alisterburt opened 4 months ago

alisterburt commented 4 months ago

@LorenzLamm opening an issue to track the conversion of this package into a standalone teamtomo tool that is separate from MemBrain and called by MemBrain.

We discussed reasons for this and I think we ended in agreement, any concerns about any of this before we move ahead?

Checklist for things that need to happen

Do you want help with any of these steps in particular? I'd love to find time to pair on anything I can help with

LorenzLamm commented 4 months ago

Hey @alisterburt,

Thanks a lot for pushing this and creating all the issues!

I'll respond in more detail in the respective issues, and I agree with most of your points. Honestly, I'm quite buried in work at the moment, so I need to figure out what to best spend time on.

For me, the highest priority is to maintain / improve the performance of MemBrain-seg and maintain / improve easy access to it. Therefore, I agree that it's important to make the weights & training data available and get the documentation up-to-date. Also, I think it would be nice to have your points regarding API and on-the-fly preprocessing.

Regarding renaming to something different: In my opinion, it would be nice to keep the repository name and usability as it is: People already use it for segmenting the membranes and I also like the name membrain-seg, as it does belong to the MemBrain universe.

I think it would be rather nice to maintain the functionality of the repository as it is, and modularize it and outsource different functionalities. E.g., as we discussed, we might have a separate package for data augmentations, and outsource the preprocessing to libtilt?

alisterburt commented 4 months ago

Hey @LorenzLamm

Honestly, I'm quite buried in work at the moment, so I need to figure out what to best spend time on.

I understand and don't take any of these issues as specific pressure on you - a lot of the goal here is to get the package to a point where others (e.g. me, @rdrighetto, @kevinyamauchi, @kephale) can super easily jump in at any point and help to maintain/improve/update the package 🙂

For me, the highest priority is to maintain / improve the performance of MemBrain-seg and maintain / improve easy access to it.

Totally - getting the training data out in the open will be a big one for this. If any of us finds a tomogram for which the package isn't performing well, we can annotate it ourselves and add it to the dataset on Zenodo. The Zenodo community isn't quite full and direct community editing but it's a start at least!

Regarding renaming to something different: In my opinion, it would be nice to keep the repository name and usability as it is: People already use it for segmenting the membranes and I also like the name membrain-seg, as it does belong to the MemBrain universe.

Hm, I thought on our last call we had ended up on the same page here... what we discussed was:

The exact order of operations here is unimportant and I totally understand that you would like to keep funneling users through the membrain universe and building that brand - I'm aware of the importance that you get appropriate recognition for your work and am explicitly not trying to take any of that away :-) this is alluded to on the teamtomo org page

Development model
Because packages in teamtomo are small and well scoped it is easy to depend on only those packages relevant to your work.

teamtomo is not a place for the development of your own software projects. Your projects belong to you and the credit should be yours alone! Did you develop a small, reusable component along the way that could benefit the entire community? We would love to work with you to bring it here.

What I am concerned about is trying to make sure that we are building things in a way that the community feels empowered to depend on them and maintain them moving forwards. For me, this means small, well scoped packages which are independent of any one lab's 'brand'. I think that tightly coupling the small, useful components we build to our larger, user facing, 'branded' academic projects inherently makes others feel like it is not their responsibility to help to maintain this nice infrastructure. Instead we think 'oh, we need to ask Lorenz to fix membrain-seg', 'Sjors needs to fix RELION', 'the cryosparc team need to fix my bug'. I really believe that this separation is important for helping us to build a healthy ecosystem where we can benefit from each other's work.

I'm really interested in hearing what the sticking point is for you with the context of everything I've said above - I am trying to figure out how we as a community can best do scientific software development and this is a perfect testbed, so I want to understand if/why this doesn't work for you. I'm sorry if this request is frustrating and genuinely thank you for your patience, I hope you now understand better why I've made it.

@rdrighetto I'd also be interested to hear your take on this development philosophy?

kephale commented 4 months ago

Thanks @alisterburt! This is a clear perspective on the goals of teamtomo.

My interpretation is:

These types of community-first goals definitely beg for a governance model.

Totally - getting the training data out in the open will be a big one for this, if any of us finds a tomogram for which the package isn't performing well we can annotate ourselves and add it to the data. It's not quite full community editing but it's a start at least!

This is definitely the intent behind the CZ CryoET Data Portal. If for any reason you think the plan for the data portal isn't aligning with community needs, then I expect @uermel would be happy to discuss.

alisterburt commented 4 months ago

Thanks @kephale ! Appreciate the comments and I think what you've said matches how I think about the project.

You're totally right about a governance model - I'd love to get to a point where others feel empowered and have a sense of ownership of the overall project. @LorenzLamm @kephale would you be interested in discussing/being part of this governance model?

kephale commented 4 months ago

Sounds great!

I guess this discussion should move to somewhere else (TeamTomo Zulip?)

rdrighetto commented 4 months ago

Hi @alisterburt, thank you so much for raising all these important points!

I agree that the way things are going, MemBrain-seg has sort of "outgrown" teamtomo, or at least, is now beyond the scope of this organization (to use the Github term), exactly as you state here:

MemBrain thus becomes its own package which doesn't live in the teamtomo github org. MemBrain implements the CLI membrain with the subcommand segment, in this way the API is maintained entirely and users change nothing, except it becomes pip install membrain. Other subcommands stats and pick are also implemented there and you maintain complete control, everything user facing goes through you/your group/the membrain universe
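For readers following along, here is a minimal sketch of what the quoted layout could look like: a `membrain` command with `segment`, `pick` and `stats` subcommands, where `segment` delegates to a function that would come from a separate backend package. Everything specific here is an assumption for illustration only: `segment_tomogram` is a hypothetical stand-in defined inline (not a real function of any existing package), and the flag names are illustrative rather than the actual membrain-seg interface.

```python
import argparse


def segment_tomogram(tomogram_path: str, checkpoint_path: str) -> str:
    """Hypothetical stand-in for a function imported from a standalone backend package."""
    return f"segmenting {tomogram_path} with {checkpoint_path}"


def build_parser() -> argparse.ArgumentParser:
    """Build the user-facing `membrain` CLI with its three subcommands."""
    parser = argparse.ArgumentParser(prog="membrain")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `segment` only parses arguments; the real work lives in the backend.
    segment = subparsers.add_parser("segment", help="segment membranes in a tomogram")
    segment.add_argument("--tomogram-path", required=True)  # illustrative flag name
    segment.add_argument("--ckpt-path", required=True)  # illustrative flag name

    # Placeholders for the other user-facing subcommands named in the thread.
    subparsers.add_parser("pick", help="placeholder in this sketch")
    subparsers.add_parser("stats", help="placeholder in this sketch")
    return parser


def main(argv=None) -> str:
    """Parse arguments and dispatch to the matching backend call."""
    args = build_parser().parse_args(argv)
    if args.command == "segment":
        return segment_tomogram(args.tomogram_path, args.ckpt_path)
    return f"'{args.command}' is not implemented in this sketch"
```

Calling `main(["segment", "--tomogram-path", "tomo.mrc", "--ckpt-path", "model.ckpt"])` goes through the CLI layer and ends up in the backend function. In this arrangement the user-facing command stays the same even if the backend function later moves to a differently named package.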

I think that was always the plan, but it seems now is the time to move the membrain-seg project to our lab's GitHub, where the old but main MemBrain project already lives, until both are merged eventually: https://github.com/CellArchLab/MemBrain (I'm not sure what the best strategy is for moving the project to a new home on GitHub, also in a way that doesn't break the membrain-seg pip package, any advice on that would be appreciated!)

@LorenzLamm can then continue the development there aiming at making MemBrain v2 the cohesive, user-friendly package with all 3 modules (segment, pick, stats) that we envision. Of course, everyone who has contributed so far and any new contributors will always be welcome there! We all need to consider that @LorenzLamm is now facing the end of his PhD and who knows what great adventures await him after that. So his priority at the moment is really to push the "MemBrain universe" as far as possible, catering to our lab's own research needs which motivated the creation of MemBrain in the first place, but also making it a useful tool for others whenever possible.

Then we come to this point:

once the new package membrain is in place, we transition here to something membrain-independent so that this small isolated piece is not explicitly linked to membrain, rather it was contributed to the community by membrain's developers and membrain depends on it like any other package would. In the same way that napari was given to the community and membrain will depend on it.

Sounds good and we're cool with it in principle, but I don't know exactly what parts of the membrain-seg "backend" would make sense to be hosted here under teamtomo org. Things like https://github.com/teamtomo/membrain-seg/issues/56 do not sound like a "small isolated piece" but rather membrain-seg itself, just with a different name? Because to achieve what you propose here, a lot goes on in the background, and if you plan to host all that here, then in the end it's just the repo renamed? Please forgive me if I'm missing something.

To summarize, we understand that we should now host this repo under our lab, and modularize the most interesting bits and pieces (like the pre-processing code) to be hosted here and call them in MemBrain(-seg). We just don't know what these pieces are exactly 🙃

Thanks for your feedback, happy to discuss more!

LorenzLamm commented 4 months ago

Hey @alisterburt,

Thanks also from my side for your detailed explanations. I agree that the teamtomo Github may not be the ideal place anymore for MemBrain(-seg), as it is a software package in its development phase. So, MemBrain should probably live in the CellArchLab Github.

I want to emphasize that we do have aligned interests here in that we want to make this whole package not only available for the community to use, but also open to contributions. Many have already contributed by providing patches for our training dataset, and on the coding side everyone should of course be welcome to contribute as well. I'm not sure if the threshold for contribution is much higher if a repository is associated with a publication, but we definitely want to encourage users to contribute.

However, I feel that a membraneer package that is called by MemBrain-seg can also lead to a lot of confusion and not clearly defined use cases. In my opinion, it would make more sense to leave all the membrane segmentation-specific utilities in the MemBrain-seg package, and potentially outsource e.g. the U-Net training, the data augmentations, or the preprocessing into different modules that can easily be imported by other modules.

Practically, what would you suggest? Would it make sense that we move this entire repository to the CellArchLab Github, and then extract modules out of it?

LorenzLamm commented 4 months ago

Also: There is a #teamtomo Zulip? @rdrighetto and I would be happy to join for further discussion :)

kephale commented 4 months ago

https://teamtomo.zulipchat.com/#narrow/stream/329110-general/topic/bringing.20over.20conversation.20from.20membrain-seg

alisterburt commented 4 months ago

Sorry for the delay responding here, had a great chat offline with @LorenzLamm and @rdrighetto - should have a path forwards for this repo from early next week ☺️