What you need to set up OMERO training

ome / omero-guides

Guides for using OMERO

https://omero-guides.readthedocs.io

What you need to set up OMERO training #107

Open pwalczysko opened 3 years ago

pwalczysko commented 3 years ago

Design issue with the goal of producing a walkthrough for external (non-OME-Team) scientists or facilities. The walkthrough should help them set up OMERO training at their own site without the direct participation of trainers from the OME Team.

Please see the attached PDF 2021-04-14-training-walkthrough.pdf for the item-by-item walkthrough. This design issue is a work in progress, and new PDFs will be attached as the preparation of the walkthrough progresses.

The outline of the walkthrough points:

Which server to use? Can we use OME Team's servers?
Using my local production server for trainings: pitfalls and advantages
Ansible playbook - recipe for setting up a training server. Take care of app installations if not using the playbook
Group/User setup
- Own Production server - limited options
- Dedicated training server - use script provided by OME Team
- Create enough user accounts, but remember that users cannot be deleted in OMERO
- Set up groups to showcase cooperation in OMERO
Which data should we use?
- Publicly available
- IDR data
- downloads page data from OME
- Should cover enough modalities, including multi-z and multi-t
How should I import image data ahead of the session?
- In-place import
- Import for many users in batch manner
- Use Import As workflow or script provided by the OME Team
Annotate your images: secondary metadata
- Use script for Key-Value pairs, tags, ratings (some scripts provided by OME Team available)
- Import metadata from CSV as OMERO.tables using CLI
- Create Key-Value pairs from OMERO.tables using CLI
- Import metadata from CSV using UI script
- Create analysis metadata tables or CSVs for use in OMERO.parade
Teach how to view images in OMERO
- How to draw ROIs in OMERO.iviewer
- How to lock shape on specific t or z
OMERO.figure
- Understand that this is a must to show to your users
- OMERO.figure is an App, needs to be installed explicitly
- Teach scalebar with option to add calibration post-hoc
- Teach crop workflow (can cause problems if not explained)
Analysis
- Basic 3rd party tool workflow involves Fiji, teach how to install the plugin, explain potential problems
- Jupyter notebooks provided by OME Team:
  - Understand their usage and installation as per omero-guides
  - Understand their purpose as teaching material, but also as examples of 3rd party software connection possibilities
  - Run the notebooks in MyBinder or BinderHub if available
  - Run the notebooks in Google Colab (Python only)
  - Understand how to run the notebooks locally in your facility
- How to set up a conda environment for image analysis with OMERO
User administration
- Explain permissions to user via web UI
- Explain the concept of a default group
- If you have LDAP in your institution with a default private group, teach users to become members of their lab group in OMERO as soon as possible
- Teach the webadmin (user interface for managing groups and users in OMERO.web) (not for basic users)
- Understand the CLI possibilities for user administration (not for basic users)
- Understand the basic restricted admin concept in OMERO (not for basic users)
Getting started page
Generic walkthroughs for types of users, what we have in omero-guides already
- Basic user (as omero-guide)
- Facility manager (as omero-guide)
  - What else is needed?
    - Do a (basic) training walkthrough? (to be added as omero-guide walkthrough) -> Done now.
    - A walkthrough of how to use the command line interface (CLI) with OMERO?

LIST OF VERSIONS of DETAILED WALKTHROUGH (to be expanded as new content comes)

2021-04-14-training-walkthrough.pdf

cc @jburel @juliomateoslangerak @sukunis @erickmartins

erickmartins commented 3 years ago

I'll add a few thoughts:

1) We've been playing with the idea of a self-contained "omero-training" docker-compose that we can just spin up in the cloud that would already include a preset database with users, imported data and so on, so I guess an ansible playbook with these features would fulfill a similar need. Ideally, I would love to have an easily-deployable "OME-default OMERO training server" that could run in the cloud, for example.

2) Our users/groups approach was to use OME scripts to create 50 training users in a "training" group, and between training sessions we "clean up" and undo most of the changes that happen during a training session. We're already using OME scripts to do most of this - I think currently the main manual steps are deleting tags and attachments IIRC.

3) To avoid data multiplication we in-place import the same data to every user, which is good, but people still get confused when trying to look at each other's data: they see the same data again. Maybe it might be worth in-place importing the same data, but changing names ("user1_randomimage.tif" instead of "randomimage.tif"). Something for the future!

4) The Crop workflow under OMERO.figure is the kind of functionality I only go into if I have a lot of time to explain it in detail. I understand how useful it can be, but in general for a training session that is limited in time I stick to zoom/pan of multiple panels and don't touch it at all. Once in a while someone will come to me with a "can I do X in OMERO.Figure" question and I'm more than happy to explain that when needed outside the training workshop context.

5) For Python+OMERO, we've moved to using a lot of our own convenience functions in training. Last time we ran an "OMERO for developers" workshop, we spun up a JupyterHub in GCP and cloned our repo with notebooks (https://github.com/TheJacksonLaboratory/omero_for_developers) directly there. Because our GCP instance has VPN access, we could still do work in our internal dev server.
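The renaming idea from the point about in-place importing the same data for every user could be sketched roughly as follows. This is only an illustration: the helper name `per_user_image_name`, the example path and the `login_filename` pattern are assumptions, and how the name would actually be applied (at import time or by renaming afterwards) is left open.

```python
from pathlib import PurePosixPath


def per_user_image_name(login: str, source_path: str) -> str:
    """Return a per-user display name such as 'user1_randomimage.tif'.

    Only the name shown to the trainee changes; the in-place imported
    file itself is not touched by this helper.
    """
    filename = PurePosixPath(source_path).name
    return f"{login}_{filename}"


# Hypothetical logins and source path, for illustration only.
logins = [f"user{i}" for i in range(1, 4)]
names = [per_user_image_name(login, "/data/training/randomimage.tif") for login in logins]
print(names)
```

With names like these, a trainee browsing a colleague's data can at least see whose copy of the image they are looking at.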

juliomateoslangerak commented 3 years ago

Some ideas too. We would also love some Docker (or Ansible) setup. It would be nice if somehow the groups/users would be configurable through some kind of YAML or TOML file. Actually, for our training I would like to create more meaningful users based on well-known French comics. That would not only add a fun side, but would also help users identify some of the concepts more clearly. If this is configurable, people may use lists of groups that are known locally. You may use pictures and demo that feature (I do not think we would run into copyright issues if it is us grabbing the image from a URL or so). E.g.:

- group: petit village gaulois
  - owner: Abraracourcix
  - users: Astérix, Obélix, Idéfix, Panoramix, Assurancetourix, Ordralfabétix,...
- group: Tournesol Lab
  - owner: Professor Calculus
  - users: Tintin, Captain Haddock, Bianca Castafiore, Nestor, Dupond, Dupont, Milou ...
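A configuration like the one above could, for instance, be expanded into a list of CLI calls. The sketch below is an assumption-laden illustration, not an existing OME tool: the config layout, the logins and the placeholder password are hypothetical, and the `omero group add` / `omero user add` flags are written as I understand the CLI, so please check them against `omero group add -h` and `omero user add -h` before use. Promoting the owner to group owner is deliberately left out.

```python
import shlex

# Hypothetical configuration mirroring the example above.
# Each person is a (login, first name, last name) triple.
TRAINING_CONFIG = [
    {
        "group": "Tournesol Lab",
        "type": "read-annotate",
        "owner": ("calculus", "Cuthbert", "Calculus"),
        "users": [
            ("haddock", "Archibald", "Haddock"),
            ("castafiore", "Bianca", "Castafiore"),
        ],
    },
]


def plan_commands(config, password="CHANGE_ME"):
    """Return the CLI calls (as argument lists) that would create the setup.

    Nothing is executed here; the flags are assumptions to verify
    against the installed OMERO CLI.
    """
    commands = []
    for entry in config:
        commands.append(
            ["omero", "group", "add", entry["group"], "--type", entry["type"]]
        )
        for login, first, last in [entry["owner"], *entry["users"]]:
            commands.append(
                ["omero", "user", "add", login, first, last,
                 "--group-name", entry["group"], "-P", password]
            )
    return commands


commands = plan_commands(TRAINING_CONFIG)
for command in commands:
    print(shlex.join(command))
```

Printing the plan first, rather than running it, makes it easy to review what would be created on the server.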

I agree on the advantages and disadvantages of using the same data. One way around this might be to keep all data the same except for one dataset, used to demo sharing, in which each user has a "personal" image: a pencil, a lab timer, a lab book, a coffee mug,...

joshmoore commented 3 years ago

Two thoughts: YML could make good sense, e.g. in Ansible, to synchronize the user/group setup, perhaps as a new CLI plugin. However, at the moment the closest thing that exists would likely be LDIF --> https://github.com/ome/apacheds-docker

juliomateoslangerak commented 3 years ago

Right, but this is really for managing the accounts... no? I was thinking of something to create a training server that is going to be trashed a few days after the training. The YML could also specify / customize the data to import and different kinds of annotations.

joshmoore commented 3 years ago

I guess I would urge us to make elements which are composable so that others benefit. e.g. anything that is a CLI plugin can be consumed pretty easily from ansible, etc.

sbesson commented 3 years ago

Across OME resources, two places managing user/group creation (differently) are:

- https://github.com/ome/training-scripts/blob/master/maintenance/scripts/create_groups_users_setup which effectively loops over CLI commands
- https://github.com/ome/ansible-role-omero-user which proposed a YAML-based representation similar to the one above

Adding my 👍 to the idea of unifying these efforts and converging towards:

- a serialized representation of user/group
- a new API/CLI plugin that can consume this format
- a reduction of the hard-coded logic in the script/roles above to consume the new API

pwalczysko commented 3 years ago

@erickmartins @juliomateoslangerak Thank you for your very valuable thoughts and pointers to the teaching material.

Re: ...have an easily-deployable "OME-default OMERO training server" ... and "...thinking of something to create a training server that is going to be trashed a few days after the training...."

In our experience, the main thing to think through here is whether or not this server contains any image data, or how it connects to or imports such data after it has been spun up. In our hands, the data import is the main burden when setting up a training server, and thus we do not trash our training servers (as I understand it, @erickmartins does not either), preferring to clean them between trainings instead. Another thing to consider is that your trainees will not be happy if their training environment is trashed shortly after the training.

@erickmartins

I think currently the main manual steps are deleting tags and attachments IIRC.

Thank you for emphasizing the importance of cleanup. I will try to come up with a cleanup section in the next version of the walkthrough and attach it here. Briefly, some cleanup scripts for tags and attachments (file attachments and MapAnnotations) are available already; let me try to describe them in the walkthrough. You will of course need to make some adjustments to make them work in your hands/setup, but they give a good lead.

@juliomateoslangerak

Re: would be nice if somehow the groups/users would be configurable...

In addition to the comments and clarifications of @joshmoore and @sbesson, which I agree with, one thing to consider here is how to match a flexible userbase on your servers with any preimported data in a ready-made docker-compose OME training server or similar. Rather than creating new users from scratch every time (such users would by definition have no data to start with), the suggested functionality should keep the userbase the same and manipulate only the names of the users. In our experience, the image data and their quality have much more impact on participant satisfaction than the user/group setup. Of course, a docker-compose or otherwise prefabricated OME training server is a good idea even without any data, but it is worth seeing the bigger picture here as well, as it might help to save resources.

We have a "renaming script" for the usernames, and it would be easy to extend the renaming to the groups too. The script uses names of famous scientists. @juliomateoslangerak You can insert "Asterix" and similar into the script, but one remark, possibly less important at first look, is that every user in OMERO must have a first name and a last name, which Asterix does not (I like Asterix a lot too though 😄). The solution of having first name: Asterix and last name: Asterix does not come across very well for the participants: it is displayed in OMERO.web as Asterix Asterix, which looks like a bug in the software. The OME Team worked for a long time with user-1 user-1 type names in the trainings, so this remark comes from hard-won experience 😄
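The first name / last name constraint can be checked before any renaming is attempted. The following is a minimal sketch under stated assumptions: the helper `split_display_name` and the candidate names are made up for illustration and are not taken from the OME Team's renaming script.

```python
def split_display_name(display_name: str):
    """Split 'First Last' into (first, last).

    Returns None for single-word names, which would otherwise end up
    duplicated (e.g. 'Asterix Asterix') to satisfy the requirement
    that every user has both a first and a last name.
    """
    parts = display_name.split()
    if len(parts) < 2:
        return None
    return parts[0], " ".join(parts[1:])


# Hypothetical candidate names for renamed training users.
candidates = ["Marie Curie", "Asterix", "Rosalind Franklin"]
usable = {name: split_display_name(name) for name in candidates if split_display_name(name)}
rejected = [name for name in candidates if split_display_name(name) is None]
print("usable:", usable)
print("needs a second name:", rejected)
```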

@erickmartins

Re: your points 3, 4 and 5.

Thank you for these, they concern the naming of images, a balance between longer and shorter workshop and how to prioritize shown features, and teaching of OMERO API materials (thank you very much for this !). I will try to integrate these points into the next version of the workflow.

juliomateoslangerak commented 3 years ago

Indeed, naming users as I suggested will look odd. I remember now being 'disturbed' by this user name duplication in the past. For the other comments, I'm not sure whether I was unclear or don't understand what you mean (or both). What I would like, in order to set up a training server on a regular basis, on the assumption that the training server is trashed a few days after the training (and that the users will experiment further on the production server), is:

- a docker-compose to setup the server and eventually run a setup script
- a setup script to copy a number of projects from user from a given server into new users data. This script would consume a configuration file
- a config file to configure groups, users, user names, the source host and projects,...

Does this make sense to you? To me, the great advantage would be that the trainers would be able to easily set up the training environment from the convenience of their regular daily server. I looked carefully into the maintenance scripts you provide and it seems that more or less everything is in there and actually many of the scripts are doing this.
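To make the config-file idea a little more concrete, here is a minimal sketch of how such a file could be expanded into per-trainee copy tasks. Everything in it is hypothetical: the keys of `CONFIG`, the host name, the project names and the `CopyTask` structure are placeholders, and the actual copying of data between servers (the hard part) is not attempted.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Hypothetical configuration; every value here is a placeholder.
CONFIG = {
    "source_host": "omero.example.org",
    "source_projects": ["Training project A", "Training project B"],
    "groups": {"Lab1": ["trainee-1", "trainee-2", "trainee-3"]},
}


@dataclass(frozen=True)
class CopyTask:
    source_host: str
    project: str
    group: str
    target_user: str


def build_copy_plan(config: dict) -> List[CopyTask]:
    """Expand the config into one task per (trainee, source project) pair."""
    plan = []
    for group, users in config["groups"].items():
        for user, project in product(users, config["source_projects"]):
            plan.append(CopyTask(config["source_host"], project, group, user))
    return plan


plan = build_copy_plan(CONFIG)
print(len(plan), "copy tasks")
for task in plan[:2]:
    print(task)
```

Keeping the plan separate from its execution means the same config could later drive a CLI plugin, an Ansible role or a one-off script.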

pwalczysko commented 3 years ago

@juliomateoslangerak Thank you for this. Yes, your idea of docker-compose is good. What I wanted to point out (sorry for this being unclear, as I tried to answer too many questions):

The OME Team is not handling the training servers the way you are proposing. The reasons for that are:

a) The import of image data. This part is the most time- and resource-consuming in the setup of such a server, because there is no script which copies image data from one OMERO.server to another. Such data transfer between OMERO servers is an epic which is far from solved. Even if there were such a script, it would have to work by exporting the data and reimporting them into the new OMERO server. Depending on your data volumes, and especially if you want each user to have their own data, this will take a very long time and will need storage capacity, unless the data are imported in-place. But if the data are to be imported in-place, this has to be done inside the machine on which the OMERO.server is installed, which adds complexity to the setup.

b) The scripts in https://github.com/ome/training-scripts/tree/master/maintenance which you mention do indeed deal with such setups, but these scripts are run one by one at the times they are needed. There is still a long way from here to combining the scripts into one master script which would consume a setup file as you suggest. Achieving that would be a considerable body of work, and the OME Team would have to balance the time and effort needed for it against other tasks, and see whether the feature is likely to be used by many other users; see the comments of @sbesson and @joshmoore above.
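The storage argument can be illustrated with some back-of-the-envelope arithmetic. The numbers below are invented for the example (a 20 GB training dataset and 50 trainees), and the in-place figure is a simplification that ignores anything the server generates itself, such as thumbnails.

```python
def storage_for_copies_gb(dataset_gb: float, n_users: int) -> float:
    """Each user gets a full copy of the original files."""
    return dataset_gb * n_users


def storage_for_inplace_gb(dataset_gb: float) -> float:
    """Originals are stored once and only linked per user (simplified)."""
    return dataset_gb


# Hypothetical numbers, for illustration only.
dataset_gb = 20.0
n_users = 50

print("per-user copies:", storage_for_copies_gb(dataset_gb, n_users), "GB")
print("in-place import:", storage_for_inplace_gb(dataset_gb), "GB")
```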

from the convenience of their regular daily server

Maybe I do not completely understand this point. Do you mean that the data are imported in-place into the "regular daily server", and all that is needed is to in-place import them into the "newly born" training server? Please expand on this point.

As indicated, let us discuss further; there will certainly be adjustments and compromises needed. Thanks a lot.

joshmoore commented 3 years ago

a docker-compose to setup the server and eventually run a setup script

Certainly doable. It may be as easy as either (1) running the training playbook within docker or (2) running it against docker.

a setup script to copy a number of projects from user from a given server into new users data. This script would consume a configuration file

As @pwalczysko points out, this is the hard one. But certainly what we're all working towards. Until then, you'll need to find a version that works in your setup.

a config file to configure groups, users, user names, the source host and projects,...

Especially if you are doing all of this in Docker, you might think about adding in the openmicroscopy/apacheds docker and just save an image with all of the users you want.

juliomateoslangerak commented 3 years ago

Hi, thanks everyone for the advice. @pwalczysko It is perfectly logical that you don't want to hack such a script to deal with all possible scenarios. I think that for us the space limitations are not going to be a problem, given that it is a training server for not a lot of people (not 50 anyway) and for a two-day period, though I have to double check with Christophe Blanchet (who is responsible for the IFB infrastructure that is going to host our training server).

I worked/hacked in a script that is duplicating a list of projects for every user (https://github.com/juliomateoslangerak/training-scripts/blob/master/maintenance/scripts/copy_projects.py). It is so far only duplicating the projects, datasets and images. No other annotations yet, but I will work on that for a second training. I think for this one manual annotation will be faster. We need this for next week!! No in-place import yet.

One little question there: if I use in-place import and a user deletes an image-original file, will OMERO delete it or keep it as long as there is another reference to it in the database...?

@pwalczysko When I talk about the convenience of our daily server, I mean the following. I set up a group in our prod server where I keep, together with other trainers, the data that is going to be used in the training, as it is going to be used in the training. Nicely organized, annotated,... When we come across data that is interesting for the training, as we do for other purposes such as scientific presentations, examples, etc., we can add them to that group too. Then, the day before the presentation, we create a server, we launch the script and we get the same 'environment' for each trainee.

I do not know what you think, but when moving to more basic trainings, this might make it easier for core-facility engineers without deeper knowledge of OMERO, or of the tools for configuring a training server, to set up basic training sessions.

pwalczysko commented 3 years ago

@juliomateoslangerak thank you for this, it is really helpful.

I worked/hacked in a script that is duplicating a list of projects for every user. It is so far only duplicating the projects, datasets and images. No other annotations yet, but I will work on that for a second training. I think for this one manual annotation will be faster. We need this for next week!!

Great, thank you for that script. I will try to test it myself, time permitting. I wonder how it will perform with the download and reimport of the image data, and I possibly have a question about how the https://github.com/juliomateoslangerak/training-scripts/blob/master/maintenance/scripts/copy_projects.py#L49 bit would work for more complex file formats such as .vsi. Please let us know how the script performed on your side.

One little question there: if I use in-place import and a user deletes an image-original file, will OMERO delete it or keep it as long as there is another reference to it in the database...?

Thank you, this is a not so little question. The clear answer is: OMERO will never delete original files imported in-place. It does not matter into how many OMERO servers you imported these files in-place. The assumption I am making in this statement is that these files are indeed in a safe, read-only place (before and after any import to OMERO), which means not in OMERO's Managed Repository. Keeping the files in a safe, read-only place is the recommended and standard situation for in-place imports. All that OMERO deletes when you delete the images from within OMERO are the links to your original files, which, as we established, live in your "in-place", i.e. read-only, location. This setup is also used for the training servers of the OME Team. Also note that in order to import in-place, you need to do it from the environment of the machine on which that particular OMERO.server is running.

In your case, I am guessing that the data on the production, "regular daily" server are not imported in-place, but using the classical import route. In that case, what I wrote above is not fully valid. The files are sitting inside the Managed Repository of your production server. If the deletion were executed from the production OMERO server, it would delete these original files from its Managed Repository too. We can expand on and discuss this situation too, but please specify your setup first.
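The "only the links are deleted" behaviour can be illustrated with plain files and symbolic links. This is an analogy under the assumption of a symlink-style in-place import, not OMERO code: the directory names are made up, and the example needs a filesystem on which the current user may create symlinks.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # The original file in a (notionally read-only) store outside any server.
    original = os.path.join(root, "readonly_store", "image.tif")
    os.makedirs(os.path.dirname(original))
    with open(original, "wb") as handle:
        handle.write(b"pixel data")

    # Two hypothetical servers each hold only a link to the original.
    links = []
    for server in ("training_server", "second_server"):
        link = os.path.join(root, server, "image.tif")
        os.makedirs(os.path.dirname(link))
        os.symlink(original, link)
        links.append(link)

    # "Deleting the image" on one server removes that server's link only.
    os.remove(links[0])
    original_survives = os.path.exists(original)
    other_link_still_works = os.path.exists(links[1])

print("original still present:", original_survives)
print("other server's link still resolves:", other_link_still_works)
```

In the classical (non-in-place) import case described above, the server owns the copy of the file, so the analogy no longer applies.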

@pwalczysko When I talk about the convenience of our daily server, I mean the following. I set up a group in our prod server where I keep, together with other trainers, the data that is going to be used in the training, as it is going to be used in the training. Nicely organized, annotated,... When we come across data that is interesting for the training, as we do for other purposes such as scientific presentations, examples, etc., we can add them to that group too. Then, the day before the presentation, we create a server, we launch the script and we get the same 'environment' for each trainee.

Thank you for clarifying that, this makes perfect sense. This is actually similar to the way we (OME Team) approach our training servers, with the exception that the "nice things which we want to use in the next training" are accrued and kept on the training servers themselves. Nevertheless, a lot of "nice things" are kept in the IDR too, so we do have some features of what you are drafting here, with our "production server", i.e. "regular daily server", being the IDR.