Open glyg opened 2 years ago
Ah, I see. This is the equivalent of `ireg` in the iRODS world: just registering an existing file in place, with no data movement or additional disk space.
To attempt to answer your questions...
How would the mount points be precisely managed? Should there be one mount point per user? Can this be automated?
A single NFSRODS mountpoint can handle multiple users without trouble. The linux user who steps into the mountpoint must also exist in the iRODS namespace, and the NFSv4 ACLs are honored/mapped to iRODS permissions.
So, more investigation / discussion is needed to understand the 'normal' workflow of an inplace OMERO import - and whether any user can perform this, or the admin can perform this on behalf of a user, or whether it can be automated with a button in the OMERO interface in some way...
On the OMERO server, the `/DATA` directory needs to be accessible as read only by the `omero-server` user; this is manageable through group permission, right?
I think so - we should be able to manage permissions across the mountpoint as desired. It's not yet clear to me what that desire is, exactly. That is the discussion to be had it seems.
Only the `omero-server` user accesses both `/DATA` and `/OMERO` on the OMERO server; do we need to have the other ones (user1, user2, etc.) created there?
I think this is part of the same discussion. The `omero-server` Linux account must have a one-to-one account in the iRODS namespace. If no other Linux users are walking into the mountpoint, then no, we should be good without creating all the other users. But we haven't written down any use cases yet; that appears to be the next step.
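Since every Linux account that walks into the NFSRODS mount needs a matching iRODS account, a quick consistency check can help once more users are involved. This is only a sketch: the helper name is hypothetical, and it assumes that `iadmin lu` prints one `name#zone` entry per line (worth checking against your iRODS version).

```python
def linux_users_missing_in_irods(linux_users: list[str], iadmin_lu_output: str, zone: str) -> list[str]:
    """Return the Linux account names that have no iRODS account in `zone`.

    Hypothetical helper. `iadmin_lu_output` is ASSUMED to hold one
    `name#zone` entry per line, as printed by `iadmin lu`.
    """
    irods_users = set()
    for line in iadmin_lu_output.splitlines():
        name, _, user_zone = line.strip().partition("#")
        if name and user_zone == zone:
            irods_users.add(name)
    return sorted(set(linux_users) - irods_users)


if __name__ == "__main__":
    # Example text in the assumed `iadmin lu` format.
    sample = "rods#tempZone\nomero-server#tempZone\nuser1#tempZone\n"
    print(linux_users_missing_in_irods(["omero-server", "user2"], sample, "tempZone"))
```

On a live system, the text could come from `subprocess.run(["iadmin", "lu"], capture_output=True, text=True).stdout` and the Linux names from `pwd.getpwall()`.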
Hi @trel, thank you for the detailed answer.
So, to be more precise about the in-place import workflow:
1. "user 1" creates a dataset at the microscope. Usually, user authentication at the microscope isn't related to OMERO or iRODS.
2. The user logs into an iRODS client as `user1` to upload the dataset (maybe through an NFS share on the microscope machine?).
3. The dataset is ingested (?) by iRODS, which owns a local disk space acting as a buffer, and is transported to the data center. In iRODS, it is stored at `/omeroZone/home/user1/investigation1/study1/dataset0`.
4. Once the upload is complete, the user can trigger the import of their data into OMERO (either through some UI or automatically based on metadata). The same credentials are used to log into OMERO and see the dataset there.

This latter step is done by a script on the OMERO server machine. A single Linux user (`omero-server`) of this machine performs the in-place import. This Linux user logs into OMERO as an admin user and performs the import on behalf of the OMERO user `user1`. For that, `/omeroZone/home/user1/` must be mounted on the OMERO server and be readable by `omero-server`.
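The scripted part of step 4 could look roughly like the sketch below. Everything in it is hypothetical: `ZONE`, `MOUNT_POINT` and the helper names are made up, and it assumes the NFSRODS export of the zone is mounted at `/DATA`, so that `/omeroZone/home/user1/...` appears as `/DATA/home/user1/...`. The `--sudo` and `--transfer=ln_s` options come from the OMERO CLI login and in-place import options (worth double-checking with `omero import --help`); password handling is left out.

```python
import shlex

# Sketch only: these values and helper names are hypothetical.
ZONE = "/omeroZone"   # iRODS zone holding the user collections
MOUNT_POINT = "/DATA" # where the NFSRODS export of that zone is mounted


def irods_to_mount_path(irods_path: str, zone: str = ZONE, mount: str = MOUNT_POINT) -> str:
    """Translate an iRODS logical path into the matching path under the NFSRODS mount."""
    path = irods_path.rstrip("/")
    if path == zone:
        return mount
    if not path.startswith(zone + "/"):
        raise ValueError(f"{irods_path!r} is not inside zone {zone!r}")
    return mount + path[len(zone):]


def build_import_command(irods_path: str, owner: str, admin: str = "root",
                         server: str = "localhost") -> list[str]:
    """Build an in-place `omero import` run by `admin` on behalf of `owner`."""
    return [
        "omero", "import",
        "--sudo", admin,     # authenticate with the admin account...
        "-s", server,
        "-u", owner,         # ...while creating the session for the data owner
        "--transfer=ln_s",   # in-place import through symbolic links
        irods_to_mount_path(irods_path),
    ]


if __name__ == "__main__":
    cmd = build_import_command("/omeroZone/home/user1/investigation1/study1/dataset0", "user1")
    print(shlex.join(cmd))
```

Building an argument list (rather than a shell string) keeps paths with spaces safe if the command is later passed to `subprocess.run`.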
So, to summarize: the `omero-server` Linux account on the OMERO server machine also has an account in iRODS, so that it can use an NFSRODS mount as `/OMERO` (the part you already demonstrated).
I hope it is a bit clearer this way.
Thanks again!
Yes, this seems very reasonable. We just have to grant the `omero-server` iRODS user read access to the other users' iRODS collections (which could also be granted by `omero-server` itself, since it's a `rodsadmin`).
This should prove a very useful pattern if we can iron out any surprises.
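Granting that access could be scripted along these lines. This is a sketch with hypothetical helper names: it only builds `ichmod` command lines, using admin mode (`-M`) so that a `rodsadmin` such as `omero-server` can set ACLs on collections it does not own, and recursion (`-r`) so existing data objects are covered. Objects added later only pick up the ACL if inheritance is enabled on the collection (`ichmod inherit`), which this sketch does not handle.

```python
import shlex
import subprocess


def grant_read_commands(reader: str, owners: list[str], zone: str = "/omeroZone") -> list[list[str]]:
    """Hypothetical helper: one recursive, admin-mode `ichmod read` call per home collection."""
    return [
        ["ichmod", "-M", "-r", "read", reader, f"{zone}/home/{owner}"]
        for owner in owners
    ]


def run_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if ichmod fails


if __name__ == "__main__":
    run_commands(grant_read_commands("omero-server", ["user1", "user2"]))
```

The dry-run default makes it easy to review the generated commands before running them against a real zone.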
--
One note... if the microscope user has access to the NFSRODS mountpoint... saving files 'in' there is already 'in' iRODS... physically located over in the datacenter. If so, then steps 2 and 3 above are a single step.
Thanks, I'll try to advance a docker-compose version of that.
One note... if the microscope user has access to the NFSRODS mountpoint... saving files 'in' there is already 'in' iRODS... physically located over in the datacenter. If so, then steps 2 and 3 above are a single step.
Yes, this would be great. Am I right in assuming it is not too complicated to mount an NFS drive on Windows in a way that requires authentication, so that we are in the correct user space from the get-go?
Another question: when importing "inplace" in OMERO, it is possible to create hard links instead of symbolic ones. I saw issues on nfsrods mentioning hard links; is it possible? I am not sure it is desirable, but it adds a level of redundancy, as deleting a file in `/DATA` would not result in its deletion in `/OMERO`.
The design of NFSRODS is that it is NFSv4.1, so I believe mounting on Windows should work as expected. I'm not sure the extent this has been tested in the wild, as of yet. Please let us know if it is not behaving for some use case(s).
The effort to support hard links in iRODS has slowed; there were a few edge cases that were not yet resolved in the design. We should probably document those a bit better: https://github.com/irods/irods_rule_engine_plugin_hard_links
I'm not sure about the implications of the symlink vs. hard link distinction... in our case, the entire 'inplace' approach is already a kind of link-only approach... but yes, we should experiment a bit and see what is best.
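A small local experiment illustrates the redundancy argument for hard links. The sketch below uses throwaway temporary files standing in for `/DATA` and `/OMERO` (the helper name is hypothetical) and says nothing about NFSRODS itself. Note also that `link(2)` refuses to create hard links across different mounts (`EXDEV`), so hard links would only be an option if both directories live on the same mounted filesystem.

```python
import os
import tempfile


def survives_source_deletion(kind: str) -> bool:
    """Hypothetical helper: link a file, delete the original, report whether the link still yields the data."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = os.path.join(tmp, "image.tif")     # stands in for a file under /DATA
        repo_entry = os.path.join(tmp, "managed.tif")  # stands in for the ManagedRepository entry
        with open(data_file, "w") as fh:
            fh.write("pixels")
        if kind == "symlink":
            os.symlink(data_file, repo_entry)
        elif kind == "hardlink":
            os.link(data_file, repo_entry)  # same directory, hence same filesystem
        else:
            raise ValueError(f"unknown link kind: {kind!r}")
        os.remove(data_file)  # "delete in /DATA"
        try:
            with open(repo_entry) as fh:
                return fh.read() == "pixels"
        except FileNotFoundError:  # the symlink now dangles
            return False


if __name__ == "__main__":
    for kind in ("symlink", "hardlink"):
        print(kind, survives_source_deletion(kind))
```

On a regular Linux filesystem this reports False for the symlink and True for the hard link: the data survives as long as one hard link remains.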
Hi @trel, I hit a problem today:
it does not seem possible to create a symbolic link on the mounted NFS drive, and as a consequence in-place import with `omero import --transfer=ln_s` fails.
To reproduce, I guess the original setup here is sufficient. Below, the `/OMERO` directory on the OMERO server machine is the NFSRODS mount:
```
docker exec -it -u omero-server omeroserver bash
bash-4.2$ pwd
/opt/setup
bash-4.2$ cd /OMERO/
bash-4.2$ ls
BioFormatsCache FullText ManagedRepository certs
bash-4.2$ touch test
bash-4.2$ ln -s test test2
ln: failed to create symbolic link 'test2': Remote I/O error
```
Here is the volume definition in docker-compose.yml
```yaml
omero_rods:
  name: omero_rods
  driver: local
  driver_opts:
    type: nfs
    o: "addr=$NFSRODS_IPADDRESS,rw,noatime,tcp,timeo=14,nolock,soft,rw,nfsvers=4"
    device: ":/home/omero-server"
```
It is declared like so:
```yaml
omeroserver:
  # This container uses the tag for the latest server release of OMERO 5
  # To upgrade to the next major release, increment the major version number
  image: "openmicroscopy/omero-server:5"
  container_name: omeroserver
  restart: unless-stopped
  env_file:
    - .env
  networks:
    - omero
  ports:
    - "4063:4063"
    - "4064:4064"
  volumes:
    - type: volume
      source: omero_rods
      target: /OMERO
      volume:
        nocopy: true
```
Reading around, I do not think it is an intrinsic limitation of NFS, but I am not sure at all.
You can look at the whole project here:
https://gitlab.in2p3.fr/fbi-data/fbi-omero/-/blob/combo
Do you know if this is a configuration problem or something else?
Thanks a lot for any hint,
Best
Guillaume
NFSRODS does not support symbolic links. Are symlinks a requirement for `omero import` in place to work?
Thanks for the rapid answer @korydraughn .
Inplace import can use either symbolic or hard links. The alternative is to perform a copy (with or without subsequent deletion of the original file).
see: https://omero.readthedocs.io/en/stable/sysadmins/in-place-import.html#getting-started
A workaround my colleague just mentioned is to write the symlinks on a 'normal' hard drive (with their targets on the NFSRODS mount), although the "direct import" route would then need to be managed differently. I'm testing this now.
Eventually, it might make sense to write a dedicated importer transfer class, as I think @joshmoore mentioned, and use:
```
$ CLASSPATH=mycode.jar ./importer-cli --transfer=com.example.MyTransfer baz.tiff
```
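Given that symlinks fail on NFSRODS, a pre-flight check could pick the first transfer mode that actually works before calling the importer. The sketch below is hypothetical (the helper is not part of OMERO): it tries a symbolic link, then a hard link, from the managed repository to one existing data file, and otherwise falls back to `cp`. The mode names match the in-place transfer options described in the OMERO documentation linked above (`ln_s`, `ln`, `cp`).

```python
import os


def choose_transfer(source_file: str, repo_dir: str) -> str:
    """Hypothetical helper: first transfer mode that works from repo_dir to source_file.

    Tries a symbolic link, then a hard link, and falls back to a plain copy.
    """
    probe = os.path.join(repo_dir, f".transfer-probe-{os.getpid()}")
    for mode, make_link in (("ln_s", os.symlink), ("ln", os.link)):
        try:
            make_link(source_file, probe)
        except OSError:
            # e.g. "Remote I/O error" for symlinks on NFSRODS,
            # or EXDEV for hard links across mounts
            continue
        os.remove(probe)  # remove the probe link only, never the source file
        return mode
    return "cp"
```

For example, `choose_transfer("/DATA/home/user1/some.tif", "/OMERO/ManagedRepository")` (paths hypothetical) would return `"cp"` with the original setup above, and `"ln_s"` once `/OMERO` is on a native drive.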
Alright, so having `/OMERO` as a "native" drive on the OMERO server, with links pointing into the NFSRODS-mounted `/DATA/`, works; the docker-compose file has been updated.
Now, in order to get closer to the sketch at the top of this thread, 'directly imported' files need to be moved from the native drive to the NFSRODS drive. I see how to do this with a Python script listening to inotify events, but it would surely be better if iRODS took care of that, no?
For now, I am focusing on a multi-user scenario.
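While waiting for a better option on the iRODS side, the "move directly imported files to the NFSRODS drive" step could be sketched as follows. This swaps inotify for simple polling to stay within the standard library, uses hypothetical helper names, and assumes (untested) that OMERO is fine with a directly imported file being replaced by a symlink to its new location. A real script must also skip files whose import is still in progress, which this sketch does not check.

```python
import os
import shutil
import time


def offload_regular_files(native_root: str, nfs_root: str) -> list[str]:
    """Hypothetical helper: move regular files from native_root to nfs_root, leaving symlinks behind.

    Entries that are already symlinks (in-place imports) are left alone.
    Returns the moved paths, relative to native_root.
    """
    moved = []
    for dirpath, _dirnames, filenames in os.walk(native_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                continue  # already points elsewhere, nothing to move
            rel = os.path.relpath(src, native_root)
            dst = os.path.join(nfs_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)  # copy + delete when crossing filesystems
            os.symlink(dst, src)   # keep the original path valid (assumption: OMERO accepts this)
            moved.append(rel)
    return moved


def poll_forever(native_root: str, nfs_root: str, interval: float = 60.0) -> None:
    """Polling stand-in for an inotify watcher."""
    while True:
        offload_regular_files(native_root, nfs_root)
        time.sleep(interval)
```

Polling is cruder than inotify but avoids a third-party dependency and also catches files created while the script was not running.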
Hi, quick update and a question.
Provided the `omero-server` user has the correct ACLs, in-place import of any user's data seems to be working fine, though a lot of hardening, tests and so on are obviously still missing. See https://gitlab.in2p3.fr/fbi-data/fbi-omero/-/blob/combo/scripts/ldap_irods_sync.py ✨
When a new user is created in iRODS, the directory tree in `/DATA/` (i.e. the NFSRODS volume exposing `/tempZone/` to the omeroserver container) does not show the new directory.
It appears if I restart the container, but this is not very elegant. I guess unmounting and remounting the volume in the container would also do the trick, but is there another way to refresh the NFS mount with the new directory structure of the zone?
Best
Guillaume
When a new user is created in iRODS ... the directory tree ... does not show the new directory
I believe if the client side just checks/stats/lists again, it would be there?
@korydraughn this would be client driven, right?
Yes. It is client driven. However, I'd expect any system accessing the mount point to cause an update to the tree. Definitely something we need to confirm.
Thanks for the feedback. I only tried `ls /DATA/`; is there something else I should try? Could this be Docker-specific?
I can open an issue on the nfsrods repo if this is more efficient.
Try listing the directory of interest directly and seeing if NFSRODS reports it.
I wonder if iRODS is updating the mtime of the parent collection when a collection is created inside of it. If not, then that is likely part of the problem here.
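If the stale listing is indeed a caching or mtime issue, a script that has just created a user (the LDAP sync script, for instance) could wait for the new collection by looking it up directly instead of listing the parent directory. A sketch with a hypothetical helper name and arbitrary timeout values; it relies on the behaviour discussed in this thread, not on any documented NFSRODS guarantee.

```python
import os
import time


def wait_for_collection(mount: str, *parts: str, timeout: float = 10.0,
                        interval: float = 0.5) -> bool:
    """Hypothetical helper: poll for mount/parts by looking the full path up directly.

    Listing the parent directory may return a stale entry list, so this
    stats the full path instead.
    """
    path = os.path.join(mount, *parts)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isdir(path):  # direct lookup, no readdir of the parent
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_collection("/DATA", "home", "newuser")` right after creating `newuser` in iRODS.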
Oh OK, I get it. I'll try this tomorrow; I am away from the machine today.
Hi, you were right, I can `ls` the newly created collection directly.
Most excellent.
Hi @trel, I have some difficulties using a derived version of the docker-compose setup you provided. I changed the Dockerfile so that the catalog provider is set up with a JSON configuration file.
Below is entrypoint.sh
Here is the server_config.json template (filled in through Ansible)
For completeness, docker-compose.yml
Here is the error I get from up.sh:
```
./up.sh
irods-catalog uses an image, skipping
irods-client-nfsrods uses an image, skipping
Building irods-catalog-provider
[+] Building 0.8s (14/14) FINISHED
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 1.47kB 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04 0.7s
 => [1/9] FROM docker.io/library/ubuntu:20.04@sha256:db8bf6f4fb351aa7a26e27ba2686cf35a6a409f65603e59d4c203e58387 0.0s
 => [internal] load build context 0.0s
 => => transferring context: 164B 0.0s
 => CACHED [2/9] RUN apt-get update && apt-get install -y apt-transport-https gnupg 0.0s
 => CACHED [3/9] RUN wget -qO - https://packages.irods.org/irods-signing-key.asc | apt-key add - && echo "de 0.0s
 => CACHED [4/9] RUN apt-get update && apt-get install -y libcurl4-gnutls-dev python3 0.0s
 => CACHED [5/9] RUN apt-get update && apt-get install -y irods-database-plugin-postgres=4.3.0-1~foc 0.0s
 => CACHED [6/9] COPY irods_environment.json /root/.irods/irods_environment.json 0.0s
 => CACHED [7/9] COPY irods_environment_server.json /var/lib/irods/.irods/irods_environment.json 0.0s
 => CACHED [8/9] COPY server_config.json /server_config.json 0.0s
 => CACHED [9/9] COPY entrypoint.sh / 0.0s
 => exporting to image 0.0s
 => => exporting layers 0.0s
 => => writing image sha256:4867fd861c5f6da99c77672c4771a0241cae5bc24bdc9eeaa141f8e1395960d5 0.0s
 => => naming to docker.io/library/compose_irods-catalog-provider 0.0s
irods-catalog is up-to-date
Starting irods-catalog-provider ... done
Error occurred while authenticating user [rods] [CAT_INVALID_AUTHENTICATION: failed to perform request
] [ec=-826000] failed with error -826000 CAT_INVALID_AUTHENTICATION
irods-catalog is up-to-date
irods-catalog-provider is up-to-date
Starting irods-client-nfsrods ... done
```
The iRODS server is up:
```
irods@irods-catalog-provider:~$ ./irodsctl -v --test --stdout status
Calling status on IrodsController
irodsServer :
    Process 84
    Process 85
    Process 1690
```
There are some connections in the CLOSE-WAIT state, and the server fails to shut down gracefully.
```
root@irods-catalog-provider:/# ss -t
State       Recv-Q  Send-Q  Local Address:Port  Peer Address:Port     Process
ESTAB       0       0       172.26.0.3:38084    172.26.0.2:postgresql
ESTAB       0       0       172.26.0.3:38074    172.26.0.2:postgresql
CLOSE-WAIT  1       0       172.26.0.3:1247     172.26.0.3:49030
ESTAB       0       0       172.26.0.3:49016    172.26.0.3:1247
ESTAB       0       0       172.26.0.3:1247     172.26.0.3:49016
CLOSE-WAIT  1       0       172.26.0.3:1247     172.26.0.3:47680
```
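To see whether those CLOSE-WAIT sockets keep accumulating on the iRODS server port (1247 by default), one could take `ss -t` snapshots periodically and filter them. A minimal parser with a hypothetical helper name, assuming the default `ss -t` column layout shown above; `ss -t state close-wait` can also do the state filtering itself.

```python
def close_wait_on_port(ss_output: str, port: int = 1247) -> list[tuple[str, str]]:
    """Hypothetical helper: (local, peer) pairs of CLOSE-WAIT sockets whose local port is `port`.

    Assumes the default `ss -t` columns:
    State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
    """
    hits = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "CLOSE-WAIT":
            continue  # header line or socket in another state
        local, peer = fields[3], fields[4]
        if local.rsplit(":", 1)[-1] == str(port):
            hits.append((local, peer))
    return hits
```

When feeding it live output, `ss -tn` keeps ports numeric, so a port listed in `/etc/services` is not shown under a service name (as happened for `postgresql` above).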
Any hint on how to debug this greatly appreciated.
Hi,
Thank you for the sandbox example, which runs smoothly! I have a complementary use case.
Importing data in OMERO can be done in place, meaning that the `ManagedRepository` directory only contains a symbolic link (or a hard link, but I'm not sure it's possible here) to the imported data. We plan to use that feature and use iRODS to transport data from the microscope to the data center, and only then import it into OMERO, without extra copies.
While I think this looks feasible, I have questions:
- How would the mount points be precisely managed? Should there be one mount point per user? Can this be automated?
- On the OMERO server, the `/DATA` directory needs to be accessible as read only by the `omero-server` user; this is manageable through group permission, right?
- Only the `omero-server` user accesses both `/DATA` and `/OMERO` on the OMERO server; do we need to have the other ones (user1, user2, etc.) created there?

I am a complete newbie with iRODS, so a lot of this is a bit obscure to me. Sorry if I am missing obvious stuff, and I am happy to clarify if my sketch is too obscure!
Best
Guillaume