Backup data mounted on galaxy_storage volume not restored.

mhabsaoui commented 7 years ago

Hi, I'm trying to get the backup data exported from the galaxy-appliance container into the galaxy_storage volume (a folder on the host) restored properly after rebuilding the galaxy-appliance container (we need to rebuild because new NLP tools are added/linked as new Tomcat containers ==> this has the consequence of recreating the galaxy container...). But even though I can access the Galaxy welcome page again, it doesn't load the old user data (logins, uploaded datasets in histories, ...).

PS: I've tried the steps for rebuilding the original "bgruening/galaxy-stable" Docker image (http://bgruening.github.io/docker-galaxy-stable/usage.html) ==> it loads the user data back correctly :) But it's not working on Lapps-galaxy :/

Thanks for support.

bgruening commented 7 years ago

Mh ... I'm not sure I understand everything. But such an /export folder should work in any Galaxy instance, as long as the Galaxy version stays the same and you have not changed something in the startup magic.

Can it be that this is a bug in the code you have added?

mhabsaoui commented 7 years ago

Well, let me explain my problem more clearly:

When testing the original galaxy-docker image (run the galaxy container, then remove it, and finally run it again) with the /export/ bind ==> the Galaxy instance correctly restores the old data from the /galaxy_storage/ volume mounted on the host.
But when doing the same here with the galaxy-appliance container (run, remove and run again), it doesn't restore the old data from the host volume. In fact, the old data is still there on the host volume (in the postgres and database directories, etc.), but it simply gets ignored by the new galaxy-appliance container instance :/

So I'm trying to understand why data restoring fails when the galaxy-appliance container is removed and recreated. Maybe it has to do with the Galaxy instance's crypto/id or something, or code that's overriding the startup magic...

PS: By the way, if the galaxy-appliance container is just stopped and re-run (not removed or recreated), then, and only then, it restores the data correctly.

Hope it's clearer now :)

Thanks for help.

ksuderman commented 7 years ago

How are you starting the appliance? I'm not sure if the docker-compose.yml file that is generated by default mounts the /export volume for Galaxy, but I will need to double check.

mhabsaoui commented 7 years ago

Yep, the docker-compose.yml file that is generated by default has no mount for the /export volume on Galaxy service.

I can confirm that the /export/ directory (made of multiple symlinks inside the galaxy container: postgres, galaxy-central, database, var...) is correctly mounted on the host ==> to do this I just added the following to the galaxy service in docker-compose.yml (of course we can also add this to the YamlBuilder.groovy file so it is generated by make-appliance):

volumes:
    - /home/user/data/galaxy_storage/:/export/
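For anyone checking their own generated file, a quick way to confirm that such a mount is present could look like the sketch below. The function name `has_export_mount` is made up for illustration, and the pattern only recognises unquoted short-syntax entries ending in `:/export` or `:/export/`; it does not check which service the entry belongs to.

```shell
# Hypothetical helper: succeed if the compose file has a volume entry
# that maps something onto /export (short syntax, unquoted).
has_export_mount() {
    compose_file="$1"
    # Matches list entries such as "- /host/path:/export" or "- name:/export/".
    grep -Eq '^[[:space:]]*-[[:space:]]*[^[:space:]]+:/export/?[[:space:]]*$' "$compose_file"
}

# Possible use before starting the appliance:
#   has_export_mount docker-compose.yml || echo "no /export mount found" >&2
```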

Notes:

> How are you starting the appliance?

==> 1) ./make-appliance masc oaqa stanford gate ........
2) docker-compose up --build

> but I will need to double check

==> you will have to uncomment line 80 in the file galaxy/startup.sh (postgres is not started otherwise) to make the Galaxy instance (Python web app) work ;)

Thanks again 👍

ksuderman commented 7 years ago

I have added a parameter (-e | --export) to the YamlBuilder.groovy script so a directory to mount as the /export volume can be specified on the command line. E.g.

groovy YamlBuilder.groovy --export /home/user/galaxy/data lappsgrid gate masc  ...

Currently this modification only exists in the develop branch.

mhabsaoui commented 7 years ago

Nice, I'll give it a try and let you know. And I'll wait for your checks and tests 👍

I will also try letting Docker place the mounted volume on the host automatically (volumes: - /export/) to see if it has any effect...

PS: What do you think of having an improved way of generating docker-compose.yml? I mean something like what we do with the npm CLI ==> a script that automates the creation of a new NLP tool: it adds a template structure (tool directories, groovy files, Galaxy wrapper files, ...) with all the required bits to make it a new container linked to the galaxy one and ready to run ;)

ksuderman commented 7 years ago

I think that would be a great idea and it is our ultimate goal. I just haven't had the time to work on it. Pull requests welcome :)

The trickiest part is generating the Galaxy wrapper files.

bgruening commented 7 years ago

Because I read docker-compose so often here, you might want to have a look at the latest developments in the dev branch, especially in the compose folder.

mhabsaoui commented 7 years ago

Very instructive, especially the Advanced section:

postgres init/updating: very easy for admins to back up/restore the database :)
galaxy-init: could the following mechanism be a clue to why galaxy-appliance is not restoring properly??

> When initialization is complete, this container notifies the galaxy handlers to start up by locking /export/.initdone. You can disable this mechanism by setting DISABLE_SLEEPLOCK=true.

Splitting the galaxy-docker processes into containers is awesome :) => https://github.com/bgruening/docker-galaxy-stable/tree/dev/compose => https://github.com/bgruening/docker-galaxy-stable/tree/master/compose

mhabsaoui commented 7 years ago

@ksuderman

Can you confirm you have the same issue on your side with the master galaxy-appliance?
Is there a difference between generating the "volumes: ....." parameter and adding it manually to docker-compose.yml?

@bgruening what's the difference between the postgres DB dump data (dumpsql.sh) and the /export content (the script initializes the export directory with symlinks)?

Thanks.

bgruening commented 7 years ago

@mhabsaoui dumpsql.sh is a script for me. It is only useful for generating new images. The compose setup comes with a clean SQL dump, and the postgresql image uses this during startup instead of building the entire DB from scratch; this is just faster.

mhabsaoui commented 7 years ago

@ksuderman

I've tested with the bind mount generated by groovy.
I've tried commenting out the following code (line 34) in galaxy/Dockerfile:

# Mark folders as imported from the host.
VOLUME ["/export/", "/data/", "/var/lib/docker"]

I've tried with a data volume (instead of a bind mount) as follows:

........
    volumes:
        - galaxy-store:/export
volumes:
    galaxy-store:

But it's all the same, no backup data restored...

All I can say is that the bind mount/data volume is accessed again by the newly launched instance of galaxy-appliance (with the data volume, I've checked it's not dangling while Galaxy is running...). But for some reason, it doesn't load the old data :/

Thanks for confirming the same issue, and for your feedback.

mhabsaoui commented 7 years ago

@bgruening I managed to modify the Dockerfile so that the magic works again for restoring the Galaxy instance's data ==> I simply avoided the redundant startup script by not adding it, so the inherited Galaxy one runs.

But tool_conf.xml somehow gets erased during galaxy container startup :/

galaxy_1 | Remove all tools from the tool_conf.xml file.

Do you know why this is happening (somewhere in a Python script)? Is it possible to avoid this step?

CONTAINER ID   IMAGE                        COMMAND              CREATED          STATUS          PORTS                                                     NAMES
81d08e8db0bf   lappsgrid/galaxy-appliance   "/usr/bin/startup"   14 minutes ago   Up 14 minutes   21/tcp, 443/tcp, 8800/tcp, 9002/tcp, 0.0.0.0:80->80/tcp   lappsgalaxy_galaxy_1
50e2fd1122e3   lappsgrid/gate               "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8002->8080/tcp                                    lappsgalaxy_gate_1
f3b3205923f5   lappsgrid/legotal            "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8003->8080/tcp                                    lappsgalaxy_legotal_1
5b73c4267c11   lappsgrid/dkpro              "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8001->8080/tcp                                    lappsgalaxy_dkpro_1
c8c4d31fa74f   lappsgrid/opennlp            "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8007->8080/tcp                                    lappsgalaxy_opennlp_1
f290d6df3e38   lappsgrid/masc               "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8005->8080/tcp                                    lappsgalaxy_masc_1
41d7f8a3011c   lappsgrid/stanford           "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8008->8080/tcp                                    lappsgalaxy_stanford_1
9efb861b5360   lappsgrid/lingpipe           "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8004->8080/tcp                                    lappsgalaxy_lingpipe_1
df2a5de96ffa   lappsgrid/oaqa               "/usr/bin/startup"   14 minutes ago   Up 14 minutes   0.0.0.0:8006->8080/tcp                                    lappsgalaxy_oaqa_1

Thanks.

bgruening commented 7 years ago

@mhabsaoui glad you figured the first mistake out. Just a question on that: was this a local modification from lappsgrid or a general bug?

To your second question: this also does not sound familiar to me and looks like a lappsgrid modification, as they really want to remove all other tools. Imho this should not happen; you should simply not load this tool, or start the container with the -e BARE=True option which we integrated.

ksuderman commented 7 years ago

Sorry @mhabsaoui this slipped off my radar. I will try to take a look at the problem after the holiday weekend. I do seem to recall a similar problem that was caused by the merged tool_conf.xml file generated by Galaxy not being regenerated properly, but I don't recall the exact details at the moment.

@bgruening we don't actually remove any of the default Galaxy tools other than using -e BARE=True and not including the tools in the tool_conf.xml file.

mhabsaoui commented 7 years ago

Thanks guys for your support!

@bgruening rather a local modification. I just removed the startup script provided by lappsgrid...

@ksuderman No problem, I'm aware you have your hands full. Cool, it would be nice if this unwanted tool_conf.xml modification could be disabled ASAP...

ksuderman commented 7 years ago

@mhabsaoui I am unable to recreate this problem; if I mount an /export volume for Galaxy, data is persisted and available the next time I start the appliance.

However, when I first started an appliance I received the error message:

WARNING: Service "galaxy" is using volume "/export" from the previous container. Host mapping "/tmp/galaxy" has no effect. Remove the existing containers (with docker-compose rm galaxy) to use the host volume mapping.

I removed all containers and that seemed to fix the problem:

docker rm -f $(docker ps -a -q)

I did have a problem with Galaxy displaying the wrong tools in the tools menu, and again this seemed to be the result of stale images being used and purging all the Docker images on my system fixed that problem as well:

docker rmi -f $(docker images -q)
docker-compose up

I've also modified the make-appliance script so that the export directory can be specified when building the appliance. I had previously added this to the YamlBuilder script, but make-appliance was not making use of the parameter so the docker-compose.yml had to be edited manually. I've pushed these modifications to the develop branch.

./make-appliance -e /tmp/galaxy testing masc stanford oaqa

mhabsaoui commented 7 years ago

I've re-tested following your steps: removed all Docker images / containers / dangling volumes and brought everything up again from scratch => registered and played around a bit to create a history with data => stopped the appliance, removed everything again and rebuilt (having, of course, our mounted volume 'galaxy-store:/export' on the host) => refreshed Galaxy, but the data was not reloaded :/

@ksuderman Can you please give us the link to the exact branch you clone to test the appliance (to be sure we test the same things)?

I will try again with a clone from scratch and retest...
I will also retest with the Galaxy magic environment variable 'NONUSE=nodejs,proftp,reports,slurmd,slurmctld' to see if anything changes...

> but make-appliance was not making use of the parameter so the docker-compose.yml

Yep, I was going to tell you a bit later, sorry ;)

Thanks.

mhabsaoui commented 7 years ago

some feedback :

Retest with the Galaxy magic environment variable 'NONUSE=nodejs,proftp,reports,slurmd,slurmctld' to see if anything changes ==> Oops, with this NONUSE setting the galaxy container exited :/

galaxy_1 | * Starting PostgreSQL 9.3 database server
galaxy_1 | ...done.
galaxy_1 | tmpfs on /proc/kcore type tmpfs (rw,nosuid,mode=755)
galaxy_1 | Disable Galaxy Interactive Environments. Start with --privileged to enable IE's.
galaxy_1 | /usr/lib/python2.7/dist-packages/supervisor/options.py:295: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
galaxy_1 | 'Supervisord is running as root and it is searching '
galaxy_1 | tail: cannot open ‘/home/galaxy/*.log’ for reading: No such file or directory
lappsgalaxy_galaxy_1 exited with code 0

Test with a clone from scratch (no modification) of the master branch ==>

==> After launching 'docker-compose up':

you have to use "make -C ./masc" or modify the 'make-appliance' script a bit so that all Tomcat containers get their packages manually... How do you get around it??

==> Then, with all packages in place, the appliance is launched ==> but in the browser you get "403 Forbidden nginx/1.4.6 (Ubuntu)"! How do you get around it??

==> Then, I found that the galaxy 'startup.sh' script has the postgres DB startup commented out (https://github.com/lappsgrid-incubator/galaxy-appliance/blob/master/galaxy/startup.sh#L80). How is the galaxy container supposed to store its data... How do you get around it??

==> Then, I uncommented that line (L80) and modified it as follows, but still the same:

service postgresql start
until pg_isready &>/dev/null ; do
    echo -n "."
    sleep 2
done

==> Looked into the logs (/galaxy-central# vi /home/galaxy/logs/reports.log):

File "/galaxy_venv/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
    Is the server running on host "localhost" (::1) and accepting
    TCP/IP connections on port 5432?
Removing PID file /home/galaxy/logs/reports.pid

It seems the Postgres DB is unreachable, whereas it should be reachable since we started it :/

==> I restarted the postgres DB manually inside the galaxy container / also tried restarting the galaxy container. It seems to work once, but not every time...
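The readiness loop quoted above waits forever if PostgreSQL never comes up, which can hide a failed start. A bounded variant might look like the sketch below; `wait_until` is a hypothetical helper, not something taken from startup.sh, and the attempt count and delay in the usage comment are arbitrary.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    max_attempts="$1"
    delay_seconds="$2"
    shift 2
    attempt=1
    # Run the command quietly; stop as soon as it reports success.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
    return 0
}

# Intended use inside the container (pg_isready ships with PostgreSQL):
#   service postgresql start
#   wait_until 30 2 pg_isready || echo "PostgreSQL did not become ready" >&2
```

With a non-zero exit on timeout, the startup script could log the failure instead of letting Galaxy start against an unreachable database.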

@ksuderman How do you get it to work?

Thanks.

ksuderman commented 7 years ago

Use the code from the 'develop' branch to get the latest changes and bugfixes. The master branch won't be updated until the next release.

I've fixed the main appliance Makefile so you can just use make gate rather than make -C gate. I've also added a 'clean' goal to the Makefiles.

Use the -b option to the make-appliance script and all packages/dependencies will be downloaded for the docker images before they are built. The docker-compose command will also build images if they are missing.

I've just tested with a clean build by doing the following:

cd /tmp
git clone https://github.com/lappsgrid-incubator/galaxy-appliance
cd galaxy-appliance
git checkout develop
./make-appliance -b -e /tmp/galaxy-data testing masc oaqa stanford
docker-compose up

# Create an account in Galaxy and login
# Create a pipeline with Stanford tokenizer, tagger, sentence splitter, and NER.
# Run the pipeline on a simple text input file from MASC
# Kill the docker containers (CTL-C in the bash window)

make clean
docker rm -f $(docker ps -aq)
docker rmi -f $(docker images -q)
./make-appliance -b -e /tmp/galaxy-data testing masc oaqa stanford

# Verify the Galaxy account, workflow, and dataset history persisted.
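Before tearing everything down in the steps above, it may help to confirm that the host directory was actually populated. A minimal sketch follows; the function name `missing_export_entries` is hypothetical, and the entry names in the usage comment (postgres, galaxy-central) are the ones mentioned earlier in this thread, which may differ between Galaxy versions.

```shell
# Hypothetical helper: report which expected entries are absent from an
# export directory on the host.
# Usage: missing_export_entries EXPORT_DIR ENTRY [ENTRY...]
missing_export_entries() {
    export_dir="$1"
    shift
    missing=""
    for entry in "$@"; do
        if [ ! -e "$export_dir/$entry" ]; then
            missing="$missing $entry"
        fi
    done
    # Print the missing names and fail when anything is absent.
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}

# Possible use (entry names are assumptions based on this thread):
#   missing_export_entries /tmp/galaxy-data postgres galaxy-central
```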

mhabsaoui commented 7 years ago

Hi,

I tested with the 'develop' branch and the galaxy-data restoring works correctly (logged-in user, histories, workflows... are back) 👍

Tested these 3 ways and it worked on the first two:
1) only removed the containers and rebuilt => docker-compose down => docker-compose up --build
2) removed the containers + images and rebuilt => docker-compose down => docker rmi -f $(docker images -q) => docker-compose up --build
3) same as step 2, except that I added the lingpipe / opennlp / gate containers to the appliance => ./make-appliance -b -e ../galaxy-data testing masc oaqa stanford lingpipe opennlp gate ==> data restoring is OK, but it seems the tools list isn't updated correctly, as you can see below (I had cleared the browser cache...)

> I did have a problem with Galaxy displaying the wrong tools in the tools menu, and again this seemed to be the result of stale images being used and purging all the Docker images on my system fixed that problem as well:
> docker rmi -f $(docker images -q)
> docker-compose up

=> I'm just wondering if this bug you reported is maybe related to the fact that the 'galaxy-data' volume is restored with its previous tools list (i.e. without any tools newly added to the appliance). Assuming you didn't delete any volume(s)...

==> Then, I renamed the previous 'galaxy-data' folder so that the appliance starts fresh. But I got this weird OSError as below ...

... and moreover the damn '403' error is back :/

==> Then, I removed everything (containers + images + volumes) to start all fresh: even if the 'abnormal termination' error is back, it works again (with our added tools showing correctly):

=> So, then I retested the data restoring again... and it's working again.

BTW, there are a few wrongly named services to be corrected (in the .xml / lsd file?)...

==> Finally, I wanted to test restoring the previous 'galaxy-data' volume (the one with only the Stanford tool): as before, restoring still works (I had to reconnect my user account manually though...). But the tools list isn't updated.

That's to be expected if we check the 'galaxy-data' volume ==> more ../galaxy-data/galaxy-central/config/tool_conf.xml

. . . Now the questions are:

It seems that modifying the appliance causes the tools list not to be updated?
What was the bug that you fixed, so that it can also work on the 'master' branch ==> putting the galaxy container in privileged mode?
Is it possible to build the appliance with the Galaxy magic environment variable 'NONUSE=nodejs,proftp,reports,slurmd,slurmctld' (in order to avoid a heavy deployment), and which processes are necessary to keep in a production deployment?

Thanks.

ksuderman commented 7 years ago

> It seems that modifying the appliance causes the Tools list not to be updated ?

This happens when using a new appliance with an existing volume created with a different appliance. The problem is Galaxy is using the tool_conf.xml file it finds in the /export volume and not the newer tool_conf.xml file in the Docker container. I have not been able to finish testing, but you may be able to copy the tool_conf.xml from the appliance build directory to the shared volume.

$ cp build/tool_conf.xml /path/to/volume/galaxy-central/config
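Since this copy overwrites the tool panel definition Galaxy is currently using, keeping a backup may be worthwhile. The sketch below wraps the same copy in a hypothetical function, `sync_tool_conf`, which assumes the build directory and volume layout shown in the command above and refuses to run when the config directory does not exist yet.

```shell
# Hypothetical helper: copy the freshly built tool_conf.xml into the exported
# Galaxy config directory, keeping a backup of the file it replaces.
# Usage: sync_tool_conf BUILD_DIR EXPORT_DIR
sync_tool_conf() {
    build_dir="$1"     # assumed to contain tool_conf.xml, like build/ above
    export_dir="$2"    # host directory mounted as /export
    src="$build_dir/tool_conf.xml"
    dest_dir="$export_dir/galaxy-central/config"
    dest="$dest_dir/tool_conf.xml"

    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "missing $dest_dir" >&2; return 1; }

    # Keep the previous panel definition so the change can be undone.
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}

# Possible use:
#   sync_tool_conf build /path/to/volume
```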

> what was the bug that you got fixed, so that it can also work on 'master' branch ==> to put the galaxy container in privileged mode ?

Yes, I believe that was the main fix. However, there are several other problems with master, mostly outdated tools.

> Is it possible to build the appliance with the galaxy magic environment variable 'NONUSE=nodejs,proftp,reports,slurmd,slurmctld' (in order to avoid heavy deployment ) and which processes are necessary to keep on production deploy ?

I will open a new issue for this.

mhabsaoui commented 7 years ago

> I have not been able to finish testing, but you may be able to copy the tool_conf.xml from the appliance build directory to the shared volume.

Yep, that's what I already did and it's OK...

> Yes, I believe that was the main fix. However, there are several other problems with master; mostly outdate tools.

Nice to have those updates for the good of Lapps. I'll fully test it as soon as it's all updated :+1:

> I will open a new issue for this.

Just wondering: doesn't this feature already exist in Galaxy (magic variables), so it should be inherited, right?

ksuderman commented 7 years ago

> Nice to have those updates for the good of Lapps. I'll fully test it ASA it's all updated

Yes, we are actually working on the next release now.

> Just wondering: isn't this feature already existing in galaxy (magic variables), so it should be inherited right ?

Yes, I just have to expose that option in the make-appliance script.

mhabsaoui commented 7 years ago

> I have not been able to finish testing, but you may be able to copy the tool_conf.xml from the appliance build directory to the shared volume.

It might be nice to add this manual step right at the end of the 'make-appliance' script, based on the path of the volume mounted for Galaxy...

Thanks.

ksuderman commented 7 years ago

Yes, I will definitely look for a better way to have the tool panel updated. However, if I do it in the make-appliance script, Galaxy will simply overwrite the file when it initially populates the /export directory.

mhabsaoui commented 7 years ago

Thanks, dude, for those updates!

I confirm that the updates on the develop branch make restoring mounted data with Docker work now :+1:

lappsgrid-incubator / galaxy-appliance

Backup data mounted on galaxy_storage volume not restored. #2