Closed: michzimny closed this issue 5 years ago
Likely because the recycle bin on EOS has not been set up.
Could you please have a look in the MGM container at /root/config_instance/ and run set_recycle.sh?
Actually, before doing so, have you run any other script in this folder?
This is a fresh installation and I have not run any scripts in the MGM container before.
I just ran set_recycle.sh. I got the output provided below, so probably one command from the script failed because of wrong syntax: `eos recycle config --size 100GB`
```
# bash set_recycle.sh
Configuring eos recycle bin...
Usage: recycle ls|purge|restore|config ...
'[eos] recycle ..' provides recycle bin functionality to EOS.
Options:
recycle :
  print status of recycle bin and if executed by root the recycle bin configuration settings.
recycle ls :
  list files in the recycle bin
recycle purge :
  purge files in the recycle bin
recycle restore [--force-original-name|-f] [--restore-versions|-r] <recycle-key> :
  undo the deletion identified by <recycle-key>
  --force-original-name : move's deleted files/dirs back to the original location (otherwise the key entry will have a <.inode> suffix
  --restore-versions : restore all previous versions of a file
recycle config --add-bin <sub-tree>:
  configures to use the recycle bin for deletions in <sub-tree>
recycle config --remove-bin <sub-tree> :
  disables usage of recycle bin for <sub-tree>
recycle config --lifetime <seconds> :
  configure the FIFO lifetime of the recycle bin
recycle config --ratio < 0 .. 1.0 > :
  configure the volume/inode keep ratio of the recycle bin e.g. 0.8 means files will only be recycled if more than 80% of the space/inodes quota is used. The low watermark is 10% under the given ratio by default e.g. it would cleanup volume/inodes to be around 70%.
'ls' and 'config' support the '-m' flag to give monitoring format output!
'ls' supports the '-n' flag to give numeric user/group ids instead of names!
success: recycle bin lifetime configured!
Done.
```
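For reference, a version-safe sketch of what a recycle-bin setup could look like on EOS 4.2.x, using only the flags listed in the usage text above. The subtree path and values are illustrative assumptions, not the actual contents of set_recycle.sh:

```shell
# Illustrative sketch only: restricted to the flags the 4.2.x help lists
# (no --size on this version). Paths/values are assumptions.
eos recycle config --add-bin /eos/user       # enable the recycle bin for this subtree
eos recycle config --lifetime 2592000        # keep deleted files for 30 days (FIFO lifetime)
eos recycle config --ratio 0.8               # recycle only above 80% of space/inode quota
eos recycle                                  # as root: print status and configuration
```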
@ebocchi, would you have any suggestions how to set up the recycle bin?
@diocas, could you please have a look at this issue too? This is weird as it's a completely fresh instance (other than that one you can access now) based on the webng_beta_psnc branch.
@michzimny I am not able to reproduce it. Please check it again and re-open in case it is still a problem
I'm reopening, as I can reproduce the failing scripts in /root/config_instance/ of the MGM container, also when starting from scratch (see items 1 and 2 below). Moreover, I'm still experiencing the saving issue in the instance described at the beginning of this thread (see item 3 below).
1)
First, could you please clarify whether any bash scripts should be manually run for EOS after the first launch? It worked like this in early revisions, but at some point the instructions to call bash scripts for EOS were removed from the docs.
2)
I just tried from scratch on a fresh k8s cluster with the pre-created /mnt/* directories for EOS-related services on the worker nodes, as prescribed in the docs. Please follow along (on the master node):
```
git clone https://github.com/cernbox/kuboxed.git
cd kuboxed
git checkout webng_beta_psnc
kubectl create -f BOXED.yaml
kubectl create -f LDAP.yaml
# waiting for the ldap pod to run
kubectl -n boxed exec -it ldap-78985dbc44-92dmm bash /root/addusers.sh
kubectl create -f eos-storage-mgm.yaml
# waiting for the eos-mgm pod to run
bash eos-storage-fst.sh 1 eos-mgm.boxed.svc.cluster.local eos-mgm.boxed.svc.cluster.local docker default
bash eos-storage-fst.sh 2 eos-mgm.boxed.svc.cluster.local eos-mgm.boxed.svc.cluster.local docker default
kubectl -n boxed create -f eos-storage-fst1.yaml
kubectl -n boxed create -f eos-storage-fst2.yaml
# waiting for the fst pods to run
kubectl -n boxed exec -it eos-mgm eos fs ls
# boot=booted, configstatus=rw for both FSTs
kubectl -n boxed exec -it eos-mgm eos recycle config --size 100GB
# it fails with "Error: unknown flag: --size (...)"
kubectl -n boxed exec -it eos-mgm bash /root/config_instance/set_recycle.sh
# it fails partially as in my previous comment
```
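After the steps above, one way to confirm what the recycle bin actually ended up configured to is to use only the subcommands shown in the `eos recycle` usage text earlier in this thread (the `-m` monitoring flag is documented there as well):

```shell
# Status checks, restricted to documented subcommands.
kubectl -n boxed exec -it eos-mgm -- eos recycle         # as root: print status and config
kubectl -n boxed exec -it eos-mgm -- eos recycle ls -m   # list entries in monitoring format
```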
3)
In addition, I have eosfuse logs from the instance described in the first post. They contain some errors related to saving notebooks in SWAN. I'm sharing them via email.
And continuing with the super-fresh instance from point 2 of my previous comment:
```
kubectl create -f CERNBOX.yaml
kubectl create -f SWAN.yaml
# waiting for pods to run
```
Now I'm opening CERNBox in my web browser, logging in as user0, and closing the browser tab. Then I'm opening SWAN in the web browser, logging in as user0, spawning a user container, creating a notebook, typing 1+2, pressing CTRL+Enter, receiving the result, and clicking the save icon. I'm getting:

```
Unexpected error while saving file: Untitled.ipynb [Errno 5] Input/output error: '/eos/user/u/user0/.~Untitled.ipynb'
```
This way, I managed to reproduce the saving issue in another fresh environment (the first one was that one reported when creating this issue). Please help.
For clarifying: the same happens when I do not touch any scripts in the MGM container on deploying a new instance.
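To separate a SWAN/Jupyter problem from an EOS/FUSE one, a minimal write probe inside the user session could look like the following. The pod name and path are assumptions based on the steps above; if this `touch` also fails with an I/O error, the problem sits in the FUSE mount rather than in Jupyter:

```shell
# Hypothetical probe (pod name and path assumed from the repro steps).
kubectl -n boxed exec -it jupyter-user0 -- \
  bash -c 'touch /eos/user/u/user0/.~probe && rm /eos/user/u/user0/.~probe'
```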
Additional information from the MGM container that might help:
```
[root@eos-mgm config_instance]# eos recycle    # returns nothing? shouldn't it print status?
[root@eos-mgm config_instance]# echo $?
0
[root@eos-mgm config_instance]# eos --version
EOS 4.2.26 (CERN)
(...)
```
Please note that the "eos recycle config --size" parameter was apparently added in a later EOS version (4.4.25): https://github.com/cern-eos/eos/commit/74529834cb1399c16bb55843d1330c6be158d82d#diff-349c85aae1d71151c001702f17a2b5f0
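Given that version gap, a script could guard the call instead of failing outright. This is a sketch that assumes (as the 4.2.26 output above shows) that the usage text lists the flags the client knows:

```shell
# Sketch: only call --size when the client's usage text mentions it
# (the flag first appears in EOS 4.4.25).
if eos recycle config 2>&1 | grep -q -- '--size'; then
  eos recycle config --size 100GB
else
  echo "eos client too old for 'recycle config --size'; skipping"
fi
```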
Hi Michal, is this the swan.test.up2u... instance or the other one that you mentioned in the email, which you created for the students? We were debugging in swan.test (the one we had access to), but just now realized you were "user6", and there seems to be no user6 in this EOS.
This is the other one for the students. The issue appears when a brand new instance is launched based on that branch (as in my steps described above). Do you need to access that other instance?
The swan.test.up2u... instance uses older versions of EOS and SWAN, so this is probably why we're not experiencing this issue there.
It would help to make it work in order to be ready for Monday. The versions in swan.test are the latest ones we have in ScienceBox. We haven't pushed the updated version yet. So, it should be the same version you have in that new instance. The only thing new is CERNBox and its components (cernboxgateway, cernboxmysql).
@diocas, I will send you a private message very soon about the access to that instance.
I think the versions in swan.test, apart from cernbox-related images, are not the latest ones. All LDAP, EOS, and SWAN images are older than in webng_beta_psnc. See my email from 2019-02-14, 23:50 for details.
We're experiencing another issue in beta - cannot delete files. It says:
Błąd podczas usuwania pliku „.test.txt”.
meaning
Error deleting the file ".test.txt".
Please try to delete anything as user6. It might be related to this issue.
The solution for the saving and deleting issues (see the previous comment) is to call:

```
eos quota set -g 99 -v 10G -i 1M -p /eos/docker/proc/recycle
```

or to use eos-mgm image v0.8 when deploying from scratch.
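To confirm the quota actually took effect, `eos quota ls` can be used; the flags here are assumed to mirror the `set` invocation above:

```shell
# Verify the quota node created by the fix (flags assumed analogous to 'set').
eos quota ls -g 99 -p /eos/docker/proc/recycle
eos recycle   # as root: should now print recycle-bin status and configuration
```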
Thank you, @ebocchi and @diocas!
As I have had issues with upgrading the existing instance to the webng_beta_psnc branch, I first managed to deploy a new instance of kuboxed from scratch, based on the webng_beta_psnc branch and without Up2U SSO integration.
I'm experiencing the following issue in SWAN. When saving files, SWAN says: "Unexpected error while saving file: test.ipynb [Errno 5] Input/output error: '/eos/user/u/user6/.~test.ipynb'", but the files do get saved.
Regarding autosave, there is a similar message in the JS console: "API request failed (500): Unexpected error while saving file: test.ipynb [Errno 5] Input/output error: '/eos/user/u/user6/.~test.ipynb'"
I cannot see any related logs in /var/kubeVolumes on the host with nodeApp=swan. I also couldn't find any related messages in the swan-daemon logs.
The output of `kubectl -n boxed logs jupyter-user6` can be found here: https://pastebin.com/tNs2Gyvj. I don't know what else to check.
Are you able to help me with this?