Closed ucassee closed 3 years ago
edit: @ucassee I think you found a bug in our CPU/RAM config :) we will check and report back. Thanks
The issue was a missing hardware configuration for VirSorter 2. We will push a hotfix release today to fix that.
Hey,
we fixed the config files where VirSorter 2 was missing. Please use
-r v1.0.1
and re-execute your command.
Hi @replikation,
I am not sure which process it was.
But the command is like this:
hmmsearch -T 30 --tblout iter-0/all.pdg.faa.splitdir/all.pdg.faa.ss.1.split.Viruses.splithmmtbl --cpu 1 --noali -o /dev/null /db/hmm/viral/combined.hmm /tmp/vs2-K6zvoLzjXZlu/all.pdg.faa.ss.1.split
I will try new version.
Hi, there is still an error when I use an unprivileged account. How can I debug this?
Error executing process > 'identify_fasta_MSF:fasta_validation_wf:input_suffix_check (1)'
Caused by:
Process `identify_fasta_MSF:fasta_validation_wf:input_suffix_check (1)` terminated with an error exit status (1)
Command executed:
case "test.fasta" in
*.gz)
zcat test.fasta > test.fa
;;
*.fna)
cp test.fasta test.fa
;;
*.fasta)
cp test.fasta test.fa
;;
*.fa)
;;
*)
echo "file format not supported...what the phage...(.fa .fasta .fna .gz is supported)"
exit 1
esac
# tr whitespace at the end of lines
sed 's/[[:blank:]]*$//' -i test.fa
# remove ' and "
tr -d "'" < test.fa | tr -d '"' | tr -d "[]" > tmp.file && mv tmp.file test.fa
# replace ( ) | . , / and whitespace with _
sed 's#[()|.,/ ]#_#g' -i test.fa
# remove empty lines
sed '/^$/d' -i test.fa
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Convert SIF file to sandbox...
ERROR : Failed to create user namespace: user namespace disabled
Thanks
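As a side note, the cleanup that the input_suffix_check script performs can be mirrored in a small Python sketch (illustrative only; the pipeline itself uses the sed/tr commands shown above):

```python
import re

def sanitize_fasta_text(text):
    """Mirror the sed/tr cleanup from input_suffix_check (sketch, not WtP code)."""
    text = re.sub(r"[ \t]+$", "", text, flags=re.M)  # trim trailing whitespace
    text = re.sub(r"['\"\[\]]", "", text)            # remove ' " [ ]
    text = re.sub(r"[()|.,/ ]", "_", text)           # replace specials with _
    text = re.sub(r"\n+", "\n", text).strip("\n")    # drop empty lines
    return text

print(sanitize_fasta_text(">seq (1)|test.\n\nACGT"))  # >seq__1__test_ then ACGT
```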
I think that has to be configured from the cluster-admin side of things (https://github.com/hpcng/singularity/issues/5240)
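For reference, a quick way to check whether the kernel allows unprivileged user namespaces is to read the sysctl directly. The path below is an assumption (it matches RHEL/CentOS-style kernels, where the setting is user.max_user_namespaces; other distros may differ):

```python
import os

def user_namespaces_enabled(path="/proc/sys/user/max_user_namespaces"):
    """Return True/False, or None if this kernel does not expose the sysctl."""
    if not os.path.exists(path):
        return None  # sysctl not present on this kernel
    with open(path) as fh:
        return int(fh.read().strip()) > 0  # 0 means user namespaces disabled

print(user_namespaces_enabled())
```

If this prints False, only a system administrator can enable the setting.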
Hi, I encountered a new error. How can I debug it? Thanks.
Error executing process > 'identify_fasta_MSF:fasta_validation_wf:input_suffix_check (1)'
Caused by:
Process `identify_fasta_MSF:fasta_validation_wf:input_suffix_check (1)` terminated with an error exit status (255)
Command executed:
case "all_pos_phage.fa" in
*.gz)
zcat all_pos_phage.fa > all_pos_phage.fa
;;
*.fna)
cp all_pos_phage.fa all_pos_phage.fa
;;
*.fasta)
cp all_pos_phage.fa all_pos_phage.fa
;;
*.fa)
;;
*)
echo "file format not supported...what the phage...(.fa .fasta .fna .gz is supported)"
exit 1
esac
# tr whitespace at the end of lines
sed 's/[[:blank:]]*$//' -i all_pos_phage.fa
# remove ' and "
tr -d "'" < all_pos_phage.fa | tr -d '"' | tr -d "[]" > tmp.file && mv tmp.file all_pos_phage.fa
# replace ( ) | . , / and whitespace with _
sed 's#[()|.,/ ]#_#g' -i all_pos_phage.fa
# remove empty lines
sed '/^$/d' -i all_pos_phage.fa
Command exit status:
255
Command output:
(empty)
Command error:
INFO: Convert SIF file to sandbox...
FATAL: while extracting /data/database/wtp/singularity_images/nanozoo-basics-1.0--962b907.img: root filesystem extraction failed: could not extract squashfs data, unsquashfs not found
Work dir:
/data/Project/1.Mariana/4.virus/wtptempt/06/a6e46c15f82e227d71fd2c533ae0e1
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
This still looks like a problem with Singularity on your system:
Command error:
INFO: Convert SIF file to sandbox...
FATAL: while extracting /data/database/wtp/singularity_images/nanozoo-basics-1.0--962b907.img: root filesystem extraction failed: could not extract squashfs data, unsquashfs not found
What is your version?
singularity --version
Was singularity installed by a system administrator and configured appropriately?
You can also test whether singularity works outside of the Nextflow framework of WtP:
singularity run /data/database/wtp/singularity_images/nanozoo-basics-1.0--962b907.img wget --version
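Given the "unsquashfs not found" error above, a quick PATH sanity check can also help; a minimal sketch (Singularity shells out to unsquashfs, from squashfs-tools, when converting a SIF image to a sandbox, so both binaries must be on PATH):

```python
import shutil

# Check that the binaries Singularity needs are reachable on PATH
status = {tool: bool(shutil.which(tool)) for tool in ("singularity", "unsquashfs")}
for tool, found in status.items():
    print(tool, "found" if found else "MISSING")
```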
Hi @hoelzer, my Singularity version is 3.6.3. I installed it with conda under my own account.
When I run
singularity run /data/database/wtp/singularity_images/nanozoo-basics-1.0--962b907.img wget --version
I get the same error:
INFO: Convert SIF file to sandbox...
FATAL: while extracting /data/database/wtp/singularity_images/nanozoo-basics-1.0--962b907.img: root filesystem extraction failed: could not extract squashfs data, unsquashfs not found
@ucassee okay the version should be fine.
But I experienced issues in the past when installing Singularity via conda on an HPC system where I'm not root. Are you running the pipeline on an (administrated) cluster machine? HPC, workstation, or similar?
If so, I think you should ask your system admin to install Singularity properly with root access. E.g., I did it using the following manual/notes on my local machine:
Hi @hoelzer, I will try to contact the system admin of our cluster. But when I run WtP on our workstation, it still errors with some virus prediction tools. I uploaded one report. Can I debug this, or is it okay to ignore them?
Thanks execution_report.zip
@ucassee it looks like three virus prediction tools failed:
Although that is not nice, it can happen that, depending on your input, some tools will not work. But WtP will run through anyway with the other tools. When you run the test profile on your workstation, do these three tools work in general? Then it's fine and there is no need for debugging.
we can take a look at this, but as Martin mentioned, we "autoskip" tools if they fail for various reasons so you get actual results and are not annoyed with tons of bugs :)
We would need the "temporary dirs" to check out what's going on.
the following dirs would be of interest (located in the work dir):
40/f579e2*
8f/67a0ad*
4c/752a29*
Inside are hidden files like .command.log. An ls -lah per dir would also be great so you don't need to send us the whole fasta input - but we need to know which files were present during the error.
thanks
Hi @replikation @hoelzer
Thanks for your reply. If you need any other files, please let me know. virnet.zip pprmeta.zip seeker_wf.zip
Hi, I ran the test profile, but there is still one error. I attached the report. Thanks phigaro.zip execution_report.zip
I will look into it tomorrow
Hey, unfortunately I was not able to reproduce your error with:
nextflow run phage.nf --cores 16 -profile local,smalltest,singularity
I checked the .command.err in the phigaro.zip file:
WARNING: underlay of /etc/localtime required more than 50 (95) bind mounts
It seems this is linked to CentOS and Singularity... I will try to find a solution for this.
Hi @mult1fractal, thanks for your effort; when you solve it, please let me know. If I find any clues, I will also report them here.
Hi @mult1fractal, WtP seems to run all identifiers in parallel. I used a local server to run it and saw a heavy load at the beginning. Sometimes, when I use a bigger assembly file (>500 MB), my server crashes and restarts. Is this related to the error I reported before? Thanks
@ucassee Could you please provide the command used? The amount of parallel runs is basically controlled via the cores flag in relation to the max_cores on a local run.
Hi @replikation The following is the command:
nextflow run /data2017/.nextflow/assets/replikation/What_the_Phage --fasta ${i} \
--cachedir /data2017/database/wtp/singularity_images \
--databases /data2017/database/wtp/nextflow-autodownload-databases \
--output wtpresult/${n} \
--workdir wtptempt \
--cores 6 \
-profile local,singularity \
--filter 10000 --identify
My server has a maximum of 80 threads, but the CPU load average could reach 130 at the beginning of the program.
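The mismatch reported here (80 hardware threads but a load of 130) is consistent with containerized tools spawning more threads than their allotted core budget. A toy illustration (the per-tool thread count is a guess; only the totals are from this thread):

```python
def effective_load(parallel_tasks, threads_per_task):
    """Rough load estimate when each task ignores its per-task core budget."""
    return parallel_tasks * threads_per_task

# e.g. 10 concurrent identifier tasks each spawning ~13 threads instead of
# the 6 cores requested via --cores would explain a load average near 130:
print(effective_load(10, 13))  # 130
```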
Hi all,
The .command.sh of the phigaro is the following:
#!/bin/bash -ue
phigaro -f Dive121-T2_filtered.fa -o output -t 6 --wtp --config /root/.phigaro/config.yml
cat output/phigaro.txt > output/phigaro_${PWD##*/}.txt
echo "" >> output/phigaro_${PWD##*/}.txt
But there is no config file at /root/.phigaro/config.yml. So is this also related to the error I reported before?
The phigaro error can be debugged by running phigaro-setup in the singularity environment.
But the problem of the identifiers running in parallel persists, even when I set --max_cores 60.
I attached a screenshot from top:
top - 21:20:11 up 1:32, 5 users, load average: 116.63, 92.43, 53.11
Tasks: 1024 total, 14 running, 909 sleeping, 101 stopped, 0 zombie
%Cpu(s): 86.7 us, 11.2 sy, 0.0 ni, 2.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13207409+total, 99502976+free, 22430808 used, 30328035+buff/cache
KiB Swap: 98302976 total, 98302976 free, 0 used. 12958008+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
106892 root 20 0 6372172 625320 7704 R 1319 0.0 1:42.10 python
106895 root 20 0 6363724 619036 7676 R 1311 0.0 1:43.63 python
106866 root 20 0 6368076 619040 7600 R 1291 0.0 1:41.38 python
106888 root 20 0 6372172 617180 7744 R 1190 0.0 1:40.34 python
71231 root 20 0 24.5g 5.2g 79216 S 795.7 0.4 14:41.23 python3
138375 root 20 0 1649704 176840 1316 R 155.7 0.0 48:44.64 hmmsearch
132719 root 20 0 1644468 166392 1316 R 146.2 0.0 32:39.79 hmmsearch
69196 root 20 0 99036 21540 1228 S 96.7 0.0 11:20.96 hmmsearch
68296 root 20 0 99720 24880 1228 R 92.8 0.0 11:22.26 hmmsearch
69249 root 20 0 103696 26348 1228 R 89.8 0.0 11:12.14 hmmsearch
97654 root 20 0 106324 29700 1228 R 87.2 0.0 9:13.81 hmmsearch
68221 root 20 0 90276 12448 1228 R 86.6 0.0 11:32.91 hmmsearch
68300 root 20 0 103144 26796 1228 R 86.2 0.0 11:18.66 hmmsearch
68348 root 20 0 95124 19284 1228 S 86.2 0.0 11:03.74 hmmsearch
69250 root 20 0 92940 13728 1228 S 83.9 0.0 11:24.31 hmmsearch
69251 root 20 0 89316 13316 1228 R 83.9 0.0 10:55.22 hmmsearch
104050 root 20 0 94460 16416 1228 S 83.9 0.0 8:58.32 hmmsearch
16320 root 20 0 972040 180936 79728 S 77.0 0.0 50:34.07 blastn
65276 root 20 0 115900 39780 1216 S 77.0 0.0 15:12.48 hmmsearch
47826 root 20 0 114560 38300 1216 S 74.8 0.0 15:39.90 hmmsearch
61666 root 20 0 119376 39856 1216 S 74.8 0.0 15:20.87 hmmsearch
91333 root 20 0 120196 42424 1216 S 74.4 0.0 14:09.76 hmmsearch
75127 root 20 0 113324 35476 1216 S 71.5 0.0 14:50.19 hmmsearch
66739 root 20 0 122700 36488 1216 S 67.9 0.0 15:15.71 hmmsearch
45614 root 20 0 135300 50320 1216 S 66.6 0.0 15:36.20 hmmsearch
72592 root 20 0 131512 55280 1216 S 65.2 0.0 14:57.75 hmmsearch
108818 root 20 0 26252 12840 4468 R 4.9 0.0 0:00.15 sourmash
The TensorFlow version in the pprmeta, virnet and seeker images is 2.3, but old CPUs that don't support AVX will hit an "Illegal instruction (core dumped)" problem.
Please see here: https://github.com/tensorflow/tensorflow/issues/17411
I suggest you use tensorflow==1.5 to regenerate the images for compatibility with older CPUs. Thanks
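On Linux you can verify AVX support by inspecting the flags line of /proc/cpuinfo; a small sketch (the parsing helper is an illustration, not part of WtP):

```python
def cpu_has_avx(cpuinfo_text):
    """Check a /proc/cpuinfo dump for the avx flag (Linux x86 only)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "avx" in line.split()
    return False

# On a real machine: cpu_has_avx(open("/proc/cpuinfo").read())
print(cpu_has_avx("flags\t\t: fpu vme sse sse2 avx"))  # True
```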
Okay...
for virnet pprmeta and seeker: I can try to build the images with tensorflow==1.5
but I'm not sure if the tools will work with this version of tensorflow
for Phigaro: I'm not able to reproduce this error with the command you posted above, nor with the command I used:
nextflow run replikation/What_the_Phage --cores 16 -profile local,smalltest,singularity --dv --ma --mp --pp --sm --vf --vn --vs --vs2 --sk --vb --cachedir singularity_images/ --identify -r v1.0.1
I will try both commands with a larger input file; maybe this causes the issue.
Hi @mult1fractal ,
For the Phigaro error, I used singularity run multifractal-phigaro-0.5.2.img and phigaro-setup to generate the config file /root/.phigaro/config.yml.
I used a new server that supports AVX, so I can get the results from all wrapped tools for some small input files. But for a larger input file there is also an error:
WARNING: underlay of /etc/localtime required more than 50 (93) bind mounts
Using TensorFlow backend.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
WARNING: underlay of /etc/localtime required more than 50 (93) bind mounts
Using TensorFlow backend.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2021-02-02 07:13:17.357673: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-02-02 07:13:17.416864: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1895280000 Hz
2021-02-02 07:13:17.431704: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a62670 executing computations on platform Host. Devices:
2021-02-02 07:13:17.431779: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2021-02-02 07:13:17.580242: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
Starting VirNet
Loading Data TS01-B03_fragments.fasta
Loaded 8256 fragments
Loading Tokenizer
Start Predictions
1024/8256 [==>...........................] - ETA: 35s
2048/8256 [======>.......................] - ETA: 30s
Traceback (most recent call last):
File "/virnet/predict.py", line 51, in <module>
main()
File "/virnet/predict.py", line 47, in main
predictions=run_pred(model,x_data)
File "/virnet/predict.py", line 20, in run_pred
y_prop=model.predict(input_data)
File "/virnet/NNClassifier.py", line 100, in predict
return self.model.predict([X],batch_size=1024, verbose=1)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1462, in predict
callbacks=callbacks)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 324, in predict_loop
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[523,599] = 3150 is not in [0, 3150)
[[{{node embedding_1/embedding_lookup}}]]
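The InvalidArgumentError above means a token id equal to the vocabulary size (3150) reached the embedding lookup, which only accepts ids 0..3149 - typically an out-of-vocabulary token mapped one past the end of the table. A plain-Python sketch of the failing invariant (toy dimensions, not VirNet's actual model):

```python
vocab_size = 3150  # matches the error message above
embedding_table = [[0.0] * 8 for _ in range(vocab_size)]  # toy 8-dim embeddings

def lookup(token_ids):
    """Embedding lookup sketch: valid ids satisfy 0 <= id < vocab_size."""
    if any(t < 0 or t >= vocab_size for t in token_ids):
        raise IndexError(f"token id out of range [0, {vocab_size})")
    return [embedding_table[t] for t in token_ids]

lookup([0, 3149])   # fine
# lookup([3150])    # raises, mirroring "3150 is not in [0, 3150)"
```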
Hi @mult1fractal @replikation,
VirNet is designed for virus read identification, not for assemblies. Please see https://github.com/alyosama/virnet/issues/8 - I think this is the cause of the virnet error, so I suggest you remove it from WtP. I also saw a new tool with good performance in virus identification; attached here for you to consider: https://github.com/ablab/viralVerify
I am using WtP for my next project; you provide a powerful and convenient workflow. Best
Hey @ucassee
before the input fasta sequence gets to virnet, we split the fasta file into 3000 bp chunks, as suggested by the virnet dev.
Okay nice, I will check it and put it on our list of tools to integrate.
Hi,
I use the following command to run wtp with an input fasta file (~60M):
nextflow run replikation/What_the_Phage --fasta wtp/all_combined.fasta --databases nextflow-autodownload-databases --cachedir singularity_images --output wtpresult --cores 20 -profile local,singularity -r v1.0.0
I find the hmmsearch step running with only one thread. I wonder whether there is some config file I should modify to speed up this process. It finished with errors and I attached the report. execution_report.zip
When I use the same command on a cluster (PBS system), it showed the following error:
How can I debug?
Thanks