merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Please help concoct clustering #2266

Closed kinosham closed 4 months ago

kinosham commented 4 months ago

I've tried a few things and keep getting errors. Please help.

#!/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l walltime=720:00:00
#PBS -q bigmem
#PBS -o /nlustre/users/kinosha/Hammanskraal_concoct_cluster3_kinosha_output
#PBS -e /nlustre/users/kinosha/Hammanskraal_concoct_cluster4_kinosha_error
#PBS -k oe
#PBS -m bae
#PBS -M kinoshamoodley@gmail.com

cd /nlustre/users/kinosha
module load anvio8
source /apps/anaconda3-2023.03/etc/profile.d/conda.sh
conda activate anvio-8

anvi-cluster-contigs -p /nlustre/users/kinosha/SAMPLES-MERGED/PROFILE.db -c /nlustre/users/kinosha/Hammanskraal_metegonomics_sequences/contigs.db -C CONCOCT --driver concoct --T 16 --just-do-it

I get this error:

Interactive session for user: kinosha
Node: alf.bi.up.ac.za
Your session parameters
Queue: bigmem
Walltime: 2592000 seconds
CPU's allocated: 24

Config Error: One of the critical output files is missing ('clustering_gt1000.csv'). Please take a look at the log file: /tmp/tmpx10hg96i/logs.txt

There is no log file here to figure out what's going on.

Then I tried:

#!/bin/bash
#PBS -l nodes=1:ppn=24
#PBS -l walltime=720:00:00
#PBS -q bigmem
#PBS -o /nlustre/users/kinosha/Hammanskraal_concoct_cluster3_kinosha_output
#PBS -e /nlustre/users/kinosha/Hammanskraal_concoct_cluster3_kinosha_error
#PBS -k oe
#PBS -m bae
#PBS -M kinoshamoodley@gmail.com

cd /nlustre/users/kinosha
module load anvio8
source /apps/anaconda3-2023.03/etc/profile.d/conda.sh
conda activate anvio-8

anvi-cluster-contigs -p /nlustre/users/kinosha/SAMPLES-MERGED/PROFILE.db -c /nlustre/users/kinosha/Hammanskraal_metegonomics_sequences/contigs.db -C CONCOCT --driver concoct --length-threshold 1000 -T 16 --just-do-it

I get this error:

You should now please run:

source /apps/anaconda3-2023.03/etc/profile.d/conda.sh
conda activate anvio-8

WARNING

You are running an experimental workflow not every part of which may be fully and thoroughly tested :) Please scrutinize your output carefully after analysis, and keep us posted if you see things that surprise you.

Contigs DB ...................................: /nlustre/users/kinosha/Hammanskraal_metegonomics_sequences/contigs.db
Profile DB ...................................: /nlustre/users/kinosha/SAMPLES-MERGED/PROFILE.db
Binning module ...............................: CONCOCT
Cluster type .................................: contig
Working directory ............................: /tmp/tmp4qikahzg

✖ anvi-cluster-contigs encountered an error after 0:04:17.457916

Traceback (most recent call last):
  File "/apps/anaconda3-2023.03/envs/anvio-8/bin/anvi-cluster-contigs", line 280, in <module>
    main(args, unknown)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/anvio/terminal.py", line 915, in wrapper
    program_method(*args, **kwargs)
  File "/apps/anaconda3-2023.03/envs/anvio-8/bin/anvi-cluster-contigs", line 197, in main
    input_files = prepare_input_files(working_dir, merged_profile_db, contigs_db)
  File "/apps/anaconda3-2023.03/envs/anvio-8/bin/anvi-cluster-contigs", line 126, in prepare_input_files
    utils.export_sequences_from_contigs_db(contigs_db.db_path,
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/anvio/utils.py", line 3161, in export_sequences_from_contigs_db
    output_fasta.write_seq(sequence, split=truncate)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/anvio/fastalib.py", line 48, in write_seq
    self.output_file_obj.write('%s\n' % seq)
OSError: [Errno 28] No space left on device

But there is space.

Please help

metehaansever commented 4 months ago

Hello @kinosham,

Can you run df -h /tmp to check your disk space? I'm curious about your usage. Did you use the --tmp-dir-path argument? To check where your temporary directory is, you can run:

TMPDIR=/Users/your-user-name/Workspace python -c "import tempfile; print(tempfile.gettempdir())"

And also this closed issue may help to find your answer: https://github.com/merenlab/anvio/issues/906
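The two checks suggested above (where temporary files go, and how much room is left there) can be combined in one short script. This is only a sketch: the helper name describe_tmp is made up for illustration, and it assumes the script is run inside the same job and on the same node as anvi-cluster-contigs, since /tmp on a compute node may not be the same filesystem as on the login node.

```python
import os
import shutil
import tempfile


def describe_tmp(path=None):
    """Return (directory, free_gib) for a temporary directory.

    `describe_tmp` is a hypothetical helper, not part of anvi'o.
    With no argument it inspects the directory Python itself picks,
    which honours the TMPDIR environment variable when it is set.
    """
    directory = path or tempfile.gettempdir()
    usage = shutil.disk_usage(directory)  # named tuple: total, used, free (bytes)
    return directory, usage.free / 1024**3


if __name__ == "__main__":
    directory, free_gib = describe_tmp()
    print(f"TMPDIR env var : {os.environ.get('TMPDIR', '(not set)')}")
    print(f"temp directory : {directory}")
    print(f"free space     : {free_gib:.1f} GiB")
```

If the reported free space is small compared with the size of the exported contig FASTA file, that would be consistent with the "No space left on device" error even when the home or project filesystem has plenty of room.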

kinosham commented 4 months ago

So I am able to find the log.txt file now. It's coming up with this:

DATE: 16 May 24 10:21:07

CMD LINE: concoct --coverage_file /nlustre/users/kinosha/tmpmsu4obn3/contig_coverages.txt --composition_file /nlustre/users/kinosha/tmpmsu4obn3/sequence_contigs.fa --basename /nlustre/users/kinosha/tmpmsu4obn3 --threads 16

Up and running. Check /nlustre/users/kinosha/tmpmsu4obn3/log.txt for progress

Traceback (most recent call last):
  File "/apps/anaconda3-2023.03/envs/anvio-8/bin/concoct", line 90, in <module>
    results = main(args)
  File "/apps/anaconda3-2023.03/envs/anvio-8/bin/concoct", line 37, in main
    transform_filter, pca = perform_pca(
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/concoct/transform.py", line 5, in perform_pca
    pca_object = PCA(n_components=nc, random_state=seed).fit(d)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 428, in fit
    self._fit(X)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/sklearn/decomposition/_pca.py", line 483, in _fit
    X = self._validate_data(
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/sklearn/base.py", line 608, in _validate_data
    self._check_feature_names(X, reset=reset)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/sklearn/base.py", line 469, in _check_feature_names
    feature_names_in = _get_feature_names(X)
  File "/apps/anaconda3-2023.03/envs/anvio-8/lib/python3.10/site-packages/sklearn/utils/validation.py", line 2229, in _get_feature_names
    raise TypeError(
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.

metehaansever commented 4 months ago

This error looks related to the scikit-learn version. Can you create another environment with all the anvi'o dependencies and install scikit-learn version 1.1.*?

pip install scikit-learn~=1.1.0

We recently updated the scikit-learn package as part of the Python 3.10 migration, so I am not sure whether this will work with anvi-cluster-contigs.
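To make the TypeError in the log easier to read, here is a small standard-library sketch of the condition it describes: column names whose types are a mix of str and something else. It is an illustration only, not scikit-learn's or CONCOCT's actual code; the function names and the example column names are made up.

```python
def feature_name_types(names):
    """Sorted type names found among the column names, e.g. ['int', 'str']."""
    return sorted({type(name).__qualname__ for name in names})


def has_mixed_feature_names(names):
    """True when string names are mixed with names of another type.

    This mirrors the situation the error message complains about;
    it is not a copy of the library's own check.
    """
    types = feature_name_types(names)
    return len(types) > 1 and "str" in types


def stringify(names):
    """Convert every column name to str, the remedy the message suggests."""
    return [str(name) for name in names]


if __name__ == "__main__":
    # Hypothetical column names: some strings, some integers.
    columns = ["sample_01", 0, 1, "sample_02"]
    print(feature_name_types(columns))                   # ['int', 'str']
    print(has_mixed_feature_names(columns))              # True
    print(has_mixed_feature_names(stringify(columns)))   # False
```

On a pandas DataFrame the equivalent conversion is the one quoted in the error message itself, X.columns = X.columns.astype(str). Whether that or a scikit-learn downgrade is the right fix for anvi-cluster-contigs is not something this sketch settles.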

metehaansever commented 4 months ago

The solution is described in #2154.