Open pdy1084 opened 2 months ago
You must be using a very old version of anvi'o for this to happen, @pdy1084. If you don't want to update your anvi'o, then you need to use COG14_FUNCTION instead of COG20_FUNCTION.
Please run anvi-db-info on your contigs database and take a look at the output to figure out which function annotation sources are available to you.
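A minimal sketch of that check, in case it helps to script it. The helper name has_function_source is hypothetical, and it only assumes that the source name shows up verbatim somewhere in the anvi-db-info report; it does not rely on the report's exact layout.

```shell
# Hypothetical helper: succeed if the given annotation source name appears
# anywhere in the anvi-db-info report for a contigs database.
# Assumption: the report prints source names verbatim; the layout is not parsed.
#   $1 = path to the contigs database, $2 = annotation source name
has_function_source() {
    # Merge stderr into stdout in case the report is written there.
    anvi-db-info "$1" 2>&1 | grep -qF -- "$2"
}
```

For example, `has_function_source "$PREFIX.db" COG20_FUNCTION || echo "COG20_FUNCTION is not in this database"` would tell you whether the export step can work before you run it.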
Hi @meren,
Thank you for your fast reply.
I have checked my anvi'o version, and I have the latest one (version 8), as you can see at the beginning of the conda list output:
packages in environment at .../.conda/envs/anvio-8:
Name                                  Version  Build         Channel
_libgcc_mutex                         0.1      conda_forge   conda-forge
_openmp_mutex                         4.5      2_gnu         conda-forge
_r-mutex                              1.0.1    anacondar_1   conda-forge
_sysroot_linux-64_curr_repodata_hack  3        h69a702a_16   conda-forge
anvio                                 8        pypi_0        pypi
appdirs                               1.4.4    pypi_0        pypi
asgiref                               3.8.1    pypi_0        pypi
Then, if I run anvi-db-info on:
A) the db resulting from anvi-gen-contigs-database -L 0 -T $THREADS --project-name $PREFIX -f .../results_step0_reformat/cogs/contigs-fasta.fasta -o ${PREFIX}_cogs.db --force-overwrite, I get that there are no available sources.
B) the db resulting from anvi-run-pfams -T $THREADS --pfam-data-dir Pfam_v32 -c $PREFIX.db, I get Pfam (as expected) as an available source.
So after running anvi-db-info over the $PREFIX.db, I get:
===============================================
===============================================
===============================================
I already tried running this with COG14, as stated on the GitHub page. However, either anvi-setup-ncbi-cogs or anvi-run-ncbi-cogs does not seem to work: when I run anvi-db-info on the db resulting from anvi-run-ncbi-cogs -T $THREADS --cog-version COG14 --cog-data-dir COG_2014 -c ${PREFIX}_cogs.db, I do not get any available sources.
-> I also saw that, when running the same pipeline for Pfam and COG14, in the latter case I do not get the output ${PREFIX}-cogs.txt from anvi-export-functions --annotation-sources COG14_FUNCTION -c ${PREFIX}_cogs.db -o ${PREFIX}-cogs.txt.
As I could manage to generate ${PREFIX}_cogs.db with anvi-run-ncbi-cogs -T $THREADS --cog-version COG14 --cog-data-dir COG_2014 -c ${PREFIX}_cogs.db, I would imagine that there is a problem with the command anvi-export-functions --annotation-sources COG14_FUNCTION -c ${PREFIX}_cogs.db -o ${PREFIX}-cogs.txt, but more specifically with anvi-setup-ncbi-cogs or anvi-run-ncbi-cogs.
And I still see the following error in the std output (now 1 instead of the 3):
Config Error: Something went wrong with your download attempt. Here is the problem for the url ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2014/data/cog2003-2014.csv: '<urlopen error [Errno 113] No route to host>'
and this also appears:
File .../.conda/envs/plasx/lib/python3.12/site-packages/plasx/pd_utils.py", line 1044, in read_table
    raise Exception('File {} does not exist'.format(A))
Exception: File gene-catalog-ORFs-cogs.txt does not exist
I hope you can help me to solve this. Thank you very much.
Hi @pdy1084,
Your problem lies in the fact that the computer on which you are doing this analysis has no access to nih.gov, as suggested by this message:
Config Error: Something went wrong with your download attempt. Here is the problem for the url ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2014/data/cog2003-2014.csv: '<urlopen error [Errno 113] No route to host>'
You need to successfully run anvi-setup-ncbi-cogs for things to move forward. I'm sorry.
(Perhaps you should talk to your sys admin if you are on your university server.)
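One way to confirm the network problem before re-running the setup is to test whether the machine can open a connection to the NCBI FTP host at all. This is only a sketch: can_reach is a hypothetical helper, it relies on bash's /dev/tcp redirection and the coreutils timeout command, and a successful connection on port 21 does not guarantee that the FTP data transfer itself is allowed by your firewall.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within 5 s.
# Assumptions: bash and coreutils' timeout are installed on this machine.
#   $1 = host name, $2 = TCP port
can_reach() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

Running `can_reach ftp.ncbi.nlm.nih.gov 21 || echo "no route to the NCBI FTP server from this machine"` on the same node that runs the pipeline should show quickly whether this is something for the sys admin.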
Hi PlasX team,
Thank you for providing this insightful software. I managed to run PlasX on my data (a set of open reading frames) with the Pfam database. However, when I try to include the download of the COG database and proceed with anvi-setup-ncbi-cogs, anvi-run-ncbi-cogs and anvi-export-functions, I encounter several errors (listed in the section "Terminal output").
Describe the bug
The following lines of code (marked with ->) give me the errors below. If I comment out these lines, I can screen my data against the Pfam_v32 database successfully.
-> anvi-setup-ncbi-cogs --cog-version COG20 --cog-data-dir COG_2020 -T $THREADS --reset
anvi-setup-pfams --pfam-version 32.0 --pfam-data-dir Pfam_v32 -T $THREADS --reset
Annotate COGs:
-> anvi-run-ncbi-cogs -T $THREADS --cog-version COG20 --cog-data-dir COG_2020 -c $PREFIX.db
Annotate Pfams:
anvi-run-pfams -T $THREADS --pfam-data-dir Pfam_v32 -c $PREFIX.db
Export functions to text file:
-> anvi-export-functions --annotation-sources COG20_FUNCTION,Pfam -c $PREFIX.db -o $PREFIX-cogs-and-pfams.txt
anvi-export-functions --annotation-sources Pfam -c $PREFIX.db -o $PREFIX-pfams.txt
I tried changing COG20 to COG14, but it still does not work.
Terminal output
------------------FINISHED GENE CALLING WITH PRODIGAL
Config Error: Something went wrong with your download attempt. Here is the problem for the url ftp://ftp.ncbi.nlm.nih.gov/pub/COG/COG2014/data/cog2003-2014.csv: '<urlopen error [Errno 113] No route to host>'
Config Error: It seems you already have Pfam database installed in 'Pfam_v32', please use --reset flag if you want to re-download it.
Config Error: At least one essential formatted file that is necesary for COG operations is not where it should be ('.../results/COG_2014/COG14/PID-TO-CID.cPickle'). You should run COG setup, with the flag --reset if necessary, to make sure things are in order.
Config Error: One or more of the annotation sources you requested does not appear to be in the contigs database :/ Here is the list: COG14_FUNCTION.
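The errors above cascade: the download fails, so the formatted COG file is missing, so the annotation step adds nothing and the export cannot find COG14_FUNCTION. A minimal sketch of a guard that stops the pipeline at the first missing piece follows. The helper cog_setup_ok is hypothetical, and the file name is taken from the COG14 error message above; whether other COG versions use the same file name is an assumption.

```shell
# Hypothetical helper: succeed only if the formatted COG file named in the
# error message exists and is non-empty under <cog-data-dir>/<cog-version>/.
# Assumption: the file name is the same for other COG versions; only the
# COG14 path appears in the terminal output.
#   $1 = COG data directory, $2 = COG version
cog_setup_ok() {
    [ -s "$1/$2/PID-TO-CID.cPickle" ]
}

# Annotate only when the setup step actually produced its data.
if cog_setup_ok COG_2014 COG14; then
    anvi-run-ncbi-cogs -T "$THREADS" --cog-version COG14 --cog-data-dir COG_2014 -c "${PREFIX}_cogs.db"
else
    echo "COG setup is incomplete; skipping COG annotation and export." >&2
fi
```

With a guard like this, a failed anvi-setup-ncbi-cogs shows up as one clear message instead of three separate Config Errors.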
Software environment
packages in environment at .../.conda/envs/plasx:
Name              Version      Build            Channel
_libgcc_mutex     0.1          main             anaconda
_openmp_mutex     5.1          1_gnu            anaconda
blas              1.0          openblas         anaconda
blosc             1.21.3       h6a678d5_0       anaconda
bottleneck        1.3.7        py312ha883a20_0  anaconda
bzip2             1.0.8        h5eee18b_6       anaconda
ca-certificates   2024.7.2     h06a4308_0       anaconda
expat             2.6.3        h6a678d5_0       anaconda
gawk              5.1.0        h7b6447c_0       anaconda
joblib            1.4.2        py312h06a4308_0  anaconda
ld_impl_linux-64  2.38         h1181459_1       anaconda
libffi            3.4.4        h6a678d5_1       anaconda
libgcc-ng         11.2.0       h1234567_1       anaconda
libgfortran-ng    11.2.0       h00389a5_1       anaconda
libgfortran5      11.2.0       h1234567_1       anaconda
libgomp           11.2.0       h1234567_1       anaconda
libllvm14         14.0.6       hdb19cb5_3       anaconda
libopenblas       0.3.21       h043d6bf_0       anaconda
libstdcxx-ng      11.2.0       h1234567_1       anaconda
libuuid           1.41.5       h5eee18b_0       anaconda
llvm-meta         7.0.0        0                conda-forge
llvmlite          0.43.0       py312h6a678d5_0  anaconda
lz4-c             1.9.4        h6a678d5_1       anaconda
mmseqs2           10.6d92c     h2d02072_0       bioconda
ncurses           6.4          h6a678d5_0       anaconda
numba             0.60.0       py312h526ad5a_0  anaconda
numexpr           2.8.7        py312he7dcb8a_0  anaconda
numpy             1.26.4       py312h2809609_0  anaconda
numpy-base        1.26.4       py312he1a6c75_0  anaconda
openmp            7.0.0        h2d50403_0       conda-forge
openssl           3.0.15       h5eee18b_0       anaconda
pandas            2.2.2        py312h526ad5a_0  anaconda
pip               24.2         py312h06a4308_0  anaconda
plasx             0.0.0        pypi_0           pypi
pybind11-abi      5            hd3eb1b0_0       anaconda
python            3.12.5       h5148396_1       anaconda
python-blosc      1.10.6       py312h526ad5a_0  anaconda
python-dateutil   2.9.0post0   py312h06a4308_2  anaconda
python-tzdata     2023.3       pyhd3eb1b0_0     anaconda
pytz              2024.1       py312h06a4308_0  anaconda
readline          8.2          h5eee18b_0       anaconda
scikit-learn      1.5.1        py312h526ad5a_0  anaconda
scipy             1.13.1       py312h2809609_0  anaconda
setuptools        72.1.0       py312h06a4308_0  anaconda
six               1.16.0       pyhd3eb1b0_1     anaconda
sqlite            3.45.3       h5eee18b_0       anaconda
tbb               2021.8.0     hdb19cb5_0       anaconda
threadpoolctl     3.5.0        py312he106c6f_0  anaconda
tk                8.6.14       h39e8969_0       anaconda
tzdata            2024a        h04d1e81_0       anaconda
wheel             0.44.0       py312h06a4308_0  anaconda
xz                5.4.6        h5eee18b_1       anaconda
zlib              1.2.13       h5eee18b_1       anaconda
zstd              1.5.5        hc292b87_2       anaconda