The two databases that need the most disk space are eggNOG and CAT (taxonomy). These databases come into play only at the end.
Maybe you want to start with
'atlas run binning', which already does the assembly and binning.
Hello, I ran an analysis on a dataset downloaded from SRA and then got this error during dereplication: pre_dereplication.log. As for the eggNOG and CAT databases, could you look at the screenshot to see if all the files are OK? Thanks for the help! Regards, Fety
The first problem is related to #165
0.00% of genomes passed checkM filtering
None of your bins meet the quality thresholds (completeness/contamination).
reports/bin_report_{final_binner}.html
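To illustrate what the "0.00% of genomes passed checkM filtering" message means, here is a minimal sketch of a completeness/contamination filter. The threshold values, the `Bin` class and the function names are hypothetical and chosen only for illustration; the thresholds ATLAS actually applies come from its configuration and are not stated in this thread.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration only; not taken from ATLAS.
MIN_COMPLETENESS = 50.0   # percent
MAX_CONTAMINATION = 10.0  # percent


@dataclass
class Bin:
    name: str
    completeness: float   # percent, as estimated by CheckM
    contamination: float  # percent, as estimated by CheckM


def passing_bins(bins: List[Bin]) -> List[Bin]:
    """Return the bins that satisfy both quality thresholds."""
    return [
        b for b in bins
        if b.completeness >= MIN_COMPLETENESS
        and b.contamination <= MAX_CONTAMINATION
    ]


def percent_passing(bins: List[Bin]) -> float:
    """Percentage of bins that pass; 0.0 for an empty list."""
    if not bins:
        return 0.0
    return 100.0 * len(passing_bins(bins)) / len(bins)
```

If every bin falls below the completeness threshold or above the contamination threshold, `percent_passing` returns 0.0 and nothing is left for dereplication, which is the situation reported in the log.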
How to improve binning results:
See below the folder structure of my databases.
I put CAT/taxonomy
and CAT/CAT_databases
in the same subfolder, CAT.
The important files are the `2019-03-05....` files.
Importantly, you can remove `2019-03-05.nr.gz`, which I think uses a lot of space.
If you cannot generate them from the most recent nr, you can download new versions from
$ wget tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20181212.tar.gz
$ tar -xvzf CAT_prepare_20181212.tar.gz
.
├── adapters.fa
├── CAT
│ ├── 2019-03-05.nr.dmnd
│ ├── 2019-03-05.nr.fastaid2LCAtaxid
│ ├── 2019-03-05.nr.gz
│ ├── 2019-03-05.nr.taxids_with_multiple_offspring
│ ├── 2019-03-05.prot.accession2taxid.gz
│ ├── 2019-03-05.taxdump.tar.gz
│ ├── citations.dmp
│ ├── delnodes.dmp
│ ├── division.dmp
│ ├── downloaded
│ ├── gc.prt
│ ├── gencode.dmp
│ ├── merged.dmp
│ ├── names.dmp
│ ├── nodes.dmp
│ └── readme.txt
├── checkm
│ ├── distributions
│ ├── genome_tree
│ ├── hmms
│ ├── hmms_ssu
│ ├── img
│ ├── pfam
│ ├── selected_marker_sets.tsv
│ ├── taxon_marker_sets.tsv
│ └── test_data
├── conda_envs
│ ├── ...
├── EggNOG
│ ├── eggnog.db
│ ├── eggnog_proteins.dmnd
│ ├── og2level.tsv
│ ├── OG_fasta
│ └── OG_fasta.tar.gz
├── host_genome
│ ├── mouse_masked.fa.gz
│ └── Mus_musculus.GRCm38.dna.toplevel.fa.gz
├── phiX174_virus.fa
└── silva_rfam_all_rRNAs.fa
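A manually assembled database folder can be compared against the listing above with a short script. This is a minimal sketch: the patterns are copied from the tree, but treating exactly these entries as required is an assumption, and `missing_entries` is a hypothetical helper, not part of ATLAS.

```python
from pathlib import Path
from typing import List, Sequence

# Glob patterns relative to the database directory, taken from the listing.
# Which entries are strictly required is an assumption made for this sketch.
EXPECTED_PATTERNS = [
    "CAT/*.nr.dmnd",
    "CAT/*.nr.fastaid2LCAtaxid",
    "CAT/*.nr.taxids_with_multiple_offspring",
    "CAT/names.dmp",
    "CAT/nodes.dmp",
    "EggNOG/eggnog.db",
    "EggNOG/eggnog_proteins.dmnd",
    "checkm/hmms",
]


def missing_entries(db_dir: str,
                    patterns: Sequence[str] = EXPECTED_PATTERNS) -> List[str]:
    """Return the patterns that match nothing below db_dir."""
    root = Path(db_dir)
    return [p for p in patterns if not any(root.glob(p))]
```

Calling `missing_entries` with the path to the database directory returns an empty list when every pattern matches at least one file or folder.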
Thanks for the quick reply! Regards,
Fety
Hello, could you tell me what went wrong with this one:
RuleException:
AttributeError in line 13 of /home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk:
module 'os' has no attribute 'dirname'
File "/home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk", line 13, in __rule_get_genome_for_cat
File "/home/fiestaj/anaconda3/envs/atlasenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
After checking, it seems that get_genome_for_cat
gives no output in the genomes/taxonomy/MAG/ folder, while the FASTA files are present in the genomes/genomes/ folder.
I corrected it in the last commit, e3ab4ec.
You are working on the master branch, right?
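For context on the traceback: in the Python standard library, `dirname` lives in the `os.path` submodule, so a call written as `os.dirname(...)` raises exactly this `AttributeError`. The sketch below only demonstrates that general point; it is not the content of commit e3ab4ec, and the file name `MAG1.fasta` and the helper `parent_dir` are hypothetical.

```python
import os


def parent_dir(path: str) -> str:
    """Return the directory part of a path using os.path.dirname."""
    # os.dirname does not exist; the function is os.path.dirname.
    return os.path.dirname(path)


# Hypothetical file name inside the folder mentioned in the thread.
example = "genomes/genomes/MAG1.fasta"
folder = parent_dir(example)
```

Here `folder` is `"genomes/genomes"`, while `hasattr(os, "dirname")` is `False`, which matches the error message.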
Yes, I installed it from GitHub. I'll run the update and let you know.
Hello, Everything works perfectly, thanks for all the tips and your help. Regards, Fety
Hello, I don't have enough memory on my computer for the automatic database download (64Gb). I've downloaded it manually, but the files seem to be too large (the docs say it uses approximately 30Gb). Could you please help with a description of the files to be downloaded and how they have to be arranged? Thanks for your help. Regards, Fety
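To see which parts of a manually downloaded database folder take up the space, the size of each top-level entry can be summed. This is a minimal sketch using only the standard library; `dir_size_bytes` and `sizes_by_entry` are hypothetical helper names, and symbolic links are skipped by assumption.

```python
import os
from typing import Dict


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files below path (links skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def sizes_by_entry(db_dir: str) -> Dict[str, int]:
    """Map each top-level file or folder in db_dir to its size in bytes."""
    sizes = {}
    for entry in os.scandir(db_dir):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
    return sizes
```

Sorting the result by value, for example with `sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)`, lists the largest entries first.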