metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Database download #187

Closed: fetyj closed this issue 5 years ago

fetyj commented 5 years ago

Hello, I don't have enough memory on my computer for the automatic database download (64Gb). I've downloaded it manually, but the files seem to be too large (the docs say it uses approximately 30Gb). Could you please help with a description of the files to be downloaded and how they have to be arranged? Thanks for your help. Regards, Fety

SilasK commented 5 years ago

The two databases that need the most disk space are eggNOG and CAT (taxonomy). These databases come into play only at the end.

Maybe you want to start with 'atlas run binning', which already does the assembly and binning.

fetyj commented 5 years ago

Hello, I ran an analysis on a dataset downloaded from SRA and then got this error at the dereplication step: pre_dereplication.log. As for the eggNOG and CAT databases, could you look at the screenshot to see if all the files are OK? Thanks for the help! Regards, Fety

[Attachment: Screenshot from 2019-03-21 11-00-02]

SilasK commented 5 years ago

The first problem is related to #165

0.00% of genomes passed checkM filtering

None of your bins meets the quality thresholds (completeness too low or contamination too high).
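To illustrate what such a filter does, here is a minimal sketch in Python. The thresholds and the example values are assumptions made up for illustration; they are not the cutoffs ATLAS uses, and `passing_bins` is a hypothetical helper, not part of ATLAS or CheckM.

```python
# Hypothetical thresholds, chosen only for illustration; the values that
# ATLAS actually applies are not stated in this thread.
MIN_COMPLETENESS = 50.0   # percent
MAX_CONTAMINATION = 10.0  # percent


def passing_bins(quality,
                 min_completeness=MIN_COMPLETENESS,
                 max_contamination=MAX_CONTAMINATION):
    """Return the names of bins that pass both cutoffs.

    `quality` maps a bin name to a (completeness, contamination) tuple,
    both given in percent.
    """
    return [
        name
        for name, (completeness, contamination) in quality.items()
        if completeness >= min_completeness
        and contamination <= max_contamination
    ]


# Made-up example values, not taken from the log in this thread.
example = {
    "bin_1": (35.2, 1.0),   # fails: too incomplete
    "bin_2": (80.0, 25.0),  # fails: too contaminated
}

kept = passing_bins(example)
print(f"{100 * len(kept) / len(example):.2f}% of genomes passed")
```

If every bin fails at least one cutoff, as in this made-up example, nothing is left for the dereplication step to work with.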

How to improve binning results:

SilasK commented 5 years ago

See below the folder structure of my databases.
I put CAT/taxonomy and CAT/CAT_databases in the same subfolder, CAT. The important files are the `2019-03-05....` ones.

Importantly, you can remove "2019-03-05.nr.gz", which I think uses a lot of space.

If you cannot generate them from the most recent nr, you can download new versions with:

$ wget tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20181212.tar.gz

$ tar -xvzf CAT_prepare_20181212.tar.gz

My database folder:
.
├── adapters.fa
├── CAT
│   ├── 2019-03-05.nr.dmnd
│   ├── 2019-03-05.nr.fastaid2LCAtaxid
│   ├── 2019-03-05.nr.gz
│   ├── 2019-03-05.nr.taxids_with_multiple_offspring
│   ├── 2019-03-05.prot.accession2taxid.gz
│   ├── 2019-03-05.taxdump.tar.gz
│   ├── citations.dmp
│   ├── delnodes.dmp
│   ├── division.dmp
│   ├── downloaded
│   ├── gc.prt
│   ├── gencode.dmp
│   ├── merged.dmp
│   ├── names.dmp
│   ├── nodes.dmp
│   └── readme.txt
├── checkm
│   ├── distributions
│   ├── genome_tree
│   ├── hmms
│   ├── hmms_ssu
│   ├── img
│   ├── pfam
│   ├── selected_marker_sets.tsv
│   ├── taxon_marker_sets.tsv
│   └── test_data
├── conda_envs
│   ├── ...
├── EggNOG
│   ├── eggnog.db
│   ├── eggnog_proteins.dmnd
│   ├── og2level.tsv
│   ├── OG_fasta
│   └── OG_fasta.tar.gz
├── host_genome
│   ├── mouse_masked.fa.gz
│   └── Mus_musculus.GRCm38.dna.toplevel.fa.gz
├── phiX174_virus.fa
└── silva_rfam_all_rRNAs.fa
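
A manually arranged database folder can be compared against the listing above with a small Python sketch like the following. The entries in `EXPECTED` are copied from the tree, but which of them ATLAS strictly requires is an assumption, and `missing_entries` and `size_gb` are hypothetical helper names, not ATLAS functions.

```python
from pathlib import Path

# Paths copied from the tree listing above. Whether ATLAS needs exactly
# these entries is an assumption; adjust the list to your own setup.
EXPECTED = [
    "CAT/2019-03-05.nr.dmnd",
    "CAT/names.dmp",
    "CAT/nodes.dmp",
    "EggNOG/eggnog.db",
    "EggNOG/eggnog_proteins.dmnd",
    "checkm/hmms",
]


def missing_entries(db_dir, expected=EXPECTED):
    """Return the expected relative paths that do not exist under db_dir."""
    root = Path(db_dir)
    return [rel for rel in expected if not (root / rel).exists()]


def size_gb(path):
    """Return the size of a file, or the total size of a folder, in GB."""
    path = Path(path)
    if path.is_file():
        total = path.stat().st_size
    else:
        total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total / 1e9
```

Calling `missing_entries("/path/to/databases")` returns the entries that still need to be put in place, and `size_gb` on each subfolder shows where the disk space goes.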

fetyj commented 5 years ago

Thanks for the quick reply! Regards,

Fety

fetyj commented 5 years ago

Hello, could you tell me what went wrong with this one:

RuleException:
AttributeError in line 13 of /home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk:
module 'os' has no attribute 'dirname'
  File "/home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk", line 13, in __rule_get_genome_for_cat
  File "/home/fiestaj/anaconda3/envs/atlasenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
RuleException:
AttributeError in line 13 of /home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk:
module 'os' has no attribute 'dirname'
  File "/home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk", line 13, in __rule_get_genome_for_cat
  File "/home/fiestaj/anaconda3/envs/atlasenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
AttributeError in line 13 of /home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk:
module 'os' has no attribute 'dirname'
  File "/home/fiestaj/atlas/atlas/rules/cat_taxonomy.smk", line 13, in __rule_get_genome_for_cat
  File "/home/fiestaj/anaconda3/envs/atlasenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message

After checking, it seems that get_genome_for_cat gives no output in the genomes/taxonomy/MAG/ folder, while the fasta files are present in the genomes/genomes/ folder.

SilasK commented 5 years ago

I corrected it in the last commit, e3ab4ec.

You are working on the master branch, right?
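
For context on the AttributeError above: in Python, `dirname` lives in `os.path`, not in `os` itself. A minimal sketch, assuming the rule only needs the parent folder of a genome file (the content of the fixing commit is not shown in this thread, and the file name below is hypothetical):

```python
import os

# Hypothetical file name, used only for illustration.
genome_path = "genomes/genomes/MAG1.fasta"

# The os module itself has no 'dirname' attribute, which is why a call to
# os.dirname(...) raises the AttributeError shown in the log.
assert not hasattr(os, "dirname")

# The function is provided by the os.path submodule.
folder = os.path.dirname(genome_path)
print(folder)
```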

fetyj commented 5 years ago

Yes, I installed it from GitHub. I'll run the update and let you know.

fetyj commented 5 years ago

Hello, everything works perfectly. Thanks for all the tips and your help. Regards, Fety