metageni / SUPER-FOCUS

A tool for agile functional analysis of shotgun metagenomic data
GNU General Public License v3.0

superfocus_downloadDB #24

Closed linsalrob closed 4 years ago

linsalrob commented 5 years ago

Currently superfocus_downloadDB downloads a huge file that includes all the clusters. I don't think the clusters are necessary, and the install fails.

I propose downloading a smaller db.zip that contains only the static files for the appropriate aligner.

metageni commented 5 years ago

@linsalrob I will eventually contact you about this. You have good reasons why we should do it, and I have some good reasons why we should not.

ehj000 commented 5 years ago

Dear metageni, The installation of databases does not complete properly. I tried to install the diamond database

superfocus_downloadDB -a diamond

It prints errors like:

Database file: /home/miniconda2/envs/python3/lib/python3.6/site-packages/superfoclus_app/db/clusters/100_clusters.fasta
Opening the database file... No such file or directory

The directory exists, but it contains multiple files (1290 files) with the file extension .faa.

Is there a way to set up the database properly? Thanks, Erik

metageni commented 5 years ago

@ehj000 It seems to be a permission problem. Another user has reported this problem, and I'm still investigating why it affects some users.

What do you get if you run ls /home/miniconda2/envs/python3/lib/python3.6/site-packages/superfoclus_app/db/clusters/100?

If you get all ~1290 files, you can cat the files in the folder and manually repeat the steps starting at (https://github.com/metageni/SUPER-FOCUS/blob/master/superfocus_app/superfocus_downloadDB.py#L79).
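The manual workaround suggested here can be sketched in Python. This is only an illustration, not the code in superfocus_downloadDB.py: the function name concatenate_cluster_files is hypothetical, and it assumes that the per-cluster .faa files sit in a folder such as db/clusters/100 and that the aligner expects a single FASTA file such as db/clusters/100_clusters.fasta, as the error message above suggests.

```python
from pathlib import Path


def concatenate_cluster_files(cluster_dir, output_fasta, extension=".faa"):
    """Concatenate every file with `extension` in `cluster_dir` into `output_fasta`.

    Hypothetical helper; returns the number of files that were merged.
    """
    cluster_dir = Path(cluster_dir)
    # Sort so the merged file is reproducible between runs.
    fasta_files = sorted(p for p in cluster_dir.iterdir() if p.suffix == extension)
    with open(output_fasta, "w") as merged:
        for fasta_file in fasta_files:
            text = fasta_file.read_text()
            merged.write(text)
            # A missing trailing newline would glue two FASTA records together.
            if text and not text.endswith("\n"):
                merged.write("\n")
    return len(fasta_files)
```

Under those assumptions, concatenate_cluster_files("db/clusters/100", "db/clusters/100_clusters.fasta") would rebuild the file named in the error, after which the formatting steps could be repeated by hand.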

bashirhamidi commented 5 years ago

@ehj000 when you are downloading the database, make sure you are within that directory (as opposed to giving the directory path in the downloadDB command).

ehj000 commented 5 years ago

Dear metageni and bashirhamidi, I got the database formatted and in the correct place. However, I had to copy the file "database_PKs.txt" from an old installation of SUPER-FOCUS to the superfocus_app/db directory, as it was not automatically produced (not sure how to do this).

It seems to run now. Thank you for your help. Merry Christmas Erik

abhijeetsingh1704 commented 5 years ago

Why can't downloading and formatting the database be two separate processes? If anything goes wrong while downloading or formatting the database on the first attempt, the download has to be repeated, and it takes an endless amount of time. Alternatively, maybe include a few lines of code so the script can be pointed to a copy of the database that we have already downloaded.

metageni commented 5 years ago

Hi @abhijeetsingh1704 I will consider adding an option to the download script to pass a directory where the downloaded database is located.

It should be in the next release. How does that sound? Thanks for the suggestion.
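One way such an option could behave is sketched below. This is an illustration under assumptions, not the released script: the function resolve_database_dir, the local_dir parameter and the download callback are all hypothetical names. The point is only that the download step is skipped when a usable local copy is supplied, so a formatting failure does not force a new download.

```python
from pathlib import Path


def resolve_database_dir(local_dir, download):
    """Return a directory holding the raw database files.

    `local_dir` is an optional path to a pre-downloaded copy (hypothetical
    option); `download` is a zero-argument callable that fetches the database
    and returns the directory it was saved to.
    """
    if local_dir is not None:
        local_dir = Path(local_dir)
        # Reuse the local copy only if it exists and is not empty.
        if local_dir.is_dir() and any(local_dir.iterdir()):
            return local_dir
        raise FileNotFoundError(f"No database files found in {local_dir}")
    # No local copy given: fall back to the (slow) download.
    return Path(download())
```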

abhijeetsingh1704 commented 5 years ago

I wanted to install the program on a cluster computer, but there is always a long wait (several hours) to get the database downloaded, and most of the time there is then some error in the database format, so I have to start all over again, which means downloading the database again :( . It would be great if you could fix this issue.

Moreover, I don't understand the following error:

superfocus -q bin4.fasta -dir test -a blast
[2019-01-15 13:01:34,401 - INFO] SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data
[2019-01-15 13:01:34,415 - CRITICAL] QUERY: bin4.fasta is not a directory
Traceback (most recent call last):
  File "/opt/sw/superfocus/0.32/bin/superfocus", line 11, in <module>
    load_entry_point('superfocus==0.31', 'console_scripts', 'superfocus')()
  File "/opt/sw/superfocus/0.32/lib/python3.6/site-packages/superfocus_app/superfocus.py", line 275, in main
    elif is_wanted_file(os.listdir(queries_folder)) == []:
NotADirectoryError: [Errno 20] Not a directory: 'bin4.fasta'
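The traceback shows os.listdir being called on the value passed to -q, which raises NotADirectoryError when that value is a single file, so in this version the query apparently has to be a folder. A minimal sketch of a more forgiving approach follows; collect_query_files is a hypothetical helper, not part of SUPER-FOCUS, and the accepted extensions are assumptions.

```python
from pathlib import Path


def collect_query_files(query, extensions=(".fasta", ".fna", ".fastq")):
    """Return query files for a path that is either one file or a folder.

    Hypothetical helper: calling os.listdir() on a file raises
    NotADirectoryError, so the two cases are handled separately.
    """
    query = Path(query)
    if query.is_file():
        return [query]
    if query.is_dir():
        # Keep only files whose extension is in the assumed list.
        return sorted(p for p in query.iterdir() if p.suffix in extensions)
    raise FileNotFoundError(f"Query path does not exist: {query}")
```

With the version in the traceback, placing bin4.fasta in a folder and passing that folder to -q is presumably the workaround.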

metageni commented 5 years ago

@abhijeetsingh1704 @ehj000 @bashirhamidi I have released a new version (v0.34) of superfocus (https://github.com/metageni/SUPER-FOCUS/releases) where users can now manually download the database and format it using a script I wrote.

It should be live on bioconda in 1-2 days. I will see if I can push it tomorrow into pip.

For now, you can install it with pip from (https://github.com/metageni/SUPER-FOCUS/releases).

Please read (https://github.com/metageni/SUPER-FOCUS#database) on how to download and format the database.

Any feedback is welcome.

abhijeetsingh1704 commented 5 years ago

superfocus -q trial_bins/ -dir superfocus_result/ -a blast -db DB_95
[2019-04-09 19:54:04,158 - INFO] SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data
Traceback (most recent call last):
  File "/home/abhijeet/anaconda3/envs/super-focus/bin/superfocus", line 11, in <module>
    load_entry_point('superfocus==0.0.0', 'console_scripts', 'superfocus')()
  File "/home/abhijeet/anaconda3/envs/super-focus/lib/python3.7/site-packages/superfocus_app/superfocus.py", line 328, in main
    subsystems_translation = get_subsystems(Path(WORK_DIRECTORY, "db/database_PKs.txt"))
  File "/home/abhijeet/anaconda3/envs/super-focus/lib/python3.7/site-packages/superfocus_app/superfocus.py", line 106, in get_subsystems
    with open(translation_file) as database_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/abhijeet/anaconda3/envs/super-focus/lib/python3.7/site-packages/superfocus_app/db/database_PKs.txt'

How can I fix this? Where can I download the database_PKs.txt file that someone mentioned in another post?

@metageni, it would be good if you could upload the files somewhere, in case some installations are missing required files. What do you think?

metageni commented 5 years ago

@abhijeetsingh1704 which version of Superfocus are you running?

Did you run superfocus_downloadDB to download the blast database?

It is odd because this file comes with the tool installation (https://github.com/metageni/SUPER-FOCUS/tree/4bdaefff32fef9ad031345c7ded93a089807ce83/superfocus_app/db)
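A quick way to check whether an installation actually shipped this file is sketched below. The helper names missing_db_files and installed_package_dir are hypothetical; the db/database_PKs.txt location inside the superfocus_app package is taken from the traceback above.

```python
import importlib.util
from pathlib import Path


def missing_db_files(package_dir, required=("database_PKs.txt",)):
    """Return the required files that are absent from `package_dir`/db."""
    db_dir = Path(package_dir, "db")
    return [name for name in required if not (db_dir / name).is_file()]


def installed_package_dir(package="superfocus_app"):
    """Return the folder of an installed package, or None if it cannot be found."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points at its __init__.py.
    return Path(spec.origin).parent


if __name__ == "__main__":
    package_dir = installed_package_dir()
    if package_dir is None:
        print("superfocus_app is not installed in this environment")
    else:
        print("Missing files:", missing_db_files(package_dir))
```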

abhijeetsingh1704 commented 5 years ago

abhijeet@abhijeet:~$ superfocus -v
SUPER-FOCUS 0.34

The version is the newest one you mentioned, v0.34. Nevertheless, I will get the database_PKs.txt file, put it in db, and hope that it works.

metageni commented 5 years ago

If not, please try installing it with pip (pip3 install superfocus)

GenomicaMicrob commented 5 years ago

There is no database_PKs file when installed with conda (and the db formatted with it), and if I download the file from GitHub, it says "IndexError: list index out of range". Indeed, the installation needs some polishing.

metageni commented 5 years ago

Hi @GenomicaMicrob, Please try to install it with pip and let me know how it goes. I'm still working with conda to see what's up.

Thomieh73 commented 4 years ago

Hi, I was just about to set up SUPER-FOCUS, and I wondered about the database that I can download. When was it created?

Thomieh73 commented 4 years ago

Hi, I tried setting up the database, but that failed using the normal route. With a bit of fiddling it worked. What did I do?

I installed super-focus using conda in an environment only for super-focus. I have tried both commands available on the conda pages. https://anaconda.org/bioconda/super-focus

The first command installed version 0.34 and the second command installed 0.33.

After the first installation I manually downloaded the database as indicated, and then went on to the step that sets up the databases. With both installations this crashed because it could not write to the folder. When I checked where it should have been written (a subfolder of the superfocus_app folder), I found that the db folder was missing. Not sure why.

I decided to start all over, and removed the conda installation and the environment. Then I created a new environment with conda and installed Python 3.7.0.

Next I ran: pip3 install superfocus

This installed superfocus, but the aligners were still missing.

Thus with conda I installed from the bioconda channel: blast (2.9.0), diamond (0.9.26) and rapsearch (2.24).

Then I started building the database again with the command:

superfocus_downloadDB -i clusters -a rapsearch -c 90

This again failed because the directory where DB_90 has to be stored is missing. In fact, the whole db directory is missing, and the command reports that it cannot write files.

To solve this, I downloaded the GitHub repo and then manually put the db folder inside the superfocus_app folder.

Then it worked to create a database.

So could it be that the db folder is not added to the pip and conda packages when they are created, since the GitHub repo has this gitignore file in it? I am not sure how that works.

Thomas

metageni commented 4 years ago

Hey @Thomieh73, thanks for the feedback. I'm glad you were able to figure this out.

There is something wrong with the bioconda installation that I have not figured out yet. I will take it down and put it back once it works fine.

In the meantime, you should be fine with the pip version; I just installed it with no problem (pip install superfocus, download the db, and run superfocus_downloadDB). I think the problem you had has to do with some system permission that prevents the database from being created. I have heard of the same problem from other people, but I was not able to reproduce it. A tip that I put in the README is to download the small version of the database and try to format that. It makes things much easier to debug.

Sorry again for the problems you had, but I'm glad you figured it out.

Best

Geni

Thomieh73 commented 4 years ago

I had checked the permissions, and they all seemed okay. My database creation basically failed because the superfocus bioconda installation does not create this db folder in the superfocus_app folder. And the superfocus_downloadDB script does not create it either, right? It already has to be there for the script to function.
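If that is the cause, a guard along the following lines would either avoid the failure or make it explicit. This is a sketch under assumptions, not the behaviour of superfocus_downloadDB: ensure_db_dir is a hypothetical helper, and package_dir stands for the installed superfocus_app folder.

```python
import os
from pathlib import Path


def ensure_db_dir(package_dir):
    """Create `package_dir`/db if it is missing and confirm it is writable."""
    db_dir = Path(package_dir, "db")
    # parents=True and exist_ok=True make this safe to call repeatedly.
    db_dir.mkdir(parents=True, exist_ok=True)
    if not os.access(db_dir, os.W_OK):
        raise PermissionError(f"Cannot write to {db_dir}")
    return db_dir
```

Separating the "folder is missing" case from the "folder is not writable" case would also help tell a packaging problem apart from a genuine permission problem.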