metageni / SUPER-FOCUS

A tool for agile functional analysis of shotgun metagenomic data
GNU General Public License v3.0
21 stars 12 forks source link

Add --db-dir argument to downloadDB script #73

Closed boulund closed 2 years ago

boulund commented 2 years ago

Hi, I took the liberty of making some minor feature additions/improvements to the superfocus_downloadDB script. I was having issues running the script since it wants to create the database files inside my conda environment folder (~/.conda/envs/superfocus/lib/python3.10/site-packages/superfocus_app/...) which I don't want it to.

This PR makes the following modifications:

The functionality should be identical to before if the --db-dir argument is not used, but with slightly improved feedback to the user.

boulund commented 2 years ago

Hi,

This folder is where super-focus is installed by default, it (or its equivalent) should be present on all systems where super-focus was installed with conda or pip.

But I agree I was a bit sloppy when describing the location, the way it is written now can be misinterpreted. I suggest we perhaps drop the default location entirely from the help string instead.

Den tis 21 dec. 2021 17:50Geni Silva @.***> skrev:

@.**** requested changes on this pull request.

Thanks. I have left one minor request, and I'm more than happy to merge it once addressed. @boulund https://github.com/boulund

In superfocus_app/superfocus_downloadDB.py https://github.com/metageni/SUPER-FOCUS/pull/73#discussion_r773294335:

  • parser.add_argument("-i", "--input", help="Target input files to be formatted into the database", required=True)
  • parser.add_argument('-v', '--version', action='version', version='superfocus_downloadDB version {}'.format(
  • version))
  • parser = argparse.ArgumentParser(
  • description=doc,
  • epilog="superfocus_downloadDB -a diamond,rapsearch,blast -i clusters/")
  • parser.add_argument("-a", "--aligner", required=True,
  • help="Aligner name separed by ',' if more than one")
  • parser.add_argument("-c", "--clusters", default="90",
  • help="DB types separed by ',' if more than one (e.g 90,95,98,100). Default: 90")
  • parser.add_argument("-i", "--input", required=True,
  • help="Target input files to be formatted into the database")
  • parser.add_argument("-d", "--db-dir",
  • help="Alternate database directory to store DB files in. "
  • "Default: lib/python3.10/site-packages/superfocus_app/")

can we improve this? I doubt this folder (default) will be present for most people.

— Reply to this email directly, view it on GitHub https://github.com/metageni/SUPER-FOCUS/pull/73#pullrequestreview-837657410, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFRA2SW2TUQFUNLPJD2KA3USCV6NANCNFSM5KQMGBIA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

metageni commented 2 years ago

Hi, This folder is where super-focus is installed by default, it (or its equivalent) should be present on all systems where super-focus was installed with conda or pip. But I agree I was a bit sloppy when describing the location, the way it is written now can be misinterpreted. I suggest we perhaps drop the default location entirely from the help string instead. Den tis 21 dec. 2021 17:50Geni Silva @.> skrev: @*.*** requested changes on this pull request. Thanks. I have left one minor request, and I'm more than happy to merge it once addressed. @boulund https://github.com/boulund ------------------------------ In superfocus_app/superfocus_downloadDB.py <#73 (comment)>: > - parser.add_argument("-i", "--input", help="Target input files to be formatted into the database", required=True) - parser.add_argument('-v', '--version', action='version', version='superfocus_downloadDB version {}'.format( - version)) + parser = argparse.ArgumentParser( + description=doc, + epilog="superfocus_downloadDB -a diamond,rapsearch,blast -i clusters/") + + parser.add_argument("-a", "--aligner", required=True, + help="Aligner name separed by ',' if more than one") + parser.add_argument("-c", "--clusters", default="90", + help="DB types separed by ',' if more than one (e.g 90,95,98,100). Default: 90") + parser.add_argument("-i", "--input", required=True, + help="Target input files to be formatted into the database") + parser.add_argument("-d", "--db-dir", + help="Alternate database directory to store DB files in. " + "Default: lib/python3.10/site-packages/superfocus_app/") can we improve this? I doubt this folder (default) will be present for most people. — Reply to this email directly, view it on GitHub <#73 (review)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABFRA2SW2TUQFUNLPJD2KA3USCV6NANCNFSM5KQMGBIA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you were mentioned.Message ID: @.>

right. I'm away from my computer now. would you mind removing the default parameter (as I agree and you suggested) and add required=True? Thanks

Happy to approve it once the change is done. thanks.

boulund commented 2 years ago

I'd be happy to remove the potentially misleading default path hint in the help text.

However, I don't agree that we should add required=True to the --db-dir argument. That would force users to have to specify a folder, which can potentially break existing workflows where people rely on the previous behavior.

In your original implementation, the database directory is determined by finding the location of the script file, and I kept that as the default behavior if a user doesn't specify --db-dir. See these lines: https://github.com/metageni/SUPER-FOCUS/pull/73/files#diff-d7287f92688bfdb315f9baed991e60448bfb4c91a00f16b098570e8e53318c79R19

https://github.com/metageni/SUPER-FOCUS/pull/73/files#diff-d7287f92688bfdb315f9baed991e60448bfb4c91a00f16b098570e8e53318c79R46-R50

I will provide modifications when I'm in the office again tomorrow

metageni commented 2 years ago

I'd be happy to remove the potentially misleading default path hint in the help text.

However, I don't agree that we should add required=True to the --db-dir argument. That would force users to have to specify a folder, which can potentially break existing workflows where people rely on the previous behavior.

In your original implementation, the database directory is determined by finding the location of the script file, and I kept that as the default behavior if a user doesn't specify --db-dir. See these lines: https://github.com/metageni/SUPER-FOCUS/pull/73/files#diff-d7287f92688bfdb315f9baed991e60448bfb4c91a00f16b098570e8e53318c79R19

https://github.com/metageni/SUPER-FOCUS/pull/73/files#diff-d7287f92688bfdb315f9baed991e60448bfb4c91a00f16b098570e8e53318c79R46-R50

I will provide modifications when I'm in the office again tomorrow

my bad. you are completely correct; thanks

boulund commented 2 years ago

Going through some of the closed issues like https://github.com/metageni/SUPER-FOCUS/issues/67 and https://github.com/metageni/SUPER-FOCUS/issues/65 I notice they are caused by the same issue that made me create this PR: if installing SUPER-FOCUS via conda the db directory is not bundled properly (I made a separate issue about it: https://github.com/metageni/SUPER-FOCUS/issues/74)

boulund commented 2 years ago

This PR should be good to go now anyway, I've made the modifications we discussed yesterday.

boulund commented 2 years ago

However, reading through the code for the main SUPER-FOCUS script I noticed that the DB paths contains hardcoded elements, e.g. here https://github.com/metageni/SUPER-FOCUS/blob/ec533bede1dd71927cb1b6b0f5346c11801a63dd/superfocus_app/do_alignment.py#L88 If a user would run superfocus_downloadDB with the modifications in this PR it would not work properly, since the DB files created with a custom --db-dir does not follow the same convention.

I think we should wait before merging this PR.

boulund commented 2 years ago

I pushed two more commits that fixes the issue, so that even if a user provides a custom --db-dir the resulting folder structure will follow the same pattern as in the default case.

This enables the use of a conda-based installation relatively easily by following these steps, until we can get the bioconda recipe fixed:

  1. Create a conda environment with super-focus:
    $ conda create -n superfocus super-focus
  2. Pull the github repo to get the contents of the db folder that is missing in the conda install (for this example I'm putting the local copy of the repo in my home directory):
    $ git clone git@github.com:metageni/SUPER-FOCUS ~/SUPER-FOCUS
  3. Create a folder where you want to store the DB files, and put the db folder in that folder:
    $ mkdir -pv ~/superfocus_db
    $ cp -r ~/SUPER-FOCUS/superfocus_app/db ~/superfocus_db
  4. Download the SUPER-FOCUS fasta files with clusters:
    $ cd ~/superfocus_db
    $ wget edwards.sdsu.edu/superfocus/downloads/db.zip
    $ unzip db.zip
  5. Activate conda environment
    conda activate superfocus
  6. Run superfocus_downloadDB (the version from this PR) to create relevant mapper database files, e.g.:
    (superfocus)$ cd ~/superfocus_db
    (superfocus)$ superfocus_downloadDB -a diamond -i ~/superfocus_db/clusters/ --db-dir ~/superfocus_db/
  7. Run SUPER-FOCUS, e.g.:
    (superfocus)$ superfocus -q path/to/input/folder -dir path/to/output/folder --alternate_directory ~/superfocus_db -a diamond
metageni commented 2 years ago

LGTM. I will merge it and trigger the new conda version. It should be up in the next hours.

PaolaDiGianvito commented 2 months ago

Hello, I'm new with python. I'm running superfocus, but i'm not able to join the final .xls files. Can someone help me? I've downloaded the join_output.py file from ttps://github.com/linsalrob/EdwardsLab/blob/master/superfocus_all/join_output.py, but it doesn't work. (an't split: |Query: ['/home/pdigianv/ita_gre/no_grape_comb/flash/IC2_EF.extendedFrags.fastq']|). Can someone help me? thank you

metageni-twist commented 2 months ago

@linsalrob: you should this script, could you please help @PaolaDiGianvito ?

@PaolaDiGianvito: could you please share the command line you running with the script? are you using the parameters -d and -o

PaolaDiGianvito commented 2 months ago

Thank you, the script is: Python join_table.py -d /home/pdigianv/ita_gre/Superfocus/functions -o /home/pdigianv/ita_gre/Superfocus/functions/total_functions.xls

Il Ven 3 Mag 2024, 19:42 Genivaldo Silva @.***> ha scritto:

@linsalrob https://github.com/linsalrob: you should this script, could you please help @PaolaDiGianvito https://github.com/PaolaDiGianvito ?

@PaolaDiGianvito https://github.com/PaolaDiGianvito: could you please share the command line you running with the script? are you using the parameters -d and -o

— Reply to this email directly, view it on GitHub https://github.com/metageni/SUPER-FOCUS/pull/73#issuecomment-2093482483, or unsubscribe https://github.com/notifications/unsubscribe-auth/BBFTCG43YQ6SDQXA4FFCGI3ZAPD6XAVCNFSM5KQMGBIKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMBZGM2DQMRUHAZQ . You are receiving this because you were mentioned.Message ID: @.***>

metageni-twist commented 1 month ago

and what is the error you getting?

PaolaDiGianvito commented 1 month ago

Can't split: Query: ['/home/pdigianv/ita_gre/no_grape_comb/flash/IC2_EF.extendedFrags.fastq']

Il Dom 5 Mag 2024, 05:51 Genivaldo Silva @.***> ha scritto:

and what is the error you getting?

— Reply to this email directly, view it on GitHub https://github.com/metageni/SUPER-FOCUS/pull/73#issuecomment-2094570727, or unsubscribe https://github.com/notifications/unsubscribe-auth/BBFTCGZ62XKQAL67L72CWE3ZAWUEXAVCNFSM5KQMGBIKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMBZGQ2TOMBXGI3Q . You are receiving this because you were mentioned.Message ID: @.***>

metageni-twist commented 1 month ago

it is hard to tell what the problem is. if you compress and send me the output, I can take a look to you.

genivaldo.gueiros@gmail.com

PaolaDiGianvito commented 1 month ago

hello, this is the output....we have four output file, this is the most complex, in the other i have the first 4 rows withe information and after 1 column with the level and other 2 columns with reads o9f the sample and relative abundance, respectively. Thank you

Paola Di Gianvito, PhD Tecnologo della ricerca, DISAFA, University of Turin Agricultural Microbiology and Food Technology Sector

Corso Enotria 2/C, Ampelion 12051 Alba - Cuneo - ITALY

Il giorno lun 6 mag 2024 alle ore 22:04 Genivaldo Silva < @.***> ha scritto:

it is hard to tell what the problem is. if you compress and send me the output, I can take a look to you.

@.***

— Reply to this email directly, view it on GitHub https://github.com/metageni/SUPER-FOCUS/pull/73#issuecomment-2096811604, or unsubscribe https://github.com/notifications/unsubscribe-auth/BBFTCG3HJORV3LWUIFJTYEDZA7O4RAVCNFSM5KQMGBIKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMBZGY4DCMJWGA2A . You are receiving this because you were mentioned.Message ID: @.***>