pirovc / genome_updater

Bash script to download/update snapshots of files from NCBI genomes repository (refseq/genbank) with track of changes and without redundancy
MIT License

Unbound variable (line 253) #96

Open iwilkie opened 4 months ago

iwilkie commented 4 months ago

Hi,

I'm trying to run genome_updater on a Mac (macOS Monterey v12.6.3). I ran into some issues that were previously described and followed the troubleshooting steps (e.g. installing xargs and coreutils). I thought I had it running, but then I hit an issue that I haven't found described before:

bash-5.2$ ~/genome_updater.sh -T 'f__Akkermansiaceae' -M 'gtdb' -t '5' -o 'GTDB_Akkermansiaceae_May2024' -g 'bacteria' -f 'genomic.fna.gz' -d 'genbank,refseq'
-------------------------------------------
┌─┐┌─┐┌┐┌┌─┐┌┬┐┌─┐    ┬ ┬┌─┐┌┬┐┌─┐┌┬┐┌─┐┬─┐
│ ┬├┤ ││││ ││││├┤     │ │├─┘ ││├─┤ │ ├┤ ├┬┘
└─┘└─┘┘└┘└─┘┴ ┴└─┘────└─┘┴  ─┴┘┴ ┴ ┴ └─┘┴└─
                                     v0.6.3 
-------------------------------------------
Mode: NEW 
Args: -T 'f__Akkermansiaceae' -M 'gtdb' -t '5' -o 'GTDB_Akkermansiaceae_May2024' -g 'bacteria' -f 'genomic.fna.gz' -d 'genbank,refseq'
Outp: /Users/Isa/Downloads/GTDB_Akkermansiaceae_May2024/
-------------------------------------
Downloading assembly summary [2024-05-03_21-57-32]
 - Database [genbank,refseq]
 - Organism group [bacteria]
 -  assembly entries available

Filtering assembly summary [2024-05-03_21-57-32]
/Users/Isa/genome_updater.sh: line 253: 2: unbound variable

This is the line which is throwing the error:

bash-5.2$ sed '253q;d' ~/genome_updater.sh
    filtered_lines=${2}

No files were downloaded, and the log doesn't provide more details:

--- genome_updater version: 0.6.3 ---
Mode: NEW 
Args: -T 'f__Akkermansiaceae' -M 'gtdb' -t '5' -o 'GTDB_Akkermansiaceae_May2024' -g 'bacteria' -f 'genomic.fna.gz' -d 'genbank,refseq'
Outp: /Users/Isa/Downloads/GTDB_Akkermansiaceae_May2024/
-------------------------------------
Downloading assembly summary [2024-05-03_21-57-32]
 - Database [genbank,refseq]
 - Organism group [bacteria]
 -  assembly entries available

Filtering assembly summary [2024-05-03_21-57-32]

I'm wondering whether this issue stems from my working on a Mac. Have you encountered it before? If so, is there a workaround, or is it not possible to run genome_updater on a Mac?

Thanks, Isa

pirovc commented 1 month ago

genome_updater has not been tested on macOS, and I don't have a Mac to try it on. However, this looks like an ordinary script error, and it can be bypassed by removing the u from:

https://github.com/pirovc/genome_updater/blob/78c3fb546cdca726b333900f5319ab03e03681e4/genome_updater.sh#L2
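
For context, "unbound variable" is the error a shell reports when the nounset option (`set -u`) is active and an unset parameter is expanded. The sketch below is a minimal illustration of that behaviour, not code from genome_updater: both function names are hypothetical. It also shows `${2:-}` as an alternative to dropping the option, since that form expands to an empty string instead of aborting.

```shell
# Hypothetical demo functions; neither comes from genome_updater.
# Each body runs in a subshell, so a nounset failure only ends that subshell.

# Strict variant: expanding an unset $2 under "set -u" aborts with an error.
strict_second_arg() (
    set -u
    echo "second=${2}"
)

# Tolerant variant: ${2:-} falls back to an empty string when $2 is unset.
tolerant_second_arg() (
    set -u
    echo "second=${2:-}"
)

# Called with a single argument, the strict variant fails and the tolerant one does not.
strict_second_arg "only-one" 2>/dev/null || echo "strict: failed (unbound variable)"
tolerant_second_arg "only-one"
```

Note that silencing the error this way only hides the symptom: the value that should have been passed as the second argument is still missing.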

eseiler commented 1 week ago

An example call trace would be something like:

https://github.com/pirovc/genome_updater/blob/78c3fb546cdca726b333900f5319ab03e03681e4/genome_updater.sh#L253

https://github.com/pirovc/genome_updater/blob/78c3fb546cdca726b333900f5319ab03e03681e4/genome_updater.sh#L1308

https://github.com/pirovc/genome_updater/blob/78c3fb546cdca726b333900f5319ab03e03681e4/genome_updater.sh#L1292

https://github.com/pirovc/genome_updater/blob/78c3fb546cdca726b333900f5319ab03e03681e4/genome_updater.sh#L145-L148

Tools like sed and cut behave differently on macOS/OpenBSD.

So the unbound variable probably means that count_lines_file produced no output, leaving the function at line 253 without a second argument.
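
To illustrate how an empty result can turn into a missing argument: if a command substitution is passed unquoted and prints nothing, it expands to zero words, so the callee never receives that positional parameter. This is a sketch under the assumption that the value is passed unquoted (I have not verified how genome_updater passes it), and both helper names are hypothetical stand-ins.

```shell
# Hypothetical stand-in for a helper whose pipeline printed nothing.
prints_nothing() { :; }

# Reports how many positional arguments the callee actually received.
show_argc() { echo "argc=$#"; }

# Unquoted: the empty substitution expands to zero words, so $2 is never set.
show_argc "summary.txt" $(prints_nothing)

# Quoted: an empty string is still passed as the second argument.
show_argc "summary.txt" "$(prints_nothing)"
```

The first call reports one argument and the second reports two, which is the difference between an unbound `$2` and an empty one.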

A possible workaround (untested) would be to use homebrew and install the GNU version of some tools:

brew install coreutils grep findutils gnu-sed

Because Homebrew installs these tools with a g prefix to avoid conflicts (e.g., find -> gfind), you can also run

export PATH="$HOMEBREW_PREFIX/opt/coreutils/libexec/gnubin:$PATH"
export PATH="$HOMEBREW_PREFIX/opt/grep/libexec/gnubin:$PATH"
export PATH="$HOMEBREW_PREFIX/opt/findutils/libexec/gnubin:$PATH"
export PATH="$HOMEBREW_PREFIX/opt/gnu-sed/libexec/gnubin:$PATH"

in the shell in which you are going to run genome_updater.
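
To check whether the PATH change took effect, something like the following could be run in the same shell. It is a rough sketch, not part of genome_updater: `is_gnu_tool` is a hypothetical helper, and it assumes GNU tools answer `--version` with a first line containing "(GNU " (for example `sed (GNU sed) ...`), which the BSD versions shipped with macOS do not.

```shell
# Hypothetical helper: succeeds if the first "$1" found on PATH looks like a GNU tool.
# Assumption: GNU tools print "(GNU ..." on the first line of --version output.
is_gnu_tool() {
    "$1" --version 2>/dev/null | head -n 1 | grep -qF "(GNU "
}

# Report which implementation each tool name currently resolves to.
for tool in sed cut grep find; do
    if is_gnu_tool "$tool"; then
        echo "$tool: GNU ($(command -v "$tool"))"
    else
        echo "$tool: not GNU or not found"
    fi
done
```

If any tool is reported as not GNU, the corresponding gnubin directory is probably missing from PATH or listed after the system directories.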