greenelab / tribe

An open-source webserver that allows for easy, reproducible genomics analyses between different webservers

Turn commands in the 'load_organisms_and_genes_to_db.sh' script into for-loops to make it shorter and more maintainable #9

Open · rzelayafavila opened this issue 7 years ago

rzelayafavila commented 7 years ago

This was brought up in https://github.com/greenelab/tribe/pull/8, but it proved too involved for the scope of that pull request.

dongbohu commented 7 years ago

Instead of using a Bash 4 associative array in a for loop, we can use a simpler flat array like this:

# Flat array: three fields per organism (taxonomy ID, scientific name, common name).
# No commas between elements: Bash would keep a trailing comma as part of the word.
list=(
    9606 "Homo sapiens" "Human"
    4932 "Saccharomyces cerevisiae" "S. cerevisiae"
    10090 "Mus musculus" "Mouse"
    10116 "Rattus norvegicus" "Rat"
    6239 "Caenorhabditis elegans" "C. elegans"
    3702 "Arabidopsis thaliana" "Arabidopsis"
    7227 "Drosophila melanogaster" "Fruit fly"
    7955 "Danio rerio" "Zebrafish"
    208964 "Pseudomonas aeruginosa" "Pseudomonas aeruginosa"
)

# Walk the array one organism (three elements) at a time until the end.
count=0
while [ "x${list[count]}" != "x" ]; do
    id=${list[count]}
    sci_name=${list[count + 1]}
    common_name=${list[count + 2]}
    python manage.py organisms_create_or_update --taxonomy_id="$id" \
        --scientific_name="$sci_name" --common_name="$common_name"
    count=$(( count + 3 ))
done

This is an example of how to replace lines 9-26 of tribe/load_organisms_and_genes_to_db.sh. It may not be the best solution, but it is worth trying.
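A variant of the same loop, sketched here assuming Bash is available: indexing up to `${#list[@]}` with a C-style `for` loop avoids the `"x..."` sentinel test, which would stop early if any field were ever an empty string. The `run` function is a hypothetical dry-run helper that only prints each command; replace its body with `"$@"` to actually execute `manage.py`. Only three organisms are listed to keep the sketch short.

```shell
#!/usr/bin/env bash
# Flat array: three fields (taxonomy ID, scientific name, common name) per organism.
list=(
    9606 "Homo sapiens" "Human"
    4932 "Saccharomyces cerevisiae" "S. cerevisiae"
    10090 "Mus musculus" "Mouse"
)

# Hypothetical dry-run helper: prints the command instead of running it.
run() {
    echo "$@"
}

load_organisms() {
    local i
    # Step through the array one organism (three elements) at a time.
    for (( i = 0; i < ${#list[@]}; i += 3 )); do
        run python manage.py organisms_create_or_update --taxonomy_id="${list[i]}" \
            --scientific_name="${list[i + 1]}" --common_name="${list[i + 2]}"
    done
}

load_organisms
```

Because the loop bound comes from the array length, adding an organism only means adding one line to `list`.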

rzelayafavila commented 7 years ago

@dongbohu Thanks, I agree. If we want to replace lines 67-178, the list will probably also need to include the location of each organism's data files and its systematic_col number. For lines 28-62, declaring the array and writing the loop might be more trouble than it's worth.
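A minimal sketch of a wider record, assuming each organism needs a scientific name, a gene data file, and a systematic_col value. The file paths and column numbers are placeholders, not Tribe's real values, and `load_genes` is a hypothetical stand-in for whichever `manage.py` command lines 67-178 actually call; it only prints what it would load. The field count lives in one variable, and Bash's array slicing (`${array[@]:offset:length}`) hands each record to the function as separate arguments.

```shell
#!/usr/bin/env bash
# Hypothetical record layout: scientific name, gene data file, systematic_col.
FIELDS=3
gene_sources=(
    "Homo sapiens"  "data/human_gene_info.txt" 0
    "Mus musculus"  "data/mouse_gene_info.txt" 1
)

# Hypothetical stand-in for the real gene-loading command; it only prints.
load_genes() {
    local organism=$1 gene_file=$2 systematic_col=$3
    echo "load '$gene_file' for $organism (systematic_col=$systematic_col)"
}

load_all_genes() {
    local i
    # Slice out FIELDS elements per record and pass them as arguments.
    for (( i = 0; i < ${#gene_sources[@]}; i += FIELDS )); do
        load_genes "${gene_sources[@]:i:FIELDS}"
    done
}

load_all_genes
```

Adding a field later means bumping `FIELDS`, adding one value per row, and reading one more positional parameter in the function.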

dongbohu commented 7 years ago

Yes, for lines 67-178 you will have to include the other fields in the array too. For lines 28-62, the array will include both the name and the URL.
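For name/URL pairs, one alternative to a flat array (a different technique from the one proposed above) is a small table in a here-document, read line by line with `read` and a `|` field separator. The names and `example.org` URLs are placeholders, and `handle_source` is a hypothetical function that only prints each pair, since what lines 28-62 do with them is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical handler: prints each pair instead of acting on it.
handle_source() {
    echo "name=$1 url=$2"
}

load_sources() {
    local name url
    # One "name|url" record per line; -r keeps backslashes literal.
    while IFS='|' read -r name url; do
        [ -z "$name" ] && continue   # skip blank lines in the table
        handle_source "$name" "$url"
    done <<'EOF'
human_example|https://example.org/human.txt
mouse_example|https://example.org/mouse.txt
EOF
}

load_sources
```

The quoted `'EOF'` delimiter stops the shell from expanding anything inside the table, so URLs containing `$` or backticks are passed through unchanged.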

It also depends on whether you will need to add more `python manage.py xxx` commands. The more commands you add, the bigger the benefit of a loop.
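A sketch of why the benefit grows with the number of commands: if every per-organism step lives in one function, a new command is a single added line and runs for every organism automatically. `hypothetical_next_step` is a placeholder, not a real Tribe management command, and `run` is a hypothetical dry-run helper that only prints.

```shell
#!/usr/bin/env bash
list=(
    9606 "Homo sapiens" "Human"
    7955 "Danio rerio" "Zebrafish"
)

# Hypothetical dry-run helper: prints the command instead of running it.
run() {
    echo "$@"
}

# All per-organism steps in one place; add new manage.py commands here.
process_organism() {
    local id=$1 sci_name=$2 common_name=$3
    run python manage.py organisms_create_or_update --taxonomy_id="$id" \
        --scientific_name="$sci_name" --common_name="$common_name"
    run python manage.py hypothetical_next_step --taxonomy_id="$id"
}

process_all() {
    local i
    # Hand each three-element record to process_organism.
    for (( i = 0; i < ${#list[@]}; i += 3 )); do
        process_organism "${list[@]:i:3}"
    done
}

process_all
```

With two organisms and two commands this prints four lines; without the loop, each new command would have to be copied once per organism.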