z0on / GO_MWU

Rank-based Gene Ontology analysis of gene expression data

Command line and plot improvement #11

Open cche opened 2 years ago

cche commented 2 years ago

Improvements include:

All inputs and outputs are TAB separated.

All inputs are expected to be TAB separated and all outputs are written as TAB separated files. This avoids having files with different delimiters.
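If an existing input table is comma separated, it would need converting first. A minimal sketch, assuming a simple table with no quoted fields and no commas inside values; the helper name `csv_to_tsv` and the file names in the comment are hypothetical and not part of GO_MWU:

```shell
# Hypothetical helper: convert a simple comma-separated table to TAB separated.
# Assumes no quoted fields and no commas inside values.
csv_to_tsv() {
    tr ',' '\t'
}

# Demo on inline data; for a real file: csv_to_tsv < input.csv > input.tsv
printf 'gene,value\ngeneA,1.5\n' | csv_to_tsv
```

For tables with quoted fields, a proper CSV parser would be a safer choice than `tr`.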

The results and best_GO outputs in the GO_MWU.R file are captured to files named results.tsv and best_GO.tsv respectively.

Changed naming so that data and results can be stored on another path.

With these naming changes, you can put the file containing your genes of interest and the GO annotation table for your genome in a separate directory, and the results will be stored next to these files. This keeps the code directory clean.
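As a rough illustration of "next to these files", the sketch below lists where the outputs would be expected for a given input path. It assumes the results.tsv and best_GO.tsv names are used unchanged, which this description does not spell out; the helper name `outputs_next_to` is hypothetical:

```shell
# Hypothetical helper: list the output paths expected beside a given input file.
# Assumes the results.tsv and best_GO.tsv names are used unchanged.
outputs_next_to() {
    out_dir=$(dirname "$1")
    printf '%s/results.tsv\n%s/best_GO.tsv\n' "$out_dir" "$out_dir"
}

outputs_next_to data/heats.csv
```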

Added command-line parsing

This allows the script to be launched as a command in the terminal. For example, if your input data is stored in a subdirectory called data/, you can launch:

  $ Rscript GO_MWU.R -s ./ -i data/heats.csv -a data/amil_defog_iso2go.tab -g go.obo -d BP -o 0.1 -m 5 -c 0.25 -p 1e-2 -t 0.9

The -s option gives the path to the GO_MWU directory, which is used to find the functions file and the Perl scripts.

The script can be launched from any directory by giving the complete paths to the script, to the GO_MWU directory (-s option), and to the go.obo file. So if you are in a directory containing your gene-to-GO annotation and genes-of-interest files, you can launch:

  $ Rscript /path/to/GO_MWU/GO_MWU.R -s /path/to/GO_MWU/ -g /path/to/go.obo -i heats.csv -a amil_defog_iso2go.tab -d BP -o 0.1 -m 5 -c 0.25 -p 1e-2 -t 0.9
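To avoid retyping the GO_MWU path, the command above could be assembled by a small shell function. This is only a sketch: the function name `go_mwu_cmd` and its argument order are hypothetical, the flag values are copied verbatim from the example above without interpreting them, and paths containing spaces are not handled:

```shell
# Hypothetical helper: print the Rscript command for a given GO_MWU checkout.
# Arguments: GO_MWU directory, go.obo path, genes-of-interest file, annotation file.
# Flag values are copied from the example above; this sketch does not interpret them.
go_mwu_cmd() {
    go_mwu_dir=${1%/}   # strip one trailing slash, if present
    printf 'Rscript %s/GO_MWU.R -s %s/ -g %s -i %s -a %s -d BP -o 0.1 -m 5 -c 0.25 -p 1e-2 -t 0.9\n' \
        "$go_mwu_dir" "$go_mwu_dir" "$2" "$3" "$4"
}

# Print the command for inspection before running it.
go_mwu_cmd /path/to/GO_MWU/ /path/to/go.obo heats.csv amil_defog_iso2go.tab
```

Printing the command rather than executing it makes it easy to check the paths first; piping the output to `sh` would then run it.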

Automatic label positioning in plot

Labels are correctly placed in front of the tree branches so that plots can be saved to file without intervention. No need to manually rescale the plot before saving.

cche commented 2 years ago

Dear Misha,

I forgot to add context on why I made these changes to your code.

We have started to use your code in our lab, and most people here use a local instance of Galaxy for their analyses, so I decided to make a Galaxy wrapper for your code. To do that, I had to change a few things, such as making it possible to launch the script from the command line and keeping all data and results separate from the code (not strictly necessary, but good to have on the command line). The plotting also had to be done in one go, without user intervention, because that is not possible on the Galaxy server.

Hope you find these changes useful!

All the best, Cristian