merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Export PC alignments #494

Closed rbeinart closed 7 years ago

rbeinart commented 7 years ago

Hello, as suggested on the Google Group, I am posting here to request a function for exporting PC alignments from the pangenomics analysis pipeline.

Thanks! -Roxanne-

meren commented 7 years ago

Hi Roxanne,

Thanks for this. anvi-summarize for pan genomes now reports aligned sequences in PCs (unless the flag --skip-alignments was used with anvi-pan-genome, in which case it will simply report unaligned sequences). This will be in the next stable release after some more testing.

Now that we have the functionality in the superclass, I will also implement an anvi-export-pc-alignments program so that aligned sequences for individual PCs can be obtained quickly without running a summary.

Best,

meren commented 7 years ago

This is done. We now have a new program to do it outside of summaries:

$ anvi-export-pc-alignments -h
usage: anvi-export-pc-alignments [-h] [-p PAN_DB] [-g GENOMES_STORAGE]
                                 [-o FILE_PATH] [--pc-id PROTEIN_CLUSTER_ID]
                                 [--pc-ids-file FILE_PATH]
                                 [-C COLLECTION_NAME] [-b BIN_NAME]
                                 [--list-collections] [--list-bins]

Export aligned sequences from anvi'o pan genomes

optional arguments:
  -h, --help            show this help message and exit

INPUT FILES:
  Input files from the pangenome analysis.

  -p PAN_DB, --pan-db PAN_DB
                        Anvi'o pan database
  -g GENOMES_STORAGE, --genomes-storage GENOMES_STORAGE
                        Anvi'o genomes storage file

OUTPUT FILE:
  You get to chose an output file name to report things. The default will be
  an ugly name. So, be explicit.

  -o FILE_PATH, --output-file FILE_PATH
                        File path to store results.

SELECTION:
  Which protein clusters should be exported. You can ask for a single PC, or
  multiple ones listed in a file, or you can use a collection and bin name
  to list PCs of interest. If you give nothing, this program will export
  alignments for every single PC found in the profile database (and this is
  called 'customer service').

  --pc-id PROTEIN_CLUSTER_ID
                        Protein cluster ID you are interested in.
  --pc-ids-file FILE_PATH
                        Text file for protein clusters (each line should
                        contain be a unique protein cluster id).
  -C COLLECTION_NAME, --collection-name COLLECTION_NAME
                        Collection name.
  -b BIN_NAME, --bin-id BIN_NAME
                        Bin name you are interested in.

OTHER STUFF:
  Yes. Stuff that are not like the ones above.

  --list-collections    Show available collections and exit.
  --list-bins           List available bins in a collection and exit.
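
As a rough illustration of the SELECTION options above, here is a minimal Python sketch that writes a PC IDs file (one unique ID per line, as the help text asks) and assembles a matching command line. It uses only flags that appear in the help output; the helper names, file names, and PC IDs are hypothetical placeholders, and the sketch does not run the program itself.

```python
import os
import tempfile
from pathlib import Path


def write_pc_ids_file(pc_ids, path):
    """Write one unique protein cluster ID per line, keeping first-seen order."""
    cleaned = (pc_id.strip() for pc_id in pc_ids)
    unique_ids = list(dict.fromkeys(pc_id for pc_id in cleaned if pc_id))
    Path(path).write_text("\n".join(unique_ids) + "\n")
    return unique_ids


def build_export_command(pan_db, genomes_storage, output_file, pc_ids_file):
    """Assemble an argument list using only flags shown in the help output."""
    return [
        "anvi-export-pc-alignments",
        "-p", str(pan_db),
        "-g", str(genomes_storage),
        "-o", str(output_file),
        "--pc-ids-file", str(pc_ids_file),
    ]


if __name__ == "__main__":
    # All file names and PC IDs here are hypothetical placeholders.
    ids_path = os.path.join(tempfile.mkdtemp(), "pc-ids.txt")
    write_pc_ids_file(["PC_00000001", "PC_00000002", "PC_00000001"], ids_path)
    cmd = build_export_command("PAN.db", "GENOMES.h5", "pc-alignments.txt", ids_path)
    print(" ".join(cmd))
    # To actually run it (requires anvi'o on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems if a path contains spaces. For a single cluster, the --pc-id flag from the help output could replace --pc-ids-file in the same way.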