MaGuS

MaGuS (Map-Guided Scaffolding) is a scaffolder and a reference-free evaluator of assembly quality. It takes as input a draft genome assembly, a genome map, and paired-end high-throughput sequencing data. It has been successfully tested on the Arabidopsis genome with Illumina reads and a Whole-Genome Profiling (WGP) map.

MaGuS runs the following five steps:

- wgp2map: create a map genome file based on WGP data.
- map2qc: analyse the assembly quality based on colinearity between the assembly and the map.
- map2links: create links (map-links) between scaffolds.
- pairs2links: use NGS data to validate the map-links, orient the scaffolds, estimate the gap sizes, and create a '.de' file.
- links2scaf: output the new final assembly in fasta format.
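
The colinearity idea behind map2qc can be illustrated with a small sketch. The Python below is not taken from MaGuS, and the exact metric MaGuS uses is not described here. It assumes that each anchored tag on a scaffold is given as a (position, rank) pair and scores the scaffold by the fraction of adjacent tags whose map ranks follow a single direction. The function name `colinearity_fraction` is hypothetical.

```python
def colinearity_fraction(tags):
    """Return the fraction of adjacent tag pairs whose map ranks are colinear.

    `tags` is an iterable of (position, rank) pairs for one scaffold.
    Assumption for illustration only: a scaffold is colinear with the map
    when ranks are non-decreasing (forward) or non-increasing (reverse)
    along the scaffold. This is not MaGuS's own metric.
    """
    # Order the tags by their position on the scaffold, keep only the ranks.
    ranks = [rank for _, rank in sorted(tags)]
    if len(ranks) < 2:
        return 1.0  # zero or one tag: nothing can disagree
    pairs = list(zip(ranks, ranks[1:]))
    forward = sum(a <= b for a, b in pairs)
    reverse = sum(a >= b for a, b in pairs)
    # Keep the better of the two orientations.
    return max(forward, reverse) / len(pairs)


if __name__ == "__main__":
    # Made-up data: the third tag of the second scaffold is out of order.
    consistent = [(100, 1), (250, 2), (900, 3), (1500, 4)]
    suspicious = [(100, 1), (250, 2), (900, 9), (1500, 3), (2100, 4)]
    print(colinearity_fraction(consistent))
    print(colinearity_fraction(suspicious))
```

A value well below 1 for a scaffold with many tags would point at a region where the assembly and the map disagree, which is the kind of signal a reference-free quality check looks for.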

MaGuS is not restricted to WGP data: other map types can be used, provided they are converted to the MaGuS map format (see below).

GroupID 1
Tag1 Tag2 Tag3 Tag4 Tag5 Tag6 Tag7
rank1 rank2 rank3 rank4 rank5 rank6 rank7
GroupID 2
...
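
For maps that do not come from WGP, a file in this layout has to be produced and checked by hand. The Python sketch below reads the layout as shown above, assuming (the source gives no formal specification) that each group spans three lines: a `GroupID <id>` header, a whitespace-separated line of tags, and a whitespace-separated line of ranks. `parse_magus_map` is a hypothetical helper, not part of MaGuS.

```python
def parse_magus_map(text):
    """Parse a MaGuS-style map into {group_id: (tags, ranks)}.

    Assumed layout (taken from the example, not from a formal spec):
      line 1: 'GroupID <id>'
      line 2: whitespace-separated tag names
      line 3: whitespace-separated ranks
    Blank lines are ignored. Tags and ranks are returned as strings.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected groups of three lines (header, tags, ranks)")
    groups = {}
    for i in range(0, len(lines), 3):
        header, tag_line, rank_line = lines[i:i + 3]
        fields = header.split()
        if len(fields) != 2 or fields[0] != "GroupID":
            raise ValueError(f"bad group header: {header!r}")
        groups[fields[1]] = (tag_line.split(), rank_line.split())
    return groups


if __name__ == "__main__":
    # Made-up map with two groups.
    example = """GroupID 1
tagA tagB tagC
1 2 3
GroupID 2
tagD tagE
1 2
"""
    for group_id, (tags, ranks) in parse_magus_map(example).items():
        print(group_id, list(zip(tags, ranks)))
```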

MaGuS is distributed as open-source software under the CeCILL Free Software License. See http://www.cecill.info/ for the terms of this license.

MaGuS website: http://www.genoscope.cns.fr/magus

Contact: magus [a] genoscope [.] cns [.] fr

PRE-REQUIREMENTS

DEPENDENCIES

INSTALLATION

  1. Download the .zip file MaGuS-master:
    wget https://github.com/institut-de-genomique/MaGuS/archive/master.zip
  2. Unzip it:
    unzip master.zip
  3. Download the example dataset available on the website http://www.genoscope.cns.fr/magus or:
    wget http://www.genoscope.cns.fr/magus/datasets/MaGuS/Arabido/Arabido_data.tar.gz
  4. Untar/unzip it:
    tar -zxvf Arabido_data.tar.gz
  5. Add the MaGuS libraries to $PERL5LIB (e.g. export PERL5LIB=$(pwd)/MaGuS-master/magus-1.0/lib/:$PERL5LIB)
  6. Add the MaGuS binaries to $PATH (e.g. export PATH=$(pwd)/MaGuS-master/magus-1.0/bin/:$PATH)
  7. Run MaGuS on the example dataset, specifying the paths to SGA, R and samtools if they are not in your $PATH:
    magus all -w Arabido_data/tagsWgp.out -t Arabido_data/mapped_tag.bam -b Arabido_data/mp_map1_2.bam,5414,1000,76 -b Arabido_data/mp_map2_2.bam,5414,1000,76 -f Arabido_data/Arabido.fa -p Arabido -e 119667750 -sga /path/to/SGA/ -r /path/to/R/ -samtools /path/to/samtools/

RUNNING MaGuS

There are two ways to run MaGuS. The most common way is:

 magus all -w wgpFile -t tags.bam -f assembly.fa -e estimate_size -b file.bam,m,sd,s

The second way is to run the MaGuS pipeline step by step, one step at a time (wgp2map, map2qc, map2links, pairs2links, links2scaf).

OPTIONS

Several mapped paired-end libraries (BAM files) can be used simultaneously by giving the -b option once per library. Example:

magus pairs2links -f Arabidopsis.fa -l links_file.txt -b mapping_library1.bam,3500,600,101 -b mapping_library2.bam,6000,1000,151 -samtools /path/to/samtools/ -p Arabido
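
With several libraries the command line gets long, and building it programmatically can help avoid typos. The Python sketch below only assembles the argument list, using the flags that appear in the example above (-f, -l, -b, -samtools, -p); it does not run MaGuS. The `Library` type and the `pairs2links_argv` function are hypothetical. The field names assume that the three numbers after the BAM path are the library's mean insert size, its standard deviation, and the read length, which the source does not state explicitly.

```python
import shlex
from typing import List, NamedTuple


class Library(NamedTuple):
    bam: str          # path to the mapped paired-end BAM file
    mean_size: int    # assumed meaning of 'm'
    std_dev: int      # assumed meaning of 'sd'
    read_length: int  # assumed meaning of 's'

    def as_option(self) -> str:
        """Format the library as the comma-separated value given to -b."""
        return f"{self.bam},{self.mean_size},{self.std_dev},{self.read_length}"


def pairs2links_argv(assembly: str, links: str, libraries: List[Library],
                     samtools_dir: str, prefix: str) -> List[str]:
    """Build the argv list for 'magus pairs2links', one -b per library."""
    argv = ["magus", "pairs2links", "-f", assembly, "-l", links]
    for lib in libraries:
        argv += ["-b", lib.as_option()]
    argv += ["-samtools", samtools_dir, "-p", prefix]
    return argv


if __name__ == "__main__":
    libs = [
        Library("mapping_library1.bam", 3500, 600, 101),
        Library("mapping_library2.bam", 6000, 1000, 151),
    ]
    argv = pairs2links_argv("Arabidopsis.fa", "links_file.txt", libs,
                            "/path/to/samtools/", "Arabido")
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(argv))
```

The resulting list could then be handed to `subprocess.run(argv, check=True)` once `magus` is on the PATH.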

OUTPUT

- ${prefix}_tags_coordinate.txt: file containing the tags anchored on the assembly, sorted by mapping position

col 1: scaffold ID
col 2: position
col 3: tag ID
col 4: rank
col 5: group ID
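
The five columns above map directly onto a small record type. The following Python sketch reads such a file, assuming whitespace-separated columns, no header line, and integer positions and ranks (none of which is confirmed by the source). `TagAnchor` and `read_tags_coordinate` are hypothetical names, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class TagAnchor(NamedTuple):
    scaffold_id: str
    position: int
    tag_id: str
    rank: int
    group_id: str


def read_tags_coordinate(lines: Iterable[str]) -> Dict[str, List[TagAnchor]]:
    """Group anchored tags by scaffold, keeping the order of the input."""
    by_scaffold: Dict[str, List[TagAnchor]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 columns, got {len(fields)}")
        scaffold, position, tag, rank, group = fields
        by_scaffold[scaffold].append(
            TagAnchor(scaffold, int(position), tag, int(rank), group))
    return dict(by_scaffold)


if __name__ == "__main__":
    sample = [
        "scaffold_1 120 tag_17 1 1",
        "scaffold_1 4810 tag_18 2 1",
        "scaffold_2 95 tag_40 7 1",
    ]
    for scaffold, anchors in read_tags_coordinate(sample).items():
        print(scaffold, [(a.position, a.rank) for a in anchors])
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the sample list.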

- ${prefix}_anchored_assembly.txt: MaGuS-format file of the scaffolds anchored on the genome map

col 1: group ID
col 2: scaffold ID
col 3: minimum tag rank
col 4: maximum tag rank
col 5: number of tags
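
As one example of using this file, the Python sketch below orders the scaffolds of each group along the map by their minimum tag rank. It assumes whitespace-separated columns, no header line, and integer ranks. The function name `order_scaffolds` is hypothetical, and the ordering rule is an illustration rather than the procedure MaGuS follows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def order_scaffolds(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {group_id: [scaffold_id, ...]} sorted by minimum tag rank."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Columns: group ID, scaffold ID, min rank, max rank, number of tags.
        # Unpacking raises ValueError if a row does not have five columns.
        group, scaffold, min_rank, max_rank, _n_tags = fields
        groups[group].append((int(min_rank), int(max_rank), scaffold))
    # Sort by minimum rank, then maximum rank, then scaffold ID.
    return {g: [s for _, _, s in sorted(rows)] for g, rows in groups.items()}


if __name__ == "__main__":
    # Made-up rows for two groups.
    sample = [
        "1 scaffold_9 40 55 6",
        "1 scaffold_3 1 12 8",
        "2 scaffold_5 3 9 4",
    ]
    print(order_scaffolds(sample))
```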

col 1: scafID1
col 2: scafID2
col 3: scafID1 length
col 4: scafID2 length
col 5: scafID1 orientation
col 6: scafID2 orientation
col 7: scafID1 mapping position
col 8: scafID2 mapping position
col 9: gap size

More information


If you have questions about MaGuS, you can send them to amadoui [at] genoscope [.] cns [.] fr, cdossat [at] genoscope [.] cns [.] fr, ldagata [at] genoscope [.] cns [.] fr and jmaury [at] genoscope [.] cns [.] fr. You can also open an issue on GitHub: https://github.com/institut-de-genomique/MaGuS/issues.

ACKNOWLEDGMENTS

Carole Dossat, Léo d'Agata, Jean-Marc Aury and Mohammed-Amin Madoui - MaGuS's authors

This work was financially supported by the Genoscope, Institut de Genomique, CEA, the Agence Nationale de la Recherche (ANR), and France Génomique (ANR-10-INBS-09-08).