nh13 / DWGSIM

Whole Genome Simulator for Next-Generation Sequencing
GNU General Public License v2.0

-o option is missing in the conda version 1.11 #58

Closed dangeles closed 2 years ago

dangeles commented 4 years ago

I downloaded the conda version of DWGSIM, but the software does not recognize the --output format option, and the documentation shipped with that version does not describe it either.

nh13 commented 3 years ago

@dangeles which tool are you talking about?

tmsincomb commented 3 years ago

@nh13, I believe @dangeles is referring to the dwgsim conda install, which doesn't have the "-o" option, so we have no way to avoid creating both the bfast and bwa fastq files. This option is referred to in the wiki.

$ dwgsim -o 1 myfasta.fasta myread
dwgsim: invalid option -- 'o'
Unrecognized option: -?

The help message doesn't have the "-o" output option either

Program: dwgsim (short read simulator)
Version: 0.1.11
Contact: Nils Homer <dnaa-help@lists.sourceforge.net>

Usage:   dwgsim [options] <in.ref.fa> <out.prefix>

Options:
         -e FLOAT      per base/color/flow error rate of the first read [from 0.020 to 0.020 by 0.000]
         -E FLOAT      per base/color/flow error rate of the second read [from 0.020 to 0.020 by 0.000]
         -i            use the inner distance instead of the outer distance for pairs [False]
         -d INT        outer distance between the two ends for pairs [500]
         -s INT        standard deviation of the distance for pairs [50.000]
         -N INT        number of read pairs (-1 to disable) [-1]
         -C FLOAT      mean coverage across available positions (-1 to disable) [100.00]
         -1 INT        length of the first read [70]
         -2 INT        length of the second read [70]
         -r FLOAT      rate of mutations [0.0010]
         -F FLOAT      frequency of given mutation to simulate low fequency somatic mutations [0.5000]
                           NB: freqeuncy F refers to the first strand of mutation, therefore mutations 
                           on the second strand occour with a frequency of 1-F 
         -R FLOAT      fraction of mutations that are indels [0.10]
         -X FLOAT      probability an indel is extended [0.30]
         -I INT        the minimum length indel [1]
         -y FLOAT      probability of a random DNA read [0.05]
         -n INT        maximum number of Ns allowed in a given read [0]
         -c INT        generate reads for [0]:
                           0: Illumina
                           1: SOLiD
                           2: Ion Torrent
         -S INT        generate reads [0]:
                           0: default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)
                           1: same strand (mate pair)
                           2: opposite strand (paired end)
         -f STRING     the flow order for Ion Torrent data [(null)]
         -B            use a per-base error rate for Ion Torrent data [False]
         -H            haploid mode [False]
         -z INT        random seed (-1 uses the current time) [-1]
         -M            generate a mutations file only [False]
         -m FILE       the mutations txt file to re-create [not using]
         -b FILE       the bed-like file set of candidate mutations [(null)]
         -v FILE       the vcf file set of candidate mutations (use pl tag for strand) [(null)]
         -x FILE       the bed of regions to cover [not using]
         -P STRING     a read prefix to prepend to each read name [not using]
         -q STRING     a fixed base quality to apply (single character) [not using]
         -Q FLOAT      standard deviation of the base quality scores [2.00]
         -s INT        standard deviation of the distance for pairs [50.000]
         -h            print this message

Note: For SOLiD mate pair reads and BFAST, the first read is F3 and the second is R3. For SOLiD mate pair reads
and BWA, the reads in the first file are R3 the reads annotated as the first read etc.

Note: The longest supported insertion is 4294967295.
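
One quick way to tell whether an installed build knows a given flag is to scan its help text for that option letter. This is a minimal sketch, assuming the help output lists each option at the start of an indented line, as in the paste above; `has_option` is a hypothetical helper name, not part of dwgsim.

```shell
# Hypothetical helper (not part of dwgsim): reads help text on stdin and
# succeeds if some line starts, after optional whitespace, with "-<letter>"
# followed by whitespace or end of line.
has_option() {
    grep -Eq -- "^[[:space:]]*-$1([[:space:]]|\$)"
}

# Example use (commented out so the snippet does not require dwgsim):
# if dwgsim -h 2>&1 | has_option o; then
#     echo "this build lists -o"
# else
#     echo "this build does not list -o"
# fi
```

The error line `dwgsim: invalid option -- 'o'` does not match, because it does not begin with the option itself.
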

gtollefson commented 2 years ago

I'd like to report that this is still the case for dwgsim built with Conda version 4.10.3, @nh13. It would be nice to have the -o option to produce only BWA or only bfast output, to stay within storage limits. I've pasted the usage output below:

(dwgsim_env) [gtollefs@login005 scripts]$ dwgsim -e 0.000 -E 0.000 -C 30 -1 150 -2 150 -r 0.00000 -F 0.000 -R 0.0 -q F -o 1
dwgsim: invalid option -- 'o'
Unrecognized option: -?

Program: dwgsim (short read simulator)
Version: 0.1.11
Contact: Nils Homer <dnaa-help@lists.sourceforge.net>

Usage:   dwgsim [options] <in.ref.fa> <out.prefix>

Options:
         -e FLOAT      per base/color/flow error rate of the first read [from 0.000 to 0.000 by 0.000]
         -E FLOAT      per base/color/flow error rate of the second read [from 0.000 to 0.000 by 0.000]
         -i            use the inner distance instead of the outer distance for pairs [False]
         -d INT        outer distance between the two ends for pairs [500]
         -s INT        standard deviation of the distance for pairs [50.000]
         -N INT        number of read pairs (-1 to disable) [-1]
         -C FLOAT      mean coverage across available positions (-1 to disable) [30.00]
         -1 INT        length of the first read [150]
         -2 INT        length of the second read [150]
         -r FLOAT      rate of mutations [0.0000]
         -F FLOAT      frequency of given mutation to simulate low fequency somatic mutations [0.0000]
                           NB: freqeuncy F refers to the first strand of mutation, therefore mutations
                           on the second strand occour with a frequency of 1-F
         -R FLOAT      fraction of mutations that are indels [0.00]
         -X FLOAT      probability an indel is extended [0.30]
         -I INT        the minimum length indel [1]
         -y FLOAT      probability of a random DNA read [0.05]
         -n INT        maximum number of Ns allowed in a given read [0]
         -c INT        generate reads for [0]:
                           0: Illumina
                           1: SOLiD
                           2: Ion Torrent
         -S INT        generate reads [0]:
                           0: default (opposite strand for Illumina, same strand for SOLiD/Ion Torrent)
                           1: same strand (mate pair)
                           2: opposite strand (paired end)
         -f STRING     the flow order for Ion Torrent data [(null)]
         -B            use a per-base error rate for Ion Torrent data [False]
         -H            haploid mode [False]
         -z INT        random seed (-1 uses the current time) [-1]
         -M            generate a mutations file only [False]
         -m FILE       the mutations txt file to re-create [not using]
         -b FILE       the bed-like file set of candidate mutations [(null)]
         -v FILE       the vcf file set of candidate mutations (use pl tag for strand) [(null)]
         -x FILE       the bed of regions to cover [not using]
         -P STRING     a read prefix to prepend to each read name [not using]
         -q STRING     a fixed base quality to apply (single character) [F]
         -Q FLOAT      standard deviation of the base quality scores [0.00]
         -s INT        standard deviation of the distance for pairs [50.000]
         -h            print this message

Note: For SOLiD mate pair reads and BFAST, the first read is F3 and the second is R3. For SOLiD mate pair reads and BWA, the reads in the first file are R3 the reads annotated as the first read etc.

Note: The longest supported insertion is 4294967295.
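
Until a build with -o is installed, one possible workaround for the storage concern is to delete the unwanted output right after the run. This is a minimal sketch, assuming dwgsim writes the bfast reads to `<out.prefix>.bfast.fastq` (or a gzipped `.fastq.gz` variant in some builds) next to the bwa read files; check the file names your build actually produces before relying on it. `drop_bfast_output` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of dwgsim): remove the bfast-style FASTQ
# written for a given output prefix, leaving every other output file in place.
# Assumption: the file is named <prefix>.bfast.fastq or <prefix>.bfast.fastq.gz.
drop_bfast_output() {
    prefix=$1
    for f in "$prefix.bfast.fastq" "$prefix.bfast.fastq.gz"; do
        if [ -e "$f" ]; then
            rm -- "$f"
        fi
    done
}

# Example use (commented out so the snippet does not require dwgsim;
# ref.fa and myread are placeholder names):
# dwgsim -C 30 -1 150 -2 150 ref.fa myread && drop_bfast_output myread
```

This does not reduce peak disk usage during the run, since both sets of files are still written before the bfast one is removed.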

nh13 commented 2 years ago

Cut a new release and made a PR into Bioconda: https://github.com/bioconda/bioconda-recipes/pull/32048.

nh13 commented 2 years ago

Also, not really spending too much time on this project, so not going to version the documentation :/