luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Arguments for data-profile #85

Closed: 24natasya closed this issue 4 years ago

24natasya commented 4 years ago

I just want to ask: what argument do I specify for --data-profile? Does it require a VCF file or a truth file?

dancooke commented 4 years ago

The argument is a (CSV) file path where Octopus will write the data profile to, e.g.:

$ octopus -R ref.fa -I reads.bam -o calls.vcf --data-profile reads.profile.csv

You then use one or more profiles to generate new error models (https://github.com/luntergroup/octopus/wiki/How-to:-Use-error-models):

$ profiler.py -P reads.profile.csv -O reads.model

Then you can use this error model for variant calling:

$ octopus -R ref.fa -I reads.bam -o calls.vcf --sequence-error-model reads.model

No truth set is needed to fit the error model.
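
For anyone who wants to chain the three steps, here is a minimal sketch as a POSIX shell script. It only uses the commands and flags shown in this comment. The variable names, the `run` and `profile_and_recall` helpers, the dry-run switch, and the separate output name `calls.model.vcf` for the second calling pass are assumptions added for illustration, not part of Octopus. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch of the profile -> model -> re-call workflow described above.
set -eu

# Input/output paths (placeholders taken from the example commands).
REF="${REF:-ref.fa}"
READS="${READS:-reads.bam}"
PROFILE="${PROFILE:-reads.profile.csv}"
MODEL="${MODEL:-reads.model}"

# Assumption: dry-run by default so the script is safe to try out.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

profile_and_recall() {
  # 1. Call variants and have Octopus write the data profile (CSV).
  run octopus -R "$REF" -I "$READS" -o calls.vcf --data-profile "$PROFILE"

  # 2. Fit a new error model from the profile (no truth set involved).
  run profiler.py -P "$PROFILE" -O "$MODEL"

  # 3. Call variants again using the fitted error model.
  #    Assumption: a different output name so the first call set is kept.
  run octopus -R "$REF" -I "$READS" -o calls.model.vcf --sequence-error-model "$MODEL"
}

profile_and_recall
```

Run it as is to review the three commands, then rerun with `DRY_RUN=0` (and `REF`, `READS`, etc. set to your own files) once they look right.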

24natasya commented 4 years ago

Thanks for your prompt reply. I really appreciate it!

brentp commented 1 year ago

@dancooke, one clarification on this: do the other arguments to octopus matter? E.g. if the data is from haploid clones, would -C polyclone and other appropriate arguments matter?