ma-compbio / Weaver

Allele-Specific Quantification of Structural Variations in Cancer Genomes
MIT License

README clean-up requests #6

Closed evanbiederstedt closed 5 years ago

evanbiederstedt commented 7 years ago

Nothing too important. However, could you clean up the README such that users could click on the hyperlinks?

e.g. http://samtools.sourceforge.net/`_ gives an error currently.

Also, is it possible to link to the paper in the README?

Lastly, based on the examples given, I'm still not entirely sure what Weaver PLOIDY and Weaver LITE actually do, or what the outputs of these commands mean. A few more details would be helpful to users of the software, e.g. for runs like these:

Weaver PLOIDY -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 solo_ploidy TARGET 2
RUN MODE    PLOIDY
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
Weaver LITE -f SIMU.fa -S FINAL_SV -s SNP -g REGION -w X.bam.wig -r 0 -m map100mer.bd -p 64 -t 20 -n 0
RUN MODE    LITE
THREAD was set to 64.
FASTA was set to SIMU.fa.
WIG was set to X.bam.wig.
MAP was set to map100mer.bd.
SV was set to FINAL_SV.
SNP was set to SNP.
GAP was set to REGION.
RUNFLAG was set to 0.
TUMOR coverage was set to 20.
NORMAL was set to 0.
Getting coverage profile...
Getting coverage profile done!
Getting GC content done!
Getting Mapability done!
base_mean = 20
best_norm = 0
LBP scan
LBP
LBP init
LBP print
LBP scan

Thanks!

evanbiederstedt commented 7 years ago

Also, I now have a better understanding of which inputs are necessary from the Perl scripts

e.g. https://github.com/ma-compbio/Weaver/blob/master/bin/Weaver_pipeline.pl

GetOptions(
    #MANDATORY
    #OPTIONAL
    'p|thread=i'=>\$P,
    'g|gap=s'=>\$GAP, # with chr [MANDATORY]
    'b|bam=s'=>\$BAM, # [MANDATORY]
    'f|fa=s'=>\$FA, # no .fa [MANDATORY]
    'F|FullFa=s'=>\$FULLFA, # .fasta or .fa
    'h|help' =>\$help,
    'o|output=s'=>\$OUT_DIR,
    'k|onekg=s'=>\$ONEKG, # dir [MANDATORY]
    't=s'=>\$TorN, # Tumor or Normal
    's|sex=s'=>\$SEX, # M or F
    'C=i'=>\$cov);

However, I still don't understand why users would use the command Weaver PLOIDY versus Weaver LITE, or what each command is intended to do. Could this be explained a bit more?

Thanks

ashokrajaraman commented 7 years ago

Hi evanbiederstedt

I'll check with the original developer as to how best to clear this up in the README. For the moment, I can let you know that Weaver PLOIDY is expected to output the ploidy in the tumor and normal samples (i.e. the number (max) of copies of chromosomes), while Weaver LITE finds allele-specific copy numbers of the variants called by the original Perl script.

evanbiederstedt commented 7 years ago

Hi Ashok

Thanks for the response. I may need a slight clarification here:

the number (max) of copies of chromosomes

Why do you say "(max)"? I'm not sure why the ploidy would be the max number of chromosomes. Shouldn't it be the mean number of complete sets of chromosomes?

Thanks for your help, Evan

CC @xtYao

ashokrajaraman commented 7 years ago

Hi Evan

Apologies, I mistyped that.

evanbiederstedt commented 7 years ago

Hi Ashok,

No problem. Just so there's no miscommunication: What does Weaver PLOIDY output?

Let's update this on the README for others to read ;)

ashokrajaraman commented 7 years ago

It outputs the estimated cancer haplotype coverage and the estimated normal haplotype coverage to stdout.
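For anyone scripting around this, here is a minimal sketch of pulling those two numbers out of captured stdout. It assumes the lines look exactly like the "Estimated cancer haplotype coverage:" and "Estimated normal haplotype coverage:" lines in the log quoted earlier in this thread; the function name `parse_ploidy_stdout` is made up for illustration and is not part of Weaver.

```python
import re

# Pattern based on the log lines quoted in this thread, e.g.
# "Estimated cancer haplotype coverage:    0". The exact wording is an
# assumption taken from that log, not from Weaver's documentation.
_COVERAGE_LINE = re.compile(
    r"^Estimated (cancer|normal) haplotype coverage:\s*(\S+)\s*$"
)


def parse_ploidy_stdout(text):
    """Return a dict such as {'cancer': 0.0, 'normal': 0.0}.

    Only keys whose lines are present and numeric are returned.
    (Hypothetical helper, not shipped with Weaver.)
    """
    estimates = {}
    for line in text.splitlines():
        match = _COVERAGE_LINE.match(line.strip())
        if not match:
            continue
        try:
            estimates[match.group(1)] = float(match.group(2))
        except ValueError:
            # Value was not a number; skip it rather than guess.
            continue
    return estimates


example_log = """Getting Mapability done!
Estimated cancer haplotype coverage:    0
Estimated normal haplotype coverage:    0
"""
print(parse_ploidy_stdout(example_log))
```

Returning a dict rather than a tuple makes it easy to notice when one of the two lines is missing from the output.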

evanbiederstedt commented 7 years ago

Another README question:

Is solo_ploidy TARGET a required parameter? If not, what is the default parameter? 2?

Could there be more of a description here?

ashokrajaraman commented 7 years ago

At the moment, I think solo_ploidy is a redundant executable. We'll fill in the README as it is implemented or possibly removed.

evanbiederstedt commented 7 years ago

Thanks

Could you also clarify the -t and -n parameters for Weaver LITE? What do these mean?

ashokrajaraman commented 7 years ago

These would be the tumor and normal coverage in the sequencing/match respectively. I would suggest using bedtools to find the coverage beforehand from the BAM files.
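As a rough sketch of that pre-computation: if per-base depth is exported as tab-separated `chrom`, `position`, `depth` records (the layout assumed here for something like `bedtools genomecov -d`), a mean can be computed as below. Whether Weaver expects a genome-wide mean, or some other summary, for `-t` and `-n` is not stated in this thread, so treat that as an assumption. `mean_depth` is a hypothetical helper.

```python
def mean_depth(depth_lines):
    """Average the third (depth) column of per-base depth records.

    Each record is assumed to be 'chrom<TAB>position<TAB>depth'.
    Lines with fewer than three fields are skipped.
    """
    total = 0
    count = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        total += int(fields[2])
        count += 1
    if count == 0:
        raise ValueError("no depth records found")
    return total / count


# Tiny in-memory example; real input would be read from a file or pipe.
sample = ["chr1\t1\t18\n", "chr1\t2\t22\n", "chr1\t3\t20\n"]
print(mean_depth(sample))
```

The function takes any iterable of lines, so an open file handle can be passed directly without loading the whole depth table into memory.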

evanbiederstedt commented 7 years ago

Thanks

And for Weaver PLOIDY, parameter -g

These are the uncharacterized regions of the reference genome, I believe (?). Is it just a TSV/tab-delimited text file?

ashokrajaraman commented 7 years ago

That's correct. For real data, two such files are provided in the folder data/ under the name of GAP. The only difference is that the chromosome names in one are prefixed by chr, so you use the one which matches the format in the data you have.
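A small sketch of that selection rule, for anyone automating it. The chromosome names would come from your own data (for example, the BAM header), and the two file paths are placeholders passed in by the caller, since the exact file names under data/ are not given in this thread. Both function names are hypothetical.

```python
def uses_chr_prefix(chrom_names):
    """Return True if every name starts with 'chr', False if none do.

    Raises ValueError on empty input or mixed naming, because neither
    GAP file would match the data cleanly in that case.
    """
    names = [name for name in chrom_names if name]
    if not names:
        raise ValueError("no chromosome names given")
    prefixed = [name.startswith("chr") for name in names]
    if all(prefixed):
        return True
    if not any(prefixed):
        return False
    raise ValueError("mixed chromosome naming (some with 'chr', some without)")


def pick_gap_file(chrom_names, gap_with_chr, gap_without_chr):
    """Choose the GAP file whose naming style matches the data.

    gap_with_chr / gap_without_chr are caller-supplied paths (placeholders).
    """
    if uses_chr_prefix(chrom_names):
        return gap_with_chr
    return gap_without_chr


print(pick_gap_file(["chr1", "chr2", "chrX"], "GAP_with_chr", "GAP_without_chr"))
```

Failing loudly on mixed naming is a deliberate choice: a silent mismatch between the GAP file and the BAM would be harder to spot later.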