hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Representing unmatched PURPLE copy number #587

Closed Chunkz616 closed 3 months ago

Chunkz616 commented 3 months ago

PURPLE unmatched copy number

Hi, brilliant tool, thank you. Apologies if this is a bit of a basic question; I am new to this area. I have run PURPLE unmatched and am happy with the COBALT and AMBER outputs for my tumour samples.

I am using the latest version with these inputs:

# Run PURPLE for current patient ID

java -jar $PURPLE_JAR \
  -tumor $TUMOR \
  -amber $AMBER_DIR \
  -cobalt $COBALT_DIR \
  -gc_profile $GC_PROFILE \
  -ref_genome_version $REF_GENOME_VERSION \
  -ref_genome $REF_GENOME \
  -ensembl_data_dir $ENSEMBL_DATA_DIR \
  -output_dir $OUTPUT_DIR \
  -circos $CIRCOS_BIN

You can see attached a visualisation of the outputs. The final copy number output looks correct overall, but I am struggling to deal with the variation and noise, e.g. segments with a BAF count of 0. Is there something I could modify to improve this, or is there an accepted way of handling it?
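One way to see how much of the noise comes from segments without BAF support is to list them directly from the PURPLE segment file. The sketch below is only an illustration and not part of PURPLE: it assumes the file is tab-separated with a header containing `chromosome`, `start`, `end`, `copyNumber` and `bafCount` columns (check these against your own purple.cnv.somatic.tsv), and the function name and sample rows are made up for the example.

```python
import csv
import io


def segments_without_baf(tsv_handle):
    """Return segments whose bafCount is 0 (column names are assumptions)."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    flagged = []
    for row in reader:
        # bafCount is read via float() first in case it is written as "0.0"
        if int(float(row["bafCount"])) == 0:
            flagged.append({
                "chromosome": row["chromosome"],
                "start": int(row["start"]),
                "end": int(row["end"]),
                "copyNumber": float(row["copyNumber"]),
            })
    return flagged


# Made-up rows for illustration only; for real data pass an opened
# purple.cnv.somatic.tsv file handle instead of this StringIO.
EXAMPLE_TSV = (
    "chromosome\tstart\tend\tcopyNumber\tbafCount\n"
    "1\t1\t1000000\t2.01\t154\n"
    "1\t1000001\t1050000\t3.40\t0\n"
    "9\t36800000\t37100000\t0.95\t0\n"
)

flagged_segments = segments_without_baf(io.StringIO(EXAMPLE_TSV))
for seg in flagged_segments:
    length = seg["end"] - seg["start"] + 1
    print(seg["chromosome"], seg["start"], seg["end"], length, seg["copyNumber"])
```

Printing the segment length alongside the copy number makes it easier to judge whether the zero-BAF segments are short fragments or long regions.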

Additionally, I know the copy number call to be wrong for some genes, e.g. PAX5 is homozygously deleted but is called as diploid. Is this a known issue? I did not provide the unmatched GRIDSS output in the above example, but providing it did not help either.
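For checking a single gene such as PAX5, a small lookup against the gene-level file can be quicker than reading it off a plot. This is a hedged sketch rather than anything PURPLE provides: it assumes purple.cnv.gene.tsv is tab-separated with `gene`, `minCopyNumber` and `maxCopyNumber` columns, the 0.5 cutoff is an illustrative choice for "looks homozygously deleted", and the rows shown are invented.

```python
import csv
import io


def gene_copy_number(tsv_handle, gene_name):
    """Return (minCopyNumber, maxCopyNumber) for a gene, or None if absent."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    for row in reader:
        if row["gene"] == gene_name:
            return float(row["minCopyNumber"]), float(row["maxCopyNumber"])
    return None


# Invented rows for illustration only; for real data pass an opened
# purple.cnv.gene.tsv file handle instead of this StringIO.
EXAMPLE_GENE_TSV = (
    "gene\tminCopyNumber\tmaxCopyNumber\n"
    "PAX5\t1.98\t2.03\n"
    "CDKN2A\t0.04\t0.10\n"
)

# Illustrative cutoff (an assumption for this example, adjust as needed)
HOM_DEL_THRESHOLD = 0.5

pax5 = gene_copy_number(io.StringIO(EXAMPLE_GENE_TSV), "PAX5")
pax5_looks_deleted = pax5 is not None and pax5[0] < HOM_DEL_THRESHOLD
print("PAX5 min/max copy number:", pax5, "looks deleted:", pax5_looks_deleted)
```

Comparing the minimum rather than the maximum copy number matters here, because a focal deletion covering only part of the gene would lower only the minimum.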

PD59703a_customplots (3)

p-priestley commented 3 months ago

Could you tell me what version of PURPLE, AMBER and COBALT you ran please? Also, is your final plot just charting the purple.cnv.somatic.tsv? It would be helpful if you could also attach the input.png and circos.png outputs for PURPLE if you don't mind so I can see the segmentation.

Chunkz616 commented 3 months ago

Thank you for the quick response, and apologies for not providing this. I am using purple_v3.8.4.jar.

cobalt-1.14.1.jar, with these inputs:

java -Xmx28G -jar $COBALT_JAR \
  -tumor $TUMOR \
  -tumor_bam $TUMOR_BAM \
  -output_dir $OUTPUT_DIR \
  -threads $THREADS \
  -gc_profile $GC_PROFILE \
  -tumor_only_diploid_bed $DIPLOID_BED

amber-3.9.jar, with these inputs:

java -Xmx32G -cp $AMBER_JAR com.hartwig.hmftools.amber.AmberApplication \
  -tumor $TUMOR \
  -tumor_bam $TUMOR_BAM \
  -output_dir $OUTPUT_DIR \
  -threads $THREADS \
  -loci $LOCI \
  -ref_genome_version $REF_GENOME_VERSION

The final plot is of purple.cnv.gene.tsv. I have attached both the input and circos plots, and have also included a plot of the .cnv.somatic.tsv.

image

PD59703a input_noGRIDDS (1)

PD59703a circos_noGRIDDS (2)

p-priestley commented 3 months ago

I am not sure I can fully solve your problems here, but some hints which may be helpful:

Chunkz616 commented 3 months ago

Many thanks again for your response - it was 65X and from a BM sample

Thank you. Yes, I agree, it really seems to stem from regions of noise from AMBER.

p-priestley commented 3 months ago

This is hard for me to advise on without seeing the data. Perhaps you could look at the depth profile of some of the noisy points in the BAF output. If they are typically low-depth BAF points, then adjusting those settings may help.
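A rough way to do that depth check is to compare how widely the BAF values scatter at low versus high depth. The sketch below is one possible approach, not an AMBER feature: it assumes the AMBER BAF output is a tab-separated file with `tumorBAF` and `tumorDepth` columns (verify against your own file header), the depth cutoff of 20 is an arbitrary illustrative value, and the rows are invented.

```python
import csv
import io
import statistics


def baf_spread_by_depth(tsv_handle, depth_cutoff):
    """Split BAF points at depth_cutoff and report count and spread per group."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    low, high = [], []
    for row in reader:
        baf = float(row["tumorBAF"])
        depth = int(row["tumorDepth"])
        (low if depth < depth_cutoff else high).append(baf)

    def summarise(values):
        if not values:
            return {"n": 0, "stdev": None}
        # Population standard deviation of the BAF values in this group
        return {"n": len(values), "stdev": statistics.pstdev(values)}

    return {"low_depth": summarise(low), "high_depth": summarise(high)}


# Invented rows for illustration only. A gzipped file can be read with
# gzip.open(path, "rt") and passed in place of this StringIO.
EXAMPLE_BAF_TSV = (
    "chromosome\tposition\ttumorBAF\ttumorDepth\n"
    "1\t10000\t0.125\t8\n"
    "1\t20000\t0.900\t10\n"
    "1\t30000\t0.500\t12\n"
    "1\t40000\t0.480\t60\n"
    "1\t50000\t0.520\t70\n"
    "1\t60000\t0.500\t66\n"
)

spread = baf_spread_by_depth(io.StringIO(EXAMPLE_BAF_TSV), depth_cutoff=20)
print(spread)
```

If the low-depth group shows a much larger spread than the high-depth group on real data, that would support the idea that the noisy points are mostly low-depth ones.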