raphael-group / hatchet

HATCHet (Holistic Allele-specific Tumor Copy-number Heterogeneity) is an algorithm that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples.
BSD 3-Clause "New" or "Revised" License

the step "phase_snps" error #170

Open DecodeGenome opened 1 year ago

DecodeGenome commented 1 year ago

Hi hatchet developers:

I ran "python3 -m hatchet run hatchet.ini". It created a "snps" folder in "output", and then the error below occurred. How can I fix it?

Thanks,

Wei

The step "phase_snps" requires that the config variable "download_panel.refpaneldir" indicates the directory where the reference panel is located.
Traceback (most recent call last):
  File "/wynton/home/bivona/wwu888/anaconda3/envs/hatchet112/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/wynton/home/bivona/wwu888/anaconda3/envs/hatchet112/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/wynton/home/bivona/wwu888/anaconda3/envs/hatchet112/lib/python3.9/site-packages/hatchet/__main__.py", line 65, in <module>
    sys.exit(main())
  File "/wynton/home/bivona/wwu888/anaconda3/envs/hatchet112/lib/python3.9/site-packages/hatchet/__main__.py", line 61, in main
    globals()[command](args)
  File "/wynton/home/bivona/wwu888/anaconda3/envs/hatchet112/lib/python3.9/site-packages/hatchet/utils/run.py", line 145, in main
    raise ValueError(
ValueError: The step "phase_snps" requires that the config variable "download_panel.refpaneldir" indicates the directory where the reference panel is located.

brian-arnold commented 1 year ago

Hi Wei, Could you provide a copy of your hatchet.ini file here? At first glance it looks like the first step to download the reference panel may not have been done, or the path of the panel (located in the download_panel.refpaneldir variable) was not specified. Sincerely, Brian
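The dotted name in the error message ("download_panel.refpaneldir") reads like a section name followed by a key name. The sketch below shows how such a lookup could work with Python's standard configparser. It is an illustration only: whether HATCHet resolves its settings exactly this way is an assumption, and the `lookup` helper and the example path are hypothetical.

```python
import configparser


def lookup(ini_text, dotted_name):
    """Return the value stored under 'section.key', or None if absent or empty.

    Hypothetical helper: it assumes a dotted config variable maps to
    [section] plus key, which is an assumption about HATCHet's config.
    """
    section, key = dotted_name.split(".", 1)
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=None covers both a missing section and a missing key
    value = parser.get(section, key, fallback=None)
    return value or None


# Illustrative config; the path is a placeholder, not a real location.
example_ini = """
[run]
download_panel = False
phase_snps = True

[download_panel]
ref_panel = 1000GP_Phase3
refpaneldir = /path/to/panel
"""

print(lookup(example_ini, "download_panel.refpaneldir"))
```

Under this reading, a key that is itself written with the dotted prefix inside the section (for example "download_panel.refpaneldir = ..." under [download_panel]) would be stored as a different key, and the lookup above would not find it.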

DecodeGenome commented 1 year ago

Hi Brian,

I was working on an HPC server, but I realized that I don't have a multi-user Gurobi license there.

Now I am trying to work on my MacBook Pro laptop (macOS Monterey 12.6). I installed hatchet 1.1.2 and Gurobi952. After running "hatchet check", there is a new issue. I checked mosdepth, and it is installed in my "hatchet" environment:

(hatchet) WilliamWus-MacBook-Pro:UCSF500_WGD_HATCHET cancerbio$ which mosdepth
/opt/anaconda3/envs/hatchet/bin/mosdepth
(hatchet) WilliamWus-MacBook-Pro:UCSF500_WGD_HATCHET cancerbio$ python -V
Python 3.10.6

Then I added "config.paths.mosdepth" to hatchet.ini (see attachment), but the issue persists (see below). Could you please look into this issue?

The samples I am using are from targeted exome sequencing. Is there any specific setting for targeted exome sequencing in hatchet.ini? Where do I add the panel BED file in the hatchet.ini file?

I also tried to install hatchet with conda and manually on an M1 Max Mac, but neither worked. Do you have a new version of hatchet that is compatible with the M1 chip?

I desperately want to get the hatchet algorithm working for our research projects. Please help me out.

Thanks,

Wei

######################################
mosdepth check FAILED. Please install mosdepth executable and either ensure its on your PATH, or its location specified in hatchet.ini as config.paths.mosdepth, or its location specified using the environment variable HATCHET_PATHS_MOSDEPTH
########################################

error after hatchet run hatchet.ini

[2022-Oct-31 23:08:37]# Writing the allele counts of tumor samples for selected SNPs

[2022-Oct-31 23:08:37]# Parsing and checking input arguments

The mosdepth executable was not found or is not executable. Please install mosdepth (e.g., conda install -c bioconda mosdepth) and/or supply the path to the executable.
Traceback (most recent call last):
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/__main__.py", line 65, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/__main__.py", line 61, in main
    globals()[command](args)
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/utils/run.py", line 202, in main
    count_reads(
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/utils/count_reads.py", line 20, in main
    args = parse_count_reads_args(args)
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/utils/ArgParsing.py", line 584, in parse_count_reads_args
    ensure(
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/utils/Supporting.py", line 147, in ensure
    return error(msg, raise_exception=True, exception_class=exception_class)
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/utils/Supporting.py", line 137, in error
    return log(
  File "/opt/anaconda3/envs/hatchet/lib/python3.10/site-packages/hatchet/utils/Supporting.py", line 124, in log
    raise exception_class(msg)
ValueError: The mosdepth executable was not found or is not executable. Please install mosdepth (e.g., conda install -c bioconda mosdepth) and/or supply the path to the executable.

(hatchet) WilliamWus-MacBook-Pro:UCSF500_WGD_HATCHET cancerbio$
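The check message names three places where the mosdepth location can come from: the PATH, the hatchet.ini setting config.paths.mosdepth, and the environment variable HATCHET_PATHS_MOSDEPTH. A minimal sketch for testing those candidates by hand follows. The `resolve_mosdepth` helper and the order in which candidates are tried are assumptions made for illustration, not HATCHet's documented behavior.

```python
import os
import shutil


def resolve_mosdepth(configured_path=None, env=None):
    """Return the first candidate that is an executable file, else None.

    Hypothetical helper. The precedence used here (configured path,
    then HATCHET_PATHS_MOSDEPTH, then PATH) is an assumption.
    """
    env = os.environ if env is None else env
    candidates = [
        configured_path,                    # value you intend to put in hatchet.ini
        env.get("HATCHET_PATHS_MOSDEPTH"),  # environment variable named in the message
        "mosdepth",                         # bare name, searched on PATH
    ]
    for candidate in candidates:
        if not candidate:
            continue
        # shutil.which returns None unless the candidate exists and is executable
        found = shutil.which(candidate)
        if found:
            return found
    return None


print(resolve_mosdepth("/opt/anaconda3/envs/hatchet/bin/mosdepth"))
```

If this prints None from the same shell and environment in which hatchet is run, none of the three locations points to an executable file.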



[run]
# What individual steps of HATCHet should we run in the pipeline?
# Valid values are True or False
download_panel = False
count_reads = True
genotype_snps = True
phase_snps = False
fixed_width = False # True uses older fixed-width versions of some commands
count_alleles = True
combine_counts = True
cluster_bins = True
loc_clust = True # True uses new locality-aware clustering
plot_bins = True
compute_cn = True
plot_cn = True

# What chromosome(s) do we wish to process in the pipeline? Leave unspecified to process
# all chromosomes found in the normal/tumor bam files
chromosomes =

# Path to reference genome
# Make sure you have also generated the reference dictionary as /path/to/reference.dict
reference = /Users/cancerbio/Desktop/Bioinformatics_tools/humanRefGenome/UCSC-hg19/UCSC-hg19.fa

# Make sure you have generated the .bam.bai files at the same locations as these bam files
normal = /Users/cancerbio/Desktop/UCSF500_WGD_HATCHET/CGP-4959.deduplicated.realign.bam

# Space-delimited list of tumor BAM locations
bams = /Users/cancerbio/Desktop/UCSF500_WGD_HATCHET/CGP-4960+CGP-4992.deduplicated.realign.bam

# Space-delimited list of tumor names
samples = CGP4960

# Output path of the run script
output = /Users/cancerbio/Desktop/UCSF500_HATCHET_RESULTS/wwu_1

# How many cores to use for the end-end pipeline?
# This parameter, if specified, will override corresponding 'processes' parameters in individual sections below.
processes = 6

[download_panel]
ref_panel = 1000GP_Phase3
refpaneldir = /Users/cancerbio/Desktop/UCSF500_WGD_HATCHET/1000gp3/1000GP_Phase3
download_panel.refpaneldir = /Users/cancerbio/Desktop/UCSF500_WGD_HATCHET/1000gp3/1000GP_Phase3/1000GP_phase3_WWU

[genotype_snps]
# Reference version used to select list of known germline SNPs;
# Possible values are "hg19" or "hg38", or leave blank "" if you wish for all positions to be genotyped by bcftools
reference_version = hg19

# Does your reference name chromosomes with "chr" prefix?; True or False
chr_notation = True

# Use 8 for WGS with >30x and 20 for WES with ~100x
mincov = 20

# Use 300 for WGS with >30x and Use 1000 for WES with ~100x
maxcov = 1000

config.paths.mosdepth = /opt/anaconda3/envs/hatchet/bin/mosdepth

# Path to SNP list
# If unspecified, HATCHet selects a list of known germline SNPs based on and
# If not, please provide full path to a locally stored list (.vcf.gz) here.
snps =

[combine_counts]
# Minimum number of SNP-covering reads per bin and sample
msr = 5000

# Minimum number of total reads per bin and sample
mtr = 5000

[cluster_bins_loc]
diploidbaf = 0.08

# Minimum and maximum number of clusters to infer
# (using silhouette score for model selection)
minK = 2
maxK = 30

# You can instead specify an exact number of clusters:
exactK = 15

[plot_bins]
sizethreshold = 0.01
figsize = "6,3"

[compute_cn]
clones = 2,6
seeds = 400
minprop = 0.03
diploidcmax = 6
tetraploidcmax = 12
ghostprop = 0.35
limitinc = 0.6

vineetbansal commented 1 year ago

@DecodeGenome - the hatchet check command (when it comes to checking mosdepth) is essentially just trying to run mosdepth --version and making sure that the return code is 0. Can you confirm if this is indeed the case? I just installed mosdepth from the bioconda channel (version 0.3.3) and it does work (as does hatchet check for mosdepth).
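The check described above (run the executable with --version and require a return code of 0) can be reproduced by hand with a few lines of Python. This is a sketch of the idea rather than HATCHet's actual code: the `executable_runs` helper name and the 30-second timeout are assumptions.

```python
import subprocess


def executable_runs(path, version_flag="--version"):
    """Return True if running `path version_flag` exits with return code 0.

    Hypothetical helper that mirrors the check as described in the comment
    above; it is not copied from HATCHet.
    """
    try:
        result = subprocess.run(
            [path, version_flag],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,  # arbitrary limit so a hanging binary cannot block the check
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing file, a non-executable file, and a binary
        # built for a different architecture
        return False
    return result.returncode == 0


print(executable_runs("mosdepth"))
```

Running this inside the activated "hatchet" environment, once with the bare name and once with the full path reported by "which mosdepth", separates a PATH problem from a binary that exists but fails to run.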