rnakato / SSP

Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile
GNU General Public License v2.0
4 stars 0 forks source link

Questions about parameterization of SSP for FCS calculations with *S. cerevisiae* ChIP-seq data #4

Closed kalavattam closed 4 months ago

kalavattam commented 4 months ago

Dear @rnakato,

Thank you again for developing and maintaining SSP. I have a question regarding the parameterization of SSP for FCS calculations with S. cerevisiae ChIP-seq data. In the SSP tutorial described in the README.md, it is advised to adjust the parameters for calculating the background Jaccard score $J(d_{bg})$ to use a range of 10 kb to 50 kb, averaging scores at steps of 500 bp, instead of the default 500 kb to 1 Mb range with 5 kb steps:

ssp -i ChIP1.bam -o ChIP --gt genometable.txt --ng_from 10000 --ng_to 50000 --ng_step 500


Should the start (--ng_from_fcs), stop (--ng_to_fcs), and step (--ng_step_fcs) values for FCS background calculations also be reduced for S. cerevisiae ChIP-seq data? (Is $\text{cPNF}(d{bg}, s) = \frac{f(d{bg}, \space s)}{N{d{bg}}}$, where $N{d{bg}} = |v_c^{fwd} \cap vc^{rev} (d{bg})|$?)

I tried running FCS calculations with default and reduced --ng_from_fcs, --ng_to_fcs, and --ng_step_fcs values:

dir_bam=...        # Directory containing BAM files
dir_tbl=...        # Directory containing genome and mappability tables
path_img=...       # Path to ssp_drompa.img
file_tbl_gen=...   # Genome table file
file_tbl_map=...   # Mappability table file
thresh=0           # Calculate PCR bias threshold with program's internal logic
SSP_ng_from=10000  # S. cerevisiae 10 kb (default for H. sapiens is 500 kb, a 50× reduction)
SSP_ng_to=50000    # S. cerevisiae 50 kb (default for H. sapiens is 1 Mb, a 20× reduction)
SSP_ng_step=500    # S. cerevisiae 500 bp (default for H. sapiens is 5 kb, a 10× reduction)
FCS_ng_from=2000   # S. cerevisiae 2 kb (a 50× reduction from H. sapiens 100000)
FCS_ng_to=50000    # S. cerevisiae 50 kb (a 20× reduction from H. sapiens 1000000)
FCS_ng_step=1000   # S. cerevisiae 1 kb (a 10× reduction from H. sapiens 100000)

#  Default values for FCS calculations
singularity exec \
    --no-home \
    --bind "${dir_bam}:/mnt_1" \
    --bind "${dir_tbl}:/mnt_2" \
    "${path_img}" \
        ssp \
            --threads ${SLURM_CPUS_ON_NODE} \
            --pair \
            --input "/mnt_1/${file_bam}" \
            --output "${file_bam%.bam}_filt-${thresh}" \
            --gt "/mnt_2/${file_tbl_gen}" \
            --mptable "/mnt_2/${file_tbl_map}" \
            --include_allchr \
            --ng_from ${SSP_ng_from} \
            --ng_to ${SSP_ng_to} \
            --ng_step ${SSP_ng_step} \
            --thre_pb ${thresh}

#  Reduced values for FCS calculations
singularity exec \
    --no-home \
    --bind "${dir_bam}:/mnt_1" \
    --bind "${dir_tbl}:/mnt_2" \
    "${path_img}" \
        ssp \
            --threads ${SLURM_CPUS_ON_NODE} \
            --pair \
            --input "/mnt_1/${file_bam}" \
            --output "${file_bam%.bam}_filt-${thresh}" \
            --gt "/mnt_2/${file_tbl_gen}" \
            --mptable "/mnt_2/${file_tbl_map}" \
            --include_allchr \
            --ng_from ${SSP_ng_from} \
            --ng_to ${SSP_ng_to} \
            --ng_step ${SSP_ng_step} \
            --ng_from_fcs ${FCS_ng_from} \
            --ng_to_fcs ${FCS_ng_to} \
            --ng_step_fcs ${FCS_ng_step} \
            --thre_pb ${thresh}


Despite using different --ng_from_fcs, --ng_to_fcs, and --ng_step_fcs values, the related plots look largely the same (see image_FCS-values_default-vs-reduced.png). The top plots use default FCS background calculation values, and the bottom plots use reduced values. Could you provide any advice on why this might be happening and how to proceed? Also, the plots for "Proportion of nearest neighbor fragments" and "Cumulative proportion" reach respective minimal and maximal plateaus quickly, and the curves do not change with len values. Is this expected? This behavior is different than what is shown in the supplementary material for Nakato, Shirahige, Sensitive and robust assessment of ChIP-seq read distribution using a strand-shift profile, Bioinformatics 2018-0307.

Thank you again for your help and guidance.

Best regards,
Kris

image_FCS-values_default-vs-reduced
rnakato commented 4 months ago

Hi Kris,

Since I have not extensively tested SSP on yeast samples, the results may differ slightly from those described in the paper.

kalavattam commented 4 months ago

@rnakato: Thank you for your response and guidance.

I was wondering if you could provide any specific advice on appropriate parameter values for --ng_from_fcs, --ng_to_fcs, and --ng_step_fcs. Would you change the values I initially tested?

FCS_ng_from=2000   # S. cerevisiae 2 kb (a 50× reduction from H. sapiens 100000)
FCS_ng_to=50000    # S. cerevisiae 50 kb (a 20× reduction from H. sapiens 1000000)
FCS_ng_step=1000   # S. cerevisiae 1 kb (a 10× reduction from H. sapiens 100000)

I am keen to use your tools effectively in my work, and any information you could share to help avoid a potentially time-consuming parameter scan would be greatly appreciated.

Thank you again for your support.

Best regards,
Kris

rnakato commented 4 months ago

Dear Kris,

Thank you for your extensive use of this tool. In the FCS, there are no strict criteria for choosing the background distance. As shown in your figure, the background level is quite similar at distances > 1k, so you can choose any distance. However, in cases where the samples contain broad mode peaks, a distance > 10kbp would be better, as used in the README. The important thing is to keep a consistent distance for all samples to be compared. Under the condition, SSP returns comparable values. For simplicity, it would be good to use the same distance with --ng_from, --ng_to, and --ng_step.

kalavattam commented 4 months ago

Thank you so much, @rnakato. This makes sense. I am going to close the issue; I may have another question or two related to this, but I will post them to a new GitHub Issue later. Thank you again.