There is no MAPSpeak result for the test file

shangguandong1996 commented 2 years ago

Description of the bug

Hi, Dr Ou. I am using your hicar pipeline. I downloaded the test data, but I cannot find the MAPSpeak directory in the result folder. Could you give me some advice?

# Here are the downloaded test input files
total 98M
-rw-rw-r-- 1 sgd sgd  11M Jul  2 19:37 chr22.fa.gz
-rw-rw-r-- 1 sgd sgd 854K Jul  2 19:37 chr22.gtf.gz
-rwxrwxr-x 1 sgd sgd  171 Jul  2 19:44 samplesheet.csv
-rw-rw-r-- 1 sgd sgd  15M Jul  2 19:37 wgEncodeCrgMapabilityAlign50mer.chr22.bigWig
-rw-rw-r-- 1 sgd sgd  33M Jul  2 19:36 WT_rep1_R1.fastq.gz
-rw-rw-r-- 1 sgd sgd  39M Jul  2 19:37 WT_rep1_R2.fastq.gz

Command used and terminal output

# command
nextflow run /data5/sgd_data/biosoft/nf-core/nf-core-hicar-dev/workflow \
  --outdir result \
  --input rawdata/samplesheet.csv \
  -profile singularity \
  --juicer_tools_jar /data5/sgd_data/biosoft/juicer_tools_2.13.06.jar \
  --merge_map_py_source /data5/sgd_data/biosoft/ijuric/merge_map.py \
  --feature_frag2bin_source /data5/sgd_data/biosoft/ijuric/feature_frag2bin.py \
  --make_maps_runfile_source /data5/sgd_data/biosoft/ijuric/make_maps_runfile.py \
  --fasta rawdata/chr22.fa.gz \
  --gtf rawdata/chr22.gtf.gz \
  --mappability rawdata/wgEncodeCrgMapabilityAlign50mer.chr22.bigWig

# output
-[nf-core/hicar] Pipeline completed successfully-
Completed at: 03-Jul-2022 10:56:26
Duration    : 3m 5s
CPU hours   : 0.9
Succeeded   : 92

# result
$ ll
total 61K
drwxrwxr-x 6 sgd sgd 6 Jul  3  2022 ATACpeak
drwxrwxr-x 3 sgd sgd 3 Jul  3  2022 bwa
drwxrwxr-x 2 sgd sgd 3 Jul  3  2022 checksums
drwxrwxr-x 5 sgd sgd 5 Jul  3  2022 cooler
drwxrwxr-x 2 sgd sgd 6 Jul  3  2022 fastqc
drwxrwxr-x 4 sgd sgd 6 Jul  3  2022 genome
drwxrwxr-x 2 sgd sgd 5 Jul  3  2022 igv.js
drwxrwxr-x 4 sgd sgd 5 Jul  3  2022 multiqc
drwxrwxr-x 4 sgd sgd 4 Jul  3  2022 pairs
drwxrwxr-x 2 sgd sgd 8 Jul  3  2022 pipeline_info

Relevant files

No response

System information

No response

jianhong commented 2 years ago

Thank you for testing the pipeline. Could you also share the log file lines related to MAPS?

Best!

Yours sincerely,

Jianhong Ou


shangguandong1996 commented 2 years ago

log_file.txt

Thanks for your reply :). I do not have a directory called MAPSpeak. I copied the .nextflow.log; I hope it helps.

jianhong commented 2 years ago

@shangguandong1996 According to the log file, here is the error: Process NFCORE_HICAR:HICAR:MAPS_PEAK:MAPS_CALLPEAK > Skipping output binding because one or more optional files are missing: fileoutparam<0:2>

Could you please try a different version of the pipeline by pulling nf-core/hicar or jianhong/hicar? I want to make sure this is a reproducible issue with Singularity.

Thank you for your help.

Jianhong.

shangguandong1996 commented 2 years ago

Hi, Dr Ou. I downloaded the other version.

$ nf-core download jianhong/hicar

                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

    nf-core/tools version 2.4.1 - https://nf-co.re

? Select release / branch: dev  [branch]

In addition to the pipeline code, this tool can download software containers.
? Download software container images: singularity

If transferring the downloaded files to another system, it can be convenient to have everything compressed in a single file.
This is not recommended when downloading Singularity images, as it can take a long time and saves very little space.
? Choose compression type: none
INFO     Saving 'jianhong/hicar'                                                                                                              download.py:160
          Pipeline revision: 'dev'                                                                                                                           
          Pull containers: 'singularity'                                                                                                                     
          Using $NXF_SINGULARITY_CACHEDIR': /data5/sgd_data/biosoft                                                                                          
          Output directory: 'jianhong-hicar-dev'                                                                                                             
INFO     Downloading workflow files from GitHub                                                                                               download.py:163
INFO     Downloading centralised configs from GitHub                                                                                          download.py:167
INFO     Found 42 containers                                                                                                                  download.py:493
Downloading singularity images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 42/42 completed

And there is still no MAPSpeak directory:

$ ll
total 61K
drwxrwxr-x 6 sgd sgd 6 Jul  5  2022 ATACpeak
drwxrwxr-x 3 sgd sgd 3 Jul  5  2022 bwa
drwxrwxr-x 2 sgd sgd 3 Jul  5  2022 checksums
drwxrwxr-x 5 sgd sgd 5 Jul  5  2022 cooler
drwxrwxr-x 2 sgd sgd 6 Jul  5  2022 fastqc
drwxrwxr-x 4 sgd sgd 6 Jul  5  2022 genome
drwxrwxr-x 2 sgd sgd 5 Jul  5  2022 igv.js
drwxrwxr-x 4 sgd sgd 5 Jul  5  2022 multiqc
drwxrwxr-x 4 sgd sgd 4 Jul  5  2022 pairs
drwxrwxr-x 2 sgd sgd 8 Jul  5  2022 pipeline_info

Here is the log file and the ijuric Python scripts I used (my network connection is poor, so I downloaded all the related .py files locally): ijuric.zip

jianhong commented 2 years ago

Could you try adding one more parameter, --maps_cutoff_counts 6, and re-running your test? The test data contain only a limited number of reads, which cannot reach the default count cutoff of 12.

Alternatively, you can make a copy of https://raw.githubusercontent.com/jianhong/nf-core-hicar/master/conf/test.config, add the Python source paths to that config file, and then run it via the -c parameter, e.g.: nextflow run jianhong/hicar -c the.config.file.name ...
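To illustrate why the MAPS step can finish without producing its optional peak output on small test data, here is a minimal Python sketch. It assumes a simplified model in which a bin pair is kept only if its read count is at least the cutoff (the exact comparison MAPS uses is an assumption here), and the counts are made-up numbers, not taken from the test data.

```python
def pairs_passing_cutoff(bin_pair_counts, cutoff):
    """Return the bin pairs whose read count reaches the cutoff.

    Simplified model (assumption): a pair is kept when count >= cutoff.
    """
    return [(b1, b2, c) for b1, b2, c in bin_pair_counts if c >= cutoff]


# Hypothetical (bin1, bin2, count) triples for a sparse test dataset.
counts = [(0, 3, 3), (1, 4, 6), (2, 9, 8), (5, 7, 11), (6, 8, 7)]

# Compare the default cutoff (12) with the lowered one (6).
for cutoff in (12, 6):
    kept = pairs_passing_cutoff(counts, cutoff)
    print(f"cutoff={cutoff}: {len(kept)} candidate pairs")
```

With these made-up counts, nothing survives the default of 12, while lowering the cutoff to 6 leaves four candidate pairs for the downstream model.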

Jianhong.

shangguandong1996 commented 2 years ago

Thanks, it works :) But it also produces a new error, which may be related to my network and Circos.

nextflow run /data5/sgd_data/biosoft/nf-core/nf-core-hicar-dev/workflow \
  --outdir result \
  --input rawdata/samplesheet.csv \
  -profile singularity \
  --juicer_tools_jar /data5/sgd_data/biosoft/juicer_tools_2.13.06.jar \
  --merge_map_py_source /data5/sgd_data/biosoft/ijuric/merge_map.py \
  --feature_frag2bin_source /data5/sgd_data/biosoft/ijuric/feature_frag2bin.py \
  --make_maps_runfile_source /data5/sgd_data/biosoft/ijuric/make_maps_runfile.py \
  --fasta rawdata/chr22.fa.gz \
  --gtf rawdata/chr22.gtf.gz \
  --mappability rawdata/wgEncodeCrgMapabilityAlign50mer.chr22.bigWig \
  --maps_cutoff_counts 6

-[nf-core/hicar] Pipeline completed with errors-
[61/a638fc] NOTE: Process `NFCORE_HICAR:HICAR:MAPS_CIRCOS:CIRCOS_PREPARE (MAPS_PEAK_WT)` terminated with an error exit status (1) -- Error is ignored
Error executing process > 'NFCORE_HICAR:HICAR:PAIRTOOLS_PAIRE:PAIRSPLOT (WT_REP1)'

Caused by:
  Process `NFCORE_HICAR:HICAR:PAIRTOOLS_PAIRE:PAIRSPLOT (WT_REP1)` terminated with an error exit status (1)

Command executed:

  mv pairsqc_report WT_REP1_report
  pairsqcplot.r 4 WT_REP1_report

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_HICAR:HICAR:PAIRTOOLS_PAIRE:PAIRSPLOT":
      pairsqc: "0.2.2"
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error in file(filename, "r", encoding = encoding) : 
    cannot open the connection to 'https://raw.githubusercontent.com/SooLee/d3forNozzleR/master/interactive_multiline_d3prep.r'
  Calls: source -> file
  In addition: Warning message:
  In file(filename, "r", encoding = encoding) :
    URL 'https://raw.githubusercontent.com/SooLee/d3forNozzleR/master/interactive_multiline_d3prep.r': Timeout of 60 seconds was reached
  Execution halted

Work dir:
  /data5/sgd_data/newProject/202207/HiCAR_test_202207/work/47/01cdbb2f17347e305a1b52000834e6

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Here is the log file: log_file.txt. Could you add a parameter so that I can use a local copy of interactive_multiline_d3prep.r?

shangguandong1996 commented 2 years ago

Hi, Dr Ou. I also have an Arabidopsis thaliana HiCAR library, for which I sequenced 20M reads (R1 and R2). My command is:

nextflow run /data5/sgd_data/biosoft/nf-core/jianhong-hicar-dev/workflow \
  --input rawdata/samplesheet.csv \
  --outdir result \
  --fasta ~/reference/genome/TAIR10/Athaliana_with_chr.fa \
  --gtf ~/reference/annoation/Athaliana/Araport11/Araport11_GFF3_genes_transposons_with_chr.201606.gtf \
  --macs_gsize 1.1e8 \
  -profile singularity \
  --juicer_tools_jar /data5/sgd_data/biosoft/juicer_tools_2.13.06.jar \
  --merge_map_py_source /data5/sgd_data/biosoft/ijuric/merge_map.py \
  --feature_frag2bin_source /data5/sgd_data/biosoft/ijuric/feature_frag2bin.py \
  --make_maps_runfile_source /data5/sgd_data/biosoft/ijuric/make_maps_runfile.py \
  --maps_cutoff_counts 6

But there is still no MAPS peak file. Is this because my sequencing depth is insufficient, or should I modify some parameters so that the pipeline suits small-genome data? Or did my experiment simply fail?

$ ll
total 61K
drwxrwxr-x 6 sgd sgd 6 Jul  6 17:09 ATACpeak
drwxrwxr-x 3 sgd sgd 3 Jul  6 16:40 bwa
drwxrwxr-x 2 sgd sgd 3 Jul  6 16:23 checksums
drwxrwxr-x 5 sgd sgd 5 Jul  6 17:22 cooler
drwxrwxr-x 2 sgd sgd 6 Jul  6 16:21 fastqc
drwxrwxr-x 5 sgd sgd 7 Jul  6 16:18 genome
drwxrwxr-x 2 sgd sgd 5 Jul  6 17:22 igv.js
drwxrwxr-x 4 sgd sgd 5 Jul  6 17:23 multiqc
drwxrwxr-x 4 sgd sgd 4 Jul  6 17:21 pairs
drwxrwxr-x 2 sgd sgd 8 Jul  6 17:23 pipeline_info

Here is my log file: log_file_At.txt

By the way, this run completed successfully. If you want to check the raw data, I'm happy to share it :)

Best wishes Guandong Shang

jianhong commented 2 years ago

Hi Shawn,

The quality control results are listed in the multiqc folder. Did you get that report?


shangguandong1996 commented 2 years ago

Yes, I got the MultiQC report. Please forgive me; I am not familiar with Hi-C libraries, so I may not fully understand this QC. But it does not look like a bad library? multiqc.zip

jianhong commented 2 years ago

The QC shows it very clearly: the quality of the data is great. This conclusion is mainly based on the duplication_rate for Pairs and the Cis/Trans ratios, both of which are in the ideal range. I checked the cutadapt QC for the filtered reads; most of the reads were kept, so there is no issue with the trimming step.

The only concern is the sequencing depth. The Arabidopsis thaliana genome size is about 135M, and 20M/135M is about 0.15. That number is OK, but you can try sequencing again to reach ~0.3. The default MAPS calling is tuned for human/mouse data with very high coverage, and it may indeed be very sensitive to coverage.

Before you sequence deeper, please make sure the R2 reads behave well. You can check the R2 read coverage in promoter regions by dragging the R2 bigwig file into IGV; you should see a clear nucleosome-avoidance shape in the promoter regions. You can also try loosening the depth filter, for example by decreasing the counts cutoff from 12 to a lower number, or try other loop-calling tools (sorry, I did not include those tools in the pipeline).
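The depth arithmetic in the reply (20M reads / 135M bp ≈ 0.15, target ~0.3) can be written out as a small Python sketch. Note that this ratio counts reads per base pair of genome; it is not the usual fold coverage, which would also multiply by read length. The genome size is the approximate figure quoted in the reply.

```python
def depth_ratio(n_reads, genome_size_bp):
    """Reads per base pair of genome, as in the 20M / 135M estimate."""
    return n_reads / genome_size_bp


def reads_needed(target_ratio, genome_size_bp):
    """Total reads needed to reach a target reads-per-bp ratio."""
    return target_ratio * genome_size_bp


genome = 135e6   # approximate A. thaliana genome size used in the reply
current = 20e6   # reads sequenced in this library

print(f"current ratio: {depth_ratio(current, genome):.2f}")
print(f"reads for ~0.3: {reads_needed(0.3, genome) / 1e6:.1f}M")
```

Under this rough model, reaching ~0.3 means roughly doubling the library to about 40.5M reads.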

One more thing: you may want to drop sim3d_rep1_T1.


shangguandong1996 commented 2 years ago

Hi, Dr Ou. I am confused about dropping sim3d_rep1_T1, since I only have one sample:

sgd@localhost ~/newProject/202207/HiCAR_SIM3d_202207/rawdata
$ ll
total 5.8G
-rw-rw-r-- 1 sgd sgd 2.8G Jun 30 16:33 HiCAR_SIM3d_1.fq.gz
-rw-rw-r-- 1 sgd sgd 3.1G Jun 30 16:33 HiCAR_SIM3d_2.fq.gz
-rw-rw-r-- 1 sgd sgd  174 Jul  2 19:48 samplesheet.csv

sgd@localhost ~/newProject/202207/HiCAR_SIM3d_202207/rawdata
$ less samplesheet.csv 

sgd@localhost ~/newProject/202207/HiCAR_SIM3d_202207/rawdata
$ cat samplesheet.csv 
group,replicate,fastq_1,fastq_2,md5_1,md5_2
SIM3d,1,rawdata/HiCAR_SIM3d_1.fq.gz,rawdata/HiCAR_SIM3d_2.fq.gz,beaef4c37e3a32f8939bd8181937034d,4d9c4446bf17db2c3be6f3f714adbc0c
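The samplesheet above carries md5_1/md5_2 columns next to each FASTQ. As a pre-flight check before launching a run (a sketch, not part of the pipeline), the files can be verified against those columns with Python's hashlib. It assumes the FASTQ paths in the sheet are relative to the launch directory, as rawdata/... is here.

```python
import csv
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_samplesheet(samplesheet, base_dir="."):
    """Yield (group, replicate, fastq_path, md5_ok) for every FASTQ in the sheet.

    Assumes the columns group, replicate, fastq_1, fastq_2, md5_1, md5_2
    and paths relative to base_dir.
    """
    with open(samplesheet, newline="") as fh:
        for row in csv.DictReader(fh):
            for fq_col, md5_col in (("fastq_1", "md5_1"), ("fastq_2", "md5_2")):
                fastq = Path(base_dir) / row[fq_col]
                ok = md5sum(fastq) == row[md5_col].strip().lower()
                yield row["group"], row["replicate"], str(fastq), ok
```

Usage from the launch directory: `for rec in check_samplesheet("rawdata/samplesheet.csv"): print(rec)`. Reading in 1 MiB chunks keeps memory flat even for multi-GB .fq.gz files.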

shangguandong1996 commented 2 years ago

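By the way, the R2 bigwig coverage looks weird in IGV. Did I do something wrong? [image: https://user-images.githubusercontent.com/22555126/177785235-7d816392-ec86-43ca-970e-a6f1ab876e63.png]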

jianhong commented 2 years ago

Sorry, my fault. What I meant is that the R1 duplication rate is a little higher than I expected. It should usually be in the range of 40-75%; a rate over 90% indicates some issue with the R1-end preparation. The complexity of the R1 library may be a concern.
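The rule of thumb above (40-75% is typical, over 90% signals a problem with R1-end preparation) can be expressed as a small Python check. The reply does not say how to treat values between 75% and 90% or below 40%, so this sketch only flags them as outside the usual range. The read counts fed to `duplication_rate` would come from your own QC report; none are given in the thread.

```python
def duplication_rate(total_reads, unique_reads):
    """Fraction of reads that are duplicates: 1 - unique / total."""
    if total_reads <= 0 or not 0 <= unique_reads <= total_reads:
        raise ValueError("need total_reads > 0 and 0 <= unique_reads <= total_reads")
    return 1 - unique_reads / total_reads


def assess_r1_duplication(dup_rate):
    """Classify an R1 duplication rate given as a fraction in [0, 1]."""
    if not 0.0 <= dup_rate <= 1.0:
        raise ValueError("dup_rate must be a fraction between 0 and 1")
    if dup_rate > 0.90:
        return "problem: likely low R1 library complexity"
    if 0.40 <= dup_rate <= 0.75:
        return "usual range"
    # Thresholds for 0.75-0.90 and < 0.40 are not given in the thread.
    return "outside usual range"
```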


jianhong commented 2 years ago

Could you please set auto-scale for the tracks? And please also show the R2 bigwigs.


shangguandong1996 commented 2 years ago

Because the At genome is small, I think the number of true enhancer-promoter links is small. So when I amplified the library by PCR, I used 15 cycles instead of the 12 in the original HiCAR protocol. But the final concentration was still low... I believe this may be related to the high R1 duplication rate. By the way, I am not sure which one is the R2 bigwig file; I just loaded both bigwigs into IGV:

sgd@localhost ~/newProject/202207/HiCAR_SIM3d_202207/result/ATACpeak/R2_bigwig
$ tree
.
├── byGroup
│   └── SIM3d.bigWig
└── Tn5InsSitesBySample
    └── SIM3d_REP1.bigWig

jianhong commented 2 years ago

SIM3d.bigWig is the one. Please zoom in to see the details. Attached is the ideal shape of R2 coverage in a promoter region. You can see the nucleosome avoidance, i.e., a periodicity of approximately 200 bp.

igv_snapshot
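The suggested check is visual, in IGV. As a rough numeric complement (not something the pipeline does), one could look for the ~200 bp periodicity in a per-base R2 coverage vector around a promoter using autocorrelation. This minimal pure-Python sketch leaves out reading the values from SIM3d.bigWig (for example with a bigWig reader such as pyBigWig) and uses a synthetic, made-up signal.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def dominant_period(coverage, min_lag=100, max_lag=300):
    """Return (lag, r) with the highest autocorrelation in [min_lag, max_lag].

    Each lag correlates coverage[i] with coverage[i + lag] over their overlap,
    so coverage must be clearly longer than max_lag.
    """
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = pearson(coverage[:-lag], coverage[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic per-base coverage with a 200 bp oscillation (made-up data).
coverage = [10 + 5 * math.cos(2 * math.pi * i / 200) for i in range(2000)]
lag, r = dominant_period(coverage)
print(lag, round(r, 3))
```

On real data the peak will be broader and weaker than on this idealized cosine, so a clear local maximum near 180-220 bp is the more realistic thing to look for.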

shangguandong1996 commented 2 years ago

Hi, Dr Ou. I am confused about the phenomenon you mentioned. Is it an ATAC-seq phenomenon or just a HiCAR phenomenon?

You can see the avoidance of nucleosome, I mean the periodicity of approximately 200bp.

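I think the picture below may show the phenomenon you mentioned? [image: https://user-images.githubusercontent.com/22555126/177897368-d16116af-5589-4c51-b1d8-b663e7f59afe.png]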

shangguandong1996 commented 2 years ago

By the way, should I change some parameters, such as the expected chromatin interaction length, so that this pipeline can be applied to At data? The human/mouse genomes are so large that their interaction range is large, while the At genome is small.

jianhong commented 2 years ago

Yes, the picture you mentioned shows nucleosome avoidance. This phenomenon occurs in both ATAC-seq and HiCAR.


jianhong commented 2 years ago

You can definitely try playing with the parameters.


shangguandong1996 commented 2 years ago

Thanks, Dr Ou. Could you give me some advice about fine-tuning the parameters? I noticed that the main interaction-calling tool is MAPS. I found that only one parameter, --cool_bin (the resolution bin size), may be related to the genome size and the expected total number of interactions. Other parameters, such as --maps_cutoff_fold_change or --maps_cutoff_counts, are quality-control parameters that may not be related to genome size.

By the way, I found that MAPS has a parameter named binning_range, but I cannot find it in the HiCAR pipeline. Might it be related to genome size?

binning_range
binning range. How far 3D interactions can be called, also affects the estimate of the expected count. Default=1000000 Do not set to high value if data is sparse. Check MAPS paper for more details
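To get a feel for how binning_range interacts with bin size and data sparsity, here is a minimal sketch under a simplified model (an assumption, not MAPS's actual implementation): a chromosome is split into fixed-size bins, and only bin pairs at least one bin apart and no farther than binning_range are candidates. Fewer candidate pairs means more reads per pair on average, which is one way to read the advice not to set a high value for sparse data. The chromosome length, bin size and read count below are hypothetical.

```python
import math


def candidate_bin_pairs(chrom_len, bin_size, binning_range):
    """Count bin pairs (i, j), i < j, with (j - i) * bin_size <= binning_range.

    Simplified model for illustration only; MAPS's own rules may differ.
    """
    n_bins = math.ceil(chrom_len / bin_size)
    max_offset = binning_range // bin_size
    return sum(min(max_offset, n_bins - 1 - i) for i in range(n_bins))


chrom_len = 30_000_000   # hypothetical chromosome length (bp)
bin_size = 5_000         # hypothetical --cool_bin resolution (bp)
reads = 2_000_000        # hypothetical cis read pairs on this chromosome

for binning_range in (1_000_000, 200_000):
    pairs = candidate_bin_pairs(chrom_len, bin_size, binning_range)
    # Crude average: assumes every read pair falls inside binning_range.
    print(binning_range, pairs, round(reads / pairs, 2))
```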

And thanks again for your detailed reply.

jianhong commented 2 years ago

You got most of them. You may want to refer to: https://nf-co.re/hicar/1.0.0/parameters#maps-peak-calling-options

For binning_range, you may want to follow the section https://nf-co.re/hicar/1.0.0/usage#custom-configuration

Here is the default setting for that: https://github.com/jianhong/hicar/blob/a577c424f12136f703298a4c9327043b45e80fa4/conf/modules.config#L560-L566

Let me know if you have trouble understanding this; indeed, it is not easy to understand. Please do not hesitate to ask me if you have any questions.

Jianhong.

