kerenzhou062 / PEALS

Peak-based Enhancement Analysis pipeLine for MeRIP-Seq
MIT License
0 stars 0 forks source link

TypeError in peals callpeak #4

Closed fork16 closed 1 year ago

fork16 commented 1 year ago

周老师您好,我在使用peals callpeak时出现几个问题 第一个错误如下:

(python38) root@1ef77f50eaef:/file# peals callpeak -i . -m PEALS/example/peals.callpeak.sample.txt -P peals -o peals \
>     --gsize /file/hg38.chrom.sizes \
>     --gff /file/Homo_sapiens.GRCh38.91.gtf \
>     --gff-type GTF
INFO  @ Wed, 23 Aug 2023 14:21:42: PEALS (v1.2.4) is running with following parameters: 
INFO  @ Wed, 23 Aug 2023 14:21:42: [input] parameter group: call-with-binary=False, binary-dir=None, input=., matrix=PEALS/example/peals.callpeak.sample.txt, recursive=False, type=mature, sample=None 
INFO  @ Wed, 23 Aug 2023 14:21:42: [output] parameter group: keep-temp=False, no-bwtrack=False, output=peals, prefix=peals, temp=None, verbose=2 
INFO  @ Wed, 23 Aug 2023 14:21:42: [library] parameter group: estlib-size=primary_mapped, extsize=0, library=2, pairend=False 
INFO  @ Wed, 23 Aug 2023 14:21:42: [normalize] parameter group: bw-scale=rpm, exp-method=count, estpeak-sizefactor=peak 
INFO  @ Wed, 23 Aug 2023 14:21:42: [bam] parameter group: exp-cutoff=1, frac-overlap=0, ignore-dup=False, no-fraction=False, scale-sample=to_small, sortbam=False 
INFO  @ Wed, 23 Aug 2023 14:21:42: [constant] parameter group: buffer=10000000, reltol=0.05, thread=10, thread-start=fork 
INFO  @ Wed, 23 Aug 2023 14:21:42: [annotation] parameter group: gff=/file/Homo_sapiens.GRCh38.91.gtf, gff-source=GENCODE, gff-type=GTF, gsize=/file/hg38.chrom.sizes, identifier=gene_id:gene_name:gene_type:transcript_id:transcript_name:transcript_type 
INFO  @ Wed, 23 Aug 2023 14:21:42: [peak] parameter group: center=0.25, complexity-rate=0.05, csaps-p=0.005, ipratio=1.25, lookahead=25, peak-size=35, span-loop=1, span-method=move, split=50000 
INFO  @ Wed, 23 Aug 2023 14:21:42: [model] parameter group: fit-type=parametric, formula=None, shrink=apeglm, test=Wald 
INFO  @ Wed, 23 Aug 2023 14:21:42: [filter] parameter group: fold=2, padj=0.05, padj-method=BH, pval=0.05 
INFO  @ Wed, 23 Aug 2023 14:21:42: Checking whether required third party software are properly installed... 
INFO  @ Wed, 23 Aug 2023 14:21:42: Congratulations! All required third party software are properly installed! 
Traceback (most recent call last):
  File "/root/.local/bin/peals", line 352, in <module>
    main()
  File "/root/.local/bin/peals", line 35, in main
    run( args )
  File "/root/.local/lib/python3.8/site-packages/PEALS/subcmd/callpeak_cmd.py", line 51, in run
    tempPrefix = os.path.join(options.tempdir, "_".join([options.tempre, options.prefix]))
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
INFO  @ Wed, 23 Aug 2023 14:21:42: Embedded R ended. 
INFO  @ Wed, 23 Aug 2023 14:21:42: Embedded R already ended. 

经过检查options后,确实发现options.tempdir=None ,使用-T参数即可解决这个问题 值得一提,在help内,有以下说明

-o OUTPUTDIR, --output OUTPUTDIR
                        All output files will be written to this directory. (default: None)
  -T TEMPDIR, --temp TEMPDIR
                        If specified, all temporary files will be written to this directory. Default is the same with --output.
                        (default: None)

说明上写的是默认和-o参数一致,但实际上没有一致,可能是老师您漏掉了这个地方。

fork16 commented 1 year ago

第二个错误是pandas版本问题,安装peals时版本为2.0以上,而在分析时在 readCountDf.set_axis(columNameList, axis=1, inplace=True) 这里出现问题,经过检查,在pandas 1.5版本之后,inplace参数就被pandas模块删掉了,所以pandas最新只能安装到1.5.2版本。

fork16 commented 1 year ago

第三个问题是这个:

(python38) root@1ef77f50eaef:/file# peals callpeak -i /file/bam/ -m /file/peals.callpeak.sample.txt -P rep -o rep -T rep \
>     --gsize /file/hg38.chrom.sizes \
>     --gff /file/Homo_sapiens.GRCh38.91.gtf \
>     --gff-type GTF
INFO  @ Wed, 23 Aug 2023 16:23:54: PEALS (v1.2.4) is running with following parameters: 
INFO  @ Wed, 23 Aug 2023 16:23:54: [input] parameter group: call-with-binary=False, binary-dir=None, input=/file/bam/, matrix=/file/peals.callpeak.sample.txt, recursive=False, type=mature, sample=None 
INFO  @ Wed, 23 Aug 2023 16:23:54: [output] parameter group: keep-temp=False, no-bwtrack=False, output=rep, prefix=rep, temp=rep, verbose=2 
INFO  @ Wed, 23 Aug 2023 16:23:54: [library] parameter group: estlib-size=primary_mapped, extsize=0, library=2, pairend=False 
INFO  @ Wed, 23 Aug 2023 16:23:54: [normalize] parameter group: bw-scale=rpm, exp-method=count, estpeak-sizefactor=peak 
INFO  @ Wed, 23 Aug 2023 16:23:54: [bam] parameter group: exp-cutoff=1, frac-overlap=0, ignore-dup=False, no-fraction=False, scale-sample=to_small, sortbam=False 
INFO  @ Wed, 23 Aug 2023 16:23:54: [constant] parameter group: buffer=10000000, reltol=0.05, thread=10, thread-start=fork 
INFO  @ Wed, 23 Aug 2023 16:23:54: [annotation] parameter group: gff=/file/Homo_sapiens.GRCh38.91.gtf, gff-source=GENCODE, gff-type=GTF, gsize=/file/hg38.chrom.sizes, identifier=gene_id:gene_name:gene_type:transcript_id:transcript_name:transcript_type 
INFO  @ Wed, 23 Aug 2023 16:23:54: [peak] parameter group: center=0.25, complexity-rate=0.05, csaps-p=0.005, ipratio=1.25, lookahead=25, peak-size=35, span-loop=1, span-method=move, split=50000 
INFO  @ Wed, 23 Aug 2023 16:23:54: [model] parameter group: fit-type=parametric, formula=None, shrink=apeglm, test=Wald 
INFO  @ Wed, 23 Aug 2023 16:23:54: [filter] parameter group: fold=2, padj=0.05, padj-method=BH, pval=0.05 
INFO  @ Wed, 23 Aug 2023 16:23:54: Checking whether required third party software are properly installed... 
INFO  @ Wed, 23 Aug 2023 16:23:54: Congratulations! All required third party software are properly installed! 
INFO  @ Wed, 23 Aug 2023 16:23:54: Reading information from chromosome size file. 
INFO  @ Wed, 23 Aug 2023 16:23:54: Reading information from input sample matrix file (peals.callpeak.sample.txt). 
INFO  @ Wed, 23 Aug 2023 16:24:43: Creating soft symbolic links to input bam and bai files. 
INFO  @ Wed, 23 Aug 2023 16:24:43: Preparing genome-wide reads coverage... 
INFO  @ Wed, 23 Aug 2023 16:30:12: Obtaining trascript expression... 
INFO  @ Wed, 23 Aug 2023 16:32:07: Use raw reads count to estimate the transcript expression for filtering. 
INFO  @ Wed, 23 Aug 2023 16:32:07: Filter expression with cutoff (1). 
INFO  @ Wed, 23 Aug 2023 16:32:07: After filtering, 18130 transcripts were left for peak calling. 
INFO  @ Wed, 23 Aug 2023 16:32:07: Normalize expression by using TPM. 
INFO  @ Wed, 23 Aug 2023 16:32:07: Decoding and building annotation information from /file/Homo_sapiens.GRCh38.91.gtf... 
INFO  @ Wed, 23 Aug 2023 16:34:35: 18130 transcripts were used from the gene annotation file 
INFO  @ Wed, 23 Aug 2023 16:34:35: Starting to call peaks on IP and input libraries... 
INFO  @ Wed, 23 Aug 2023 16:34:35: Calling peak candidates on ip (rep1-IP.bam) and input (rep1-Input.bam)... 
INFO  @ Peak-calling status:   0% [elapsed: 00:00|estiamted remaining:?]
INFO  @ Peak-calling status:   0% [elapsed: 00:30|estiamted remaining:?]

Traceback (most recent call last):
  File "/root/.local/bin/peals", line 352, in <module>
    main()
  File "/root/.local/bin/peals", line 35, in main
    run( args )
  File "/root/.local/lib/python3.8/site-packages/PEALS/subcmd/callpeak_cmd.py", line 135, in run
    perPeakRowList = peakcall.callPeak(options, txBedDict, subTxExpDf, bamList, label)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakcall.py", line 269, in callPeak
    txPeakCallResultList = runCallPeakPerTxParallel(options, bamList, chromTxDict, txBedDict)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakcall.py", line 238, in runCallPeakPerTxParallel
    for result in tqdm.tqdm(imapUnordered, total=taskCount, mininterval=60, maxinterval=120, miniters=miniters, position=0, leave=True, ascii=True, bar_format=barFormat):
  File "/root/.local/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
KeyError: '6'
INFO  @ Wed, 23 Aug 2023 16:35:10: Embedded R ended. 
INFO  @ Wed, 23 Aug 2023 16:35:10: Embedded R already ended. 

这个我不清楚是什么错误,还请老师解答一下

kerenzhou062 commented 1 year ago

For Question1-2, I will double check the codes and then get it back to you. For Question-3, could you please confirm that the chromosomes name in used BAM, GTF and hg38.chrom.sizes are consistent with each other (e.g., they are all in this format, chr1, chr2, chr3 ...)?

kerenzhou062 commented 1 year ago

For Question1-2, it should be fixed.

Best,

Keren

fork16 commented 1 year ago

Bam: image hg38.chrom.sizes: image gtf: image

They all don't have the chr character

kerenzhou062 commented 1 year ago

I have no idea what causes the error. Could you please run PEALS on the data again by enabling "--verbose 3", which will report more detail logs for debug? Thanks!

fork16 commented 1 year ago
(python38) root@1ef77f50eaef:/file# peals callpeak -i /file/bam/ -m /file/peals.callpeak.sample.txt -P rep -o rep -T rep \
>     --gsize /file/hg38.chrom.sizes \
>     --gff /file/Homo_sapiens.GRCh38.91.gtf \
>     --gff-type GTF \
>     --gff-source ENSEMBL \
>     --verbose 3
INFO  @ Thu, 24 Aug 2023 09:30:55: PEALS (v1.2.4) is running with following parameters: 
INFO  @ Thu, 24 Aug 2023 09:30:55: [input] parameter group: call-with-binary=False, binary-dir=None, input=/file/bam/, matrix=/file/peals.callpeak.sample.txt, recursive=False, type=mature, sample=None 
INFO  @ Thu, 24 Aug 2023 09:30:55: [output] parameter group: keep-temp=False, no-bwtrack=False, output=rep, prefix=rep, temp=rep, verbose=3 
INFO  @ Thu, 24 Aug 2023 09:30:55: [library] parameter group: estlib-size=primary_mapped, extsize=0, library=2, pairend=False 
INFO  @ Thu, 24 Aug 2023 09:30:55: [normalize] parameter group: bw-scale=rpm, exp-method=count, estpeak-sizefactor=peak 
INFO  @ Thu, 24 Aug 2023 09:30:55: [bam] parameter group: exp-cutoff=1, frac-overlap=0, ignore-dup=False, no-fraction=False, scale-sample=to_small, sortbam=False 
INFO  @ Thu, 24 Aug 2023 09:30:55: [constant] parameter group: buffer=10000000, reltol=0.05, thread=10, thread-start=fork 
INFO  @ Thu, 24 Aug 2023 09:30:55: [annotation] parameter group: gff=/file/Homo_sapiens.GRCh38.91.gtf, gff-source=ENSEMBL, gff-type=GTF, gsize=/file/hg38.chrom.sizes, identifier=gene_id:gene_name:gene_type:transcript_id:transcript_name:transcript_type 
INFO  @ Thu, 24 Aug 2023 09:30:55: [peak] parameter group: center=0.25, complexity-rate=0.05, csaps-p=0.005, ipratio=1.25, lookahead=25, peak-size=35, span-loop=1, span-method=move, split=50000 
INFO  @ Thu, 24 Aug 2023 09:30:55: [model] parameter group: fit-type=parametric, formula=None, shrink=apeglm, test=Wald 
INFO  @ Thu, 24 Aug 2023 09:30:55: [filter] parameter group: fold=2, padj=0.05, padj-method=BH, pval=0.05 
INFO  @ Thu, 24 Aug 2023 09:30:55: Checking whether required third party software are properly installed... 
INFO  @ Thu, 24 Aug 2023 09:30:56: Congratulations! All required third party software are properly installed! 
INFO  @ Thu, 24 Aug 2023 09:30:56: Reading information from chromosome size file. 
INFO  @ Thu, 24 Aug 2023 09:30:56: Reading information from input sample matrix file (peals.callpeak.sample.txt). 
DEBUG @ Thu, 24 Aug 2023 09:31:04: The final information of input sample matrix file is as follow:| id         | library   | condition   |   replicate | label   | bam                                  | group   | bai                                      |   lib_size | symlink_bam        | symlink_bai            |   bam_unique |   whole_lib_scale |   paired_lib_scale |
|:-----------|:----------|:------------|------------:|:--------|:-------------------------------------|:--------|:-----------------------------------------|-----------:|:-------------------|:-----------------------|-------------:|------------------:|-------------------:|
| rep1-Input | input     | control     |           1 | rep1    | /file/bam/rep1-Input.uniq.sorted.bam | rep     | /file/bam/rep1-Input.uniq.sorted.bam.bai |   13699844 | rep/rep1-Input.bam | rep/rep1-Input.bam.bai |            1 |           1       |            1       |
| rep2-Input | input     | control     |           2 | rep2    | /file/bam/rep2-Input.uniq.sorted.bam | rep     | /file/bam/rep2-Input.uniq.sorted.bam.bai |   15046946 | rep/rep2-Input.bam | rep/rep2-Input.bam.bai |            1 |           1.09833 |            1       |
| rep3-Input | input     | control     |           3 | rep3    | /file/bam/rep3-Input.uniq.sorted.bam | rep     | /file/bam/rep3-Input.uniq.sorted.bam.bai |   18458386 | rep/rep3-Input.bam | rep/rep3-Input.bam.bai |            1 |           1.34734 |            1       |
| rep1-IP    | ip        | control     |           1 | rep1    | /file/bam/rep1-IP.uniq.sorted.bam    | rep     | /file/bam/rep1-IP.uniq.sorted.bam.bai    |   18149988 | rep/rep1-IP.bam    | rep/rep1-IP.bam.bai    |            1 |           1.32483 |            1.32483 |
| rep2-IP    | ip        | control     |           2 | rep2    | /file/bam/rep2-IP.uniq.sorted.bam    | rep     | /file/bam/rep2-IP.uniq.sorted.bam.bai    |   17615842 | rep/rep2-IP.bam    | rep/rep2-IP.bam.bai    |            1 |           1.28584 |            1.17073 |
| rep3-IP    | ip        | control     |           3 | rep3    | /file/bam/rep3-IP.uniq.sorted.bam    | rep     | /file/bam/rep3-IP.uniq.sorted.bam.bai    |   19284828 | rep/rep3-IP.bam    | rep/rep3-IP.bam.bai    |            1 |           1.40767 |            1.04477 | 
INFO  @ Thu, 24 Aug 2023 09:31:04: Creating soft symbolic links to input bam and bai files. 
INFO  @ Thu, 24 Aug 2023 09:31:04: Preparing genome-wide reads coverage... 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep1-IP.bam) with command: bedtools genomecov -ibam rep/rep1-IP.bam  -split -strand - -bg > rep/peals_tmp_rep1-IP.forward.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep3-IP.bam) with command: bedtools genomecov -ibam rep/rep3-IP.bam  -split -strand - -bg > rep/peals_tmp_rep3-IP.forward.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep1-IP.bam) with command: bedtools genomecov -ibam rep/rep1-IP.bam  -split -strand + -bg > rep/peals_tmp_rep1-IP.reverse.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep2-IP.bam) with command: bedtools genomecov -ibam rep/rep2-IP.bam  -split -strand - -bg > rep/peals_tmp_rep2-IP.forward.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep1-Input.bam) with command: bedtools genomecov -ibam rep/rep1-Input.bam  -split -strand - -bg > rep/peals_tmp_rep1-Input.forward.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep2-IP.bam) with command: bedtools genomecov -ibam rep/rep2-IP.bam  -split -strand + -bg > rep/peals_tmp_rep2-IP.reverse.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep1-Input.bam) with command: bedtools genomecov -ibam rep/rep1-Input.bam  -split -strand + -bg > rep/peals_tmp_rep1-Input.reverse.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep3-IP.bam) with command: bedtools genomecov -ibam rep/rep3-IP.bam  -split -strand + -bg > rep/peals_tmp_rep3-IP.reverse.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep2-Input.bam) with command: bedtools genomecov -ibam rep/rep2-Input.bam  -split -strand - -bg > rep/peals_tmp_rep2-Input.forward.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:31:04: Generating genome-wide coverage for bam (rep2-Input.bam) with command: bedtools genomecov -ibam rep/rep2-Input.bam  -split -strand + -bg > rep/peals_tmp_rep2-Input.reverse.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:32:05: Generating genome-wide coverage for bam ("+" strand, rep1-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:05: Generating genome-wide coverage for bam (rep3-Input.bam) with command: bedtools genomecov -ibam rep/rep3-Input.bam  -split -strand - -bg > rep/peals_tmp_rep3-Input.forward.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:32:11: Generating genome-wide coverage for bam ("+" strand, rep1-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:11: Generating genome-wide coverage for bam (rep3-Input.bam) with command: bedtools genomecov -ibam rep/rep3-Input.bam  -split -strand + -bg > rep/peals_tmp_rep3-Input.reverse.bedGraph 
DEBUG @ Thu, 24 Aug 2023 09:32:20: Generating genome-wide coverage for bam ("+" strand, rep3-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:26: Generating genome-wide coverage for bam ("-" strand, rep1-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:29: Generating genome-wide coverage for bam ("-" strand, rep2-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:31: Generating genome-wide coverage for bam ("+" strand, rep2-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:32: Generating genome-wide coverage for bam ("+" strand, rep2-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:37: Generating genome-wide coverage for bam ("-" strand, rep1-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:38: Generating genome-wide coverage for bam ("-" strand, rep2-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:32:42: Generating genome-wide coverage for bam ("-" strand, rep3-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:33:23: Generating genome-wide coverage for bam ("+" strand, rep3-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:33:25: Generating genome-wide coverage for bam ("-" strand, rep3-Input.bam) done. 
INFO  @ Thu, 24 Aug 2023 09:33:25: Obtaining trascript expression... 
DEBUG @ Thu, 24 Aug 2023 09:33:25: Running featureCounts with command (featureCounts -a /file/Homo_sapiens.GRCh38.91.gtf -F GTF -g transcript_id -o /file/rep/peals_tmpisw63960.count.tmp -s 2 -T 10 --fracOverlap 0  -M -O --fraction --donotsort  rep/rep1-IP.bam rep/rep2-IP.bam rep/rep3-IP.bam rep/rep1-Input.bam rep/rep2-Input.bam rep/rep3-Input.bam)... 
DEBUG @ Thu, 24 Aug 2023 09:33:53: Running featureCounts done. 
INFO  @ Thu, 24 Aug 2023 09:33:54: Use raw reads count to estimate the transcript expression for filtering. 
INFO  @ Thu, 24 Aug 2023 09:33:54: Filter expression with cutoff (1). 
INFO  @ Thu, 24 Aug 2023 09:33:54: After filtering, 18130 transcripts were left for peak calling. 
INFO  @ Thu, 24 Aug 2023 09:33:54: Normalize expression by using TPM. 
INFO  @ Thu, 24 Aug 2023 09:33:54: Decoding and building annotation information from /file/Homo_sapiens.GRCh38.91.gtf... 
DEBUG @ Thu, 24 Aug 2023 09:34:52: Prepearing chromosomes for decoding annotations. 
DEBUG @ Thu, 24 Aug 2023 09:35:41: Transcripts from following chromosomes will be decoded:(1,10,11,12,13,14,15,16,17,18,19,2,3,4,5,6,7,8,9,MT,X,Y) 
DEBUG @ Thu, 24 Aug 2023 09:35:41: Decoding annotations in parrallel... 
INFO  @ Thu, 24 Aug 2023 09:36:09: 18130 transcripts were used from the gene annotation file 
INFO  @ Thu, 24 Aug 2023 09:36:09: Starting to call peaks on IP and input libraries... 
INFO  @ Thu, 24 Aug 2023 09:36:09: Calling peak candidates on ip (rep1-IP.bam) and input (rep1-Input.bam)... 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Obtain library size: ip(18149988), input(13699844). 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Obtain library scale factor: ip(1.3248317280109174), input(1.0). 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading reads coverage in parallel... 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("-" strand, rep1-Input.bam)... 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("+" strand, rep1-Input.bam)... 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("-" strand, rep1-IP.bam)... 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("+" strand, rep1-IP.bam)... 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("-" strand, rep1-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("+" strand, rep1-IP.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:36:09: Reading genome-wide coverage for bam ("+" strand, rep1-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:36:10: Reading genome-wide coverage for bam ("-" strand, rep1-Input.bam) done. 
DEBUG @ Thu, 24 Aug 2023 09:36:12: Preparing data for parallel calling of peak candidates ... 
DEBUG @ Thu, 24 Aug 2023 09:36:12: Calling peak candidates in parallel with the estimation of 3550 tasks... 
INFO  @ Peak-calling status:   0% [elapsed: 00:00|estiamted remaining:?]
INFO  @ Peak-calling status:   0% [elapsed: 00:34|estiamted remaining:?]

Traceback (most recent call last):
  File "/root/.local/bin/peals", line 352, in <module>
    main()
  File "/root/.local/bin/peals", line 35, in main
    run( args )
  File "/root/.local/lib/python3.8/site-packages/PEALS/subcmd/callpeak_cmd.py", line 135, in run
    perPeakRowList = peakcall.callPeak(options, txBedDict, subTxExpDf, bamList, label)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakcall.py", line 269, in callPeak
    txPeakCallResultList = runCallPeakPerTxParallel(options, bamList, chromTxDict, txBedDict)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakcall.py", line 238, in runCallPeakPerTxParallel
    for result in tqdm.tqdm(imapUnordered, total=taskCount, mininterval=60, maxinterval=120, miniters=miniters, position=0, leave=True, ascii=True, bar_format=barFormat):
  File "/root/.local/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
KeyError: '6'
DEBUG @ Thu, 24 Aug 2023 09:36:47: Ending embedded R process. 
DEBUG @ Thu, 24 Aug 2023 09:36:47: R_do_Last() 
DEBUG @ Thu, 24 Aug 2023 09:36:47: R_RunExitFinalizers() 
DEBUG @ Thu, 24 Aug 2023 09:36:47: Rf_KillAllDevices() 
DEBUG @ Thu, 24 Aug 2023 09:36:47: R_CleanTempDir() 
DEBUG @ Thu, 24 Aug 2023 09:36:47: R_gc 
DEBUG @ Thu, 24 Aug 2023 09:36:47: Rf_endEmbeddedR(fatal) 
INFO  @ Thu, 24 Aug 2023 09:36:47: Embedded R ended. 
DEBUG @ Thu, 24 Aug 2023 09:36:47: Ending embedded R process. 
INFO  @ Thu, 24 Aug 2023 09:36:47: Embedded R already ended. 
fork16 commented 1 year ago

I wonder the versions of Python and modules in your environment. I guess the problem lies in the conflict between Python and modules. When I try to install the minimum version of the module required by the software, there are many errors, indicating that the version is too old and cannot be used.

kerenzhou062 commented 1 year ago

Hi I think I just fix this bug. I use dataframe to store the bedgraph data, so if the chromsomes are in number, then pandas will automatically read the chromosome as integer, where the chromosome information read from hg38.chrom.sizes is stored as string. That's why you will have errors like KeyError. Please try PEALS again on your data with the latest version. If the issue is solved, please let me know.

fork16 commented 1 year ago

Yes! It works! Calling Peak is finish. But there is another bug

INFO  @ Fri, 25 Aug 2023 10:30:10: Calling peak candidates on ip (rep1-IP.bam,rep2-IP.bam,rep3-IP.bam) and input (rep1-Input.bam,rep2-Input.bam,rep3-Input.bam) done. 
INFO  @ Fri, 25 Aug 2023 10:30:10: Finding consensus peak candidates from replicates... 
INFO  @ Fri, 25 Aug 2023 10:30:12: Performing statistical testing on peak candidates... 
DEBUG @ Fri, 25 Aug 2023 10:30:12: Obtaining reads for peak candidates... 
DEBUG @ Fri, 25 Aug 2023 10:30:12: Running featureCounts with command (featureCounts -a /file/rep/peals_tmpycsj1d7h.saf.tmp -F SAF  -o /file/rep/peals_tmpc9mj7fad.count.tmp -s 2 -T 10 --fracOverlap 0  -M -O --fraction --donotsort  rep/rep1-IP.bam rep/rep2-IP.bam rep/rep3-IP.bam rep/rep1-Input.bam rep/rep2-Input.bam rep/rep3-Input.bam)... 
DEBUG @ Fri, 25 Aug 2023 10:30:27: Running featureCounts done. 
DEBUG @ Fri, 25 Aug 2023 10:30:27: Testing the significance of peak candidates... 
DEBUG @ Fri, 25 Aug 2023 10:30:28: Rscript /root/.local/lib/python3.8/site-packages/PEALS/stats/runGlmNbinomTest.R         --sample "rep/peals_tmp_rep.sample.matrix.tmp"         --counts "rep/peals_tmp_rep.counts.matrix.tmp"         --mean 0         --formula "library"         --relevel "library:input"         --name "library_ip_vs_input"         --shrink "apeglm"         --sep "|="         --prefix "rep"         --reduce "1"         --padjust "BH"         --test "Wald"         --fittype "parametric"         --temp "rep/peals_tmp_rep"         --output "rep"         --plot   
Traceback (most recent call last):
  File "/root/.local/bin/peals", line 352, in <module>
    main()
  File "/root/.local/bin/peals", line 35, in main
    run( args )
  File "/root/.local/lib/python3.8/site-packages/PEALS/subcmd/callpeak_cmd.py", line 176, in run
    testResDf = peaktest.runGlmNbinomTest(options, peakReadCountDf, plot=True, skipSubplot=False)
  File "/root/.local/lib/python3.8/site-packages/PEALS/stats/peaktest.py", line 297, in runGlmNbinomTest
    __ = functools.subprocessToList(command, errorFlag)
  File "/root/.local/lib/python3.8/site-packages/PEALS/collection/functools.py", line 99, in subprocessToList
    runResList = bytes.decode(subprocess.check_output(command, shell=True, stderr=None)).split('\n')
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'Rscript /root/.local/lib/python3.8/site-packages/PEALS/stats/runGlmNbinomTest.R         --sample "rep/peals_tmp_rep.sample.matrix.tmp"         --counts "rep/peals_tmp_rep.counts.matrix.tmp"         --mean 0         --formula "library"         --relevel "library:input"         --name "library_ip_vs_input"         --shrink "apeglm"         --sep "|="         --prefix "rep"         --reduce "1"         --padjust "BH"         --test "Wald"         --fittype "parametric"         --temp "rep/peals_tmp_rep"         --output "rep"         --plot  ' returned non-zero exit status 2.
DEBUG @ Fri, 25 Aug 2023 10:30:28: Ending embedded R process. 
DEBUG @ Fri, 25 Aug 2023 10:30:28: R_do_Last() 
DEBUG @ Fri, 25 Aug 2023 10:30:28: R_RunExitFinalizers() 
DEBUG @ Fri, 25 Aug 2023 10:30:28: Rf_KillAllDevices() 
DEBUG @ Fri, 25 Aug 2023 10:30:28: R_CleanTempDir() 
DEBUG @ Fri, 25 Aug 2023 10:30:28: R_gc 
DEBUG @ Fri, 25 Aug 2023 10:30:28: Rf_endEmbeddedR(fatal) 
INFO  @ Fri, 25 Aug 2023 10:30:28: Embedded R ended. 
DEBUG @ Fri, 25 Aug 2023 10:30:28: Ending embedded R process. 
INFO  @ Fri, 25 Aug 2023 10:30:28: Embedded R already ended. 

It looks like a bug appeared in R, and I can't see the bug message at all. Here's the filelist in my temp workdir image

kerenzhou062 commented 1 year ago

Hi, please run this command "Rscript /root/.local/lib/python3.8/site-packages/PEALS/stats/runGlmNbinomTest.R --sample "rep/peals_tmp_rep.sample.matrix.tmp" --counts "rep/peals_tmp_rep.counts.matrix.tmp" --mean 0 --formula "library" --relevel "library:input" --name "library_ip_vs_input" --shrink "apeglm" --sep "|=" --prefix "rep" --reduce "1" --padjust "BH" --test "Wald" --fittype "parametric" --temp "rep/peals_tmp_rep" --output "rep" --plot" and check the errors.

fork16 commented 1 year ago

There is no runGlmNbinomTest.R in /root/.local/lib/python3.8/site-packages/PEALS/stats/, which is causing this error. I copy it from PEALS/PEALS/stats/runGlmNbinomTest.R . I installed PEALS using the following command pip3 install peals --user -i https://mirrors.aliyun.com/pypi/simple/ and I'm not sure whether this method caused missing files.

Next, the R script started running, but an error occurred. After inspection, it was found that there was a problem reading the table. In lines 138 and 140 of runGlmNbinomTest.R, for these two read.csv function, it is necessary to add the parameter check. names=F because my file name is rep-1, and if I do not add this parameter, R will read 'rep-1' as' rep.1 '. It seems that only cts will do this, but colData will not, resulting in the names of these two files being different. Causing an error occurred in thects<- cts [, rownames (colData)]command.

Finally, after correcting all errors, the software has completed running.But the result seems very bad. Just one peak can be called . image

Does this mean that my data is too poor to call peak? Is there any parameter that can relax the restrictions

fork16 commented 1 year ago

Hi, I used following params --padj-method none -q 1 -f 1 and only 17 peaks were obtained.

And when I analyze other grouped data using the param above, I got the following error


INFO  @ Peak-calling status: 100% [elapsed: 12:07|estiamted remaining:00:00]

DEBUG @ Tue, 29 Aug 2023 19:47:46: Removing redundant peak candidates... 
INFO  @ Tue, 29 Aug 2023 19:47:53: Calling peak candidates on ip (treat1-IP.bam,treat2-IP.bam,treat3-IP.bam) and input (treat1-Input.bam,treat2-Input.bam,treat3-Input.bam) done. 
INFO  @ Tue, 29 Aug 2023 19:47:53: Finding consensus peak candidates from replicates... 
Traceback (most recent call last):
  File "/root/.local/bin/peals", line 352, in <module>
    main()
  File "/root/.local/bin/peals", line 38, in main
    run( args )
  File "/root/.local/lib/python3.8/site-packages/PEALS/subcmd/diffpeak_cmd.py", line 116, in run
    treatPeakRowList = findPeakFromSample(options, txBedDict, tipBamList, tinputBamList, tlabelList, geneExpDf, subTxExpDf, 'treated')
  File "/root/.local/lib/python3.8/site-packages/PEALS/subcmd/diffpeak_cmd.py", line 268, in findPeakFromSample
    peakRowList = peakutils.poolPeak(options, peakRowList, sampleCount, condition, peakMode='intersect', txBedDict=None)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakutils.py", line 619, in poolPeak
    tpoolPeakRowList = poolPeakBySplitParallel(options, tpeakRowList, repCount, peakMode, txBedDict)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakutils.py", line 554, in poolPeakBySplitParallel
    networkEdgeGroupList = getNetworkEdgeFromPeak(options, peakRowList, peakRowDict)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakutils.py", line 337, in getNetworkEdgeFromPeak
    intersectResList = getIntersectPeakidPairFromPeak(options, peakRowList)
  File "/root/.local/lib/python3.8/site-packages/PEALS/peak/peakutils.py", line 309, in getIntersectPeakidPairFromPeak
    runResRowList = functools.runBedtools(options, command, peakRowList, peakRowList)
  File "/root/.local/lib/python3.8/site-packages/PEALS/collection/functools.py", line 179, in runBedtools
    runResRowList = subprocessToList(command)
  File "/root/.local/lib/python3.8/site-packages/PEALS/collection/functools.py", line 101, in subprocessToList
    runResList = bytes.decode(subprocess.check_output(command, shell=True, stderr=subprocess.PIPE)).split('\n')
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/root/mambaforge-pypy3/envs/python38/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'bedtools intersect -a /file/diff/peals_tmpc8hcxs5n.bed.tmp -b /file/diff/peals_tmpwwbtqrtt.bed.tmp -g /file/diff/peals_tmphkd29qdv.gsize.tmp -sorted -s -split -wa -wb -nonamecheck ' returned non-zero exit status 1.
DEBUG @ Tue, 29 Aug 2023 19:47:53: Ending embedded R process. 
DEBUG @ Tue, 29 Aug 2023 19:47:53: R_do_Last() 
DEBUG @ Tue, 29 Aug 2023 19:47:53: R_RunExitFinalizers() 
DEBUG @ Tue, 29 Aug 2023 19:47:53: Rf_KillAllDevices() 
DEBUG @ Tue, 29 Aug 2023 19:47:53: R_CleanTempDir() 
DEBUG @ Tue, 29 Aug 2023 19:47:53: R_gc 
DEBUG @ Tue, 29 Aug 2023 19:47:53: Rf_endEmbeddedR(fatal) 
INFO  @ Tue, 29 Aug 2023 19:47:53: Embedded R ended. 
DEBUG @ Tue, 29 Aug 2023 19:47:53: Ending embedded R process. 
INFO  @ Tue, 29 Aug 2023 19:47:53: Embedded R already ended. 

I think it can't find any consensus peak which cause this error.

Does this mean that my data is very poor?

fork16 commented 1 year ago

周老师,我想问个问题,您觉得多重比对的reads(一条read比对到多个位置)对RNA测序的下游分析是否有很大的影响?是否必须要删除掉这些多重比对的reads?

kerenzhou062 commented 1 year ago

I will prefer including the multipe-mapping reads into analysis.