maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io
MIT License
374 stars 85 forks source link

WGBS: unrecognized reference name ""; treated as unmapped #936

Closed hanrong498 closed 5 months ago

hanrong498 commented 10 months ago

Hi!

Sorry for the spamming :)

I am trying to run the WGBS pipeline but got the error inbwameth/logs/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.map_reads.err:

Traceback (most recent call last):
  File "/work/upzenk/miniconda3/envs/20d3d6a39823b295d3503a5332b4207b/bin/bwameth.py", line 559, in <module>
    main(sys.argv[1:])
  File "/work/upzenk/miniconda3/envs/20d3d6a39823b295d3503a5332b4207b/bin/bwameth.py", line 550, in main
    bwa_mem(args.reference, conv_fqs_cmd, ' '.join(map(str, pass_through_args)),
  File "/work/upzenk/miniconda3/envs/20d3d6a39823b295d3503a5332b4207b/bin/bwameth.py", line 349, in bwa_mem
    raise BWAMethException("first run bwameth.py index %s OR bwameth.py index-mem2 %s" % (fa, fa))
BWAMethException: first run bwameth.py index  OR bwameth.py index-mem2 
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"

The shell command from WGBS_run is

shell:

            TMPDIR=/tmp/
            MYTEMP=$(mktemp -d "${TMPDIR:-/tmp}"/snakepipes.XXXXXXXXXX)
            bwameth.py --threads 12 --reference "" "FASTQ/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001_R1_001_MM_1.fastq.gz" "FASTQ/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001_R2_001_MM_1.fastq.gz" 2> bwameth/logs/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.map_reads.err |           samtools sort -T "$MYTEMP"/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001 -m 3G -@ 4 -o "bwameth/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.bam" 2>> bwameth/logs/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.map_reads.err
            rm -rf "$MYTEMP"

when I activate this env I have the samtools:

(20d3d6a39823b295d3503a5332b4207b) [hhu@jed ~]$ samtools --help

Program: samtools (Tools for alignments in the SAM format)
Version: 1.17 (using htslib 1.17)

Can you help me with that? Thanks!

hanrong498 commented 10 months ago

I think the problem is that the reference.fa is missing. I am trying to manually re-write the organism.yaml file

$ cat /work/upzenk/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/organisms/GRCh38_v39_lambda_spikein.yaml 
blacklist_bed: ''
bowtie2_index: /scratch/hhu/index_snakepipes/human_GRCh38_v39_lambda_spikein/BowtieIndex/genome
bwa_index: /scratch/hhu/index_snakepipes/human_GRCh38_v39_lambda_spikein/BWAIndex/genome.fa
bwa_mem2_index: /scratch/hhu/index_snakepipes/human_GRCh38_v39_lambda_spikein/BWA-MEM2Index/genome.fa
bwameth2_index: /scratch/hhu/index_snakepipes/human_GRCh38_v39_lambda_spikein/BWAmeth2Index/genome.fa
bwameth_index: /scratch/hhu/index_snakepipes/human_GRCh38_v39_lambda_spikein/BWAmethIndex/genome.fa

All the indexes were generated using the createIndices pipeline. I have a naive question: among all the files in the *Index/ folder, do we always put the /genome.fa here? And if I understand it correctly, /genome.fa here is actually a symlink to genome_fasta/genome.fa?

hanrong498 commented 10 months ago

Hi,

Here I fixed the reference genome issue but still have problems running the WGBS pipeline. In the .err, the bwameth was run multiple times and the mapped files seem to not have a reference name:

[M::process] 0 single-end sequences; 1818182 paired-end sequences
[M::process] read 1818182 sequences (120000012 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (36, 178823, 11466, 10)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (117, 362, 1187)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 3327)
[M::mem_pestat] mean and std.dev: (635.83, 645.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 4397)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (83, 119, 169)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 341)
[M::mem_pestat] mean and std.dev: (126.55, 60.59)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 427)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (113, 314, 722)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1940)
[M::mem_pestat] mean and std.dev: (437.26, 432.76)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2549)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (119, 633, 788)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2126)
[M::mem_pestat] mean and std.dev: (557.10, 407.29)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2795)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 1818182 reads in 5503.742 CPU sec, 773.964 real sec
[M::process] 0 single-end sequences; 1818182 paired-end sequences
[M::process] read 1818182 sequences (120000012 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (33, 178223, 11445, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (109, 216, 777)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2113)
[M::mem_pestat] mean and std.dev: (373.16, 372.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2781)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (83, 119, 169)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 341)
[M::mem_pestat] mean and std.dev: (126.27, 60.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 427)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (109, 309, 722)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1948)
[M::mem_pestat] mean and std.dev: (437.98, 439.44)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2561)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
rse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped

This is not the complete log. The bwamem was executed > 10 times and there are more [W::sam_parse1] unrecognized reference name ""; treated as unmapped message below.

Can you help me with it? Thanks!

hanrong498 commented 10 months ago

This issue of [W::sam_parse1] unrecognized reference name ""; treated as unmapped from bwameth.py was mentioned here

As I have just checked, the reference I generated from bwa was also rchr1 1, which probably makes it unrecognisable.

$ head genome.fa.bwameth.c2t
>rchr1 1

Same issue was mentioned previously #898

Maybe there is a way to work around it?

katsikora commented 9 months ago

Hi @hanrong498,

thanks for looking into the details of the issue. Indeed, I've seen #898, and I wasn't able to reproduce it here. Unless something got messed up with bwa-meth or another tool on conda since we've built and tested the release. Did you build your conda environments for snakePipes recently? I can try if I can reproduce it with a freshly made conda env.

Best wishes,

Katarzyna

hanrong498 commented 9 months ago

Hi Katarzyna,

Thank you for the reply!

Yes I recently build the conda environments for snakePipes (~2 weeks ago).

I first ran the createIndices pipeline for a human_lambda hybrid genome. At first I thought this rchr1 1 was caused by running bwameth so recently. However, when I checked the pre-made indices provided in zenodo, the genome.fa.bwameth.c2t also started with rchr1 1. I have not yet tried running WGBS with the pre-made indices yet.

I also saw this issue running bwameth.py mentioned in bwa-meth, but I think it has been solved.

Lastly, I thought the unrecognised reference name is what made WGBS pipeline 'stucked'. However, when I subset my original data to 1000 reads, this process can actually proceed, even though I still got unrecognised reference name error in the log file.

PS: when running the subsetted data, I encountered an error in produceReport and I reported it yesterday #938 .

Thanks a lot! Hanrong

katsikora commented 9 months ago

Hi Hanrong,

if your pipeline was getting stuck with the full read set but is running with a subset of reads, you might need to assign more memory to the jobs. Are you running on a cluster or with --local ?

Best wishes,

Katarzyna

hanrong498 commented 9 months ago

Hi Katarzyna,

I am running it on one compute node on the cluster. Indeed in the first run, when I used the full read set, I did not assign enough memory. I will try again with this configuration:

#SBATCH --nodes 1
#SBATCH --ntasks 60
#SBATCH --mem 400G

In this node, there are ~7GB ram per core. Then in the pipeline, I will set --local -j 60. I hope this could finish the run :)

But as I am new to WGBS analysis, I wonder if this unrecognised reference name error, even though will not kill the process, can cause any downstream analysis problem?

Thanks & best regards! Hanrong

hanrong498 commented 9 months ago

Hi Katarzyna,

With this new mem allocation I can finish the bwa mem.

Additionally, I will put here a small issue I had in sambamba-markdup:

sambamba-markdup: Cannot open file `/tmp//snakepipes.QZfyXB38ey/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001/sambamba-pid1654661-markdup-jlul/PairedEndsInfoftfn19' in mode `w+' (Too many open files)

This can be solved by changing ulimit -n <large number> as suggested here.

However I encountered the same problem ! invalid 'times' argument as #898 . Specifically, I have the same PCA error:

Quitting from lines 197-212 [unnamed-chunk-10] (tmpxprjy2ka.WGBS_QC_report_template.Rmd)
Error in `rep()`:
! invalid 'times' argument
Backtrace:
 1. FactoMineR::PCA(t(methyl[IDX, ]), graph = FALSE)
 2. FactoMineR::svd.triplet(X, row.w = row.w, col.w = col.w, ncp = ncp)
 4. base::crossprod(rep(1, nrow(V)), as.matrix(V))

I am running on one sample, understandably I have the same issue. In the previous issue, you mentioned you add some code to skip this PCA plot. Is this implemented as an option in the snakePipes now or that I would need to change it myself?

And a related question: the next step I would indeed analyse multiple samples. My current folder structure is:

/path/to/sample1/sample1_R1.fastq.gz, sample1_R2.fastq.gz, 
/path/to/sample2/sample2_R1.fastq.gz, sample2_R2.fastq.gz

To dump all samples to the pipeline, I would need to put all fastq.gz files into one folder? If so, can symlink be enough?

Thank you very much for your help! Hanrong

katsikora commented 9 months ago

unrecognised reference name

Hi Hanrong,

what does the header of the bam file coming out of the mapping look like regarding the reference sequences ? samtools view -H $bam | grep @SQ

Best wishes,

Katarzyna

katsikora commented 9 months ago

Hi Katarzyna,

With this new mem allocation I can finish the bwa mem.

Additionally, I will put here a small issue I had in sambamba-markdup:

sambamba-markdup: Cannot open file `/tmp//snakepipes.QZfyXB38ey/211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001/sambamba-pid1654661-markdup-jlul/PairedEndsInfoftfn19' in mode `w+' (Too many open files)

This can be solved by changing ulimit -n <large number> as suggested here.

However I encountered the same problem ! invalid 'times' argument as #898 . Specifically, I have the same PCA error:

Quitting from lines 197-212 [unnamed-chunk-10] (tmpxprjy2ka.WGBS_QC_report_template.Rmd)
Error in `rep()`:
! invalid 'times' argument
Backtrace:
 1. FactoMineR::PCA(t(methyl[IDX, ]), graph = FALSE)
 2. FactoMineR::svd.triplet(X, row.w = row.w, col.w = col.w, ncp = ncp)
 4. base::crossprod(rep(1, nrow(V)), as.matrix(V))

I am running on one sample, understandably I have the same issue. In the previous issue, you mentioned you add some code to skip this PCA plot. Is this implemented as an option in the snakePipes now or that I would need to change it myself?

And a related question: the next step I would indeed analyse multiple samples. My current folder structure is:

/path/to/sample1/sample1_R1.fastq.gz, sample1_R2.fastq.gz, 
/path/to/sample2/sample2_R1.fastq.gz, sample2_R2.fastq.gz

To dump all samples to the pipeline, I would need to put all fastq.gz files into one folder? If so, can symlink be enough?

Thank you very much for your help! Hanrong

Hi Hanrong,

it looks like you're mapping large amounts of data, and sambamba chunked your large file into many small ones for marking duplicates. Then you hit the number of open files allowed by your system admin. You can alleviate this issue by increasing the memory assigned to rule markDupes .

The other issue you mention - invalid 'times' argument is because you're running only one sample, and PCA cannot be computed. The fix will be available in the next release. Can you edit this temporary file tmpxprjy2ka.WGBS_QC_report_template.Rmd and add eval=FALSE to the PCA Rmd section? It now reads: {r warning=FALSE, message=FALSE} If you change it to: {r warning=FALSE, message=FALSE,eval=FALSE} the code chunk won't be executed and the report creation should run through.

Indeed, if you would like to process multiple files at the same time, it's best to symlink them into one input folder. e.g. ln -t $my_folder_with_symlinks -s $file1 $file2 $file3 ...

Best wishes,

Katarzyna

hanrong498 commented 9 months ago

unrecognised reference name

Hi Hanrong,

what does the header of the bam file coming out of the mapping look like regarding the reference sequences ? samtools view -H $bam | grep @SQ

Best wishes,

Katarzyna

Hi Katarzyna,

Here are the results:

[hhu@jed Sambamba]$ samtools view 211029_1_FZ_WGBS_HOIK_p35_rep1_S1_L001.markdup.bam -H | grep @SQ
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr15    LN:101991189
@SQ SN:chr16    LN:90338345
@SQ SN:chr17    LN:83257441
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrM LN:16569
@SQ SN:GL000008.2   LN:209709
@SQ SN:GL000009.2   LN:201709
@SQ SN:GL000194.1   LN:191469
@SQ SN:GL000195.1   LN:182896
@SQ SN:GL000205.2   LN:185591
@SQ SN:GL000208.1   LN:92689
@SQ SN:GL000213.1   LN:164239
@SQ SN:GL000214.1   LN:137718
@SQ SN:GL000216.2   LN:176608
@SQ SN:GL000218.1   LN:161147
@SQ SN:GL000219.1   LN:179198
@SQ SN:GL000220.1   LN:161802
@SQ SN:GL000221.1   LN:155397
@SQ SN:GL000224.1   LN:179693
@SQ SN:GL000225.1   LN:211173
@SQ SN:GL000226.1   LN:15008
@SQ SN:KI270302.1   LN:2274
@SQ SN:KI270303.1   LN:1942
@SQ SN:KI270304.1   LN:2165
@SQ SN:KI270305.1   LN:1472
@SQ SN:KI270310.1   LN:1201
@SQ SN:KI270311.1   LN:12399
@SQ SN:KI270312.1   LN:998
@SQ SN:KI270315.1   LN:2276
@SQ SN:KI270316.1   LN:1444
@SQ SN:KI270317.1   LN:37690
@SQ SN:KI270320.1   LN:4416
@SQ SN:KI270322.1   LN:21476
@SQ SN:KI270329.1   LN:1040
@SQ SN:KI270330.1   LN:1652
@SQ SN:KI270333.1   LN:2699
@SQ SN:KI270334.1   LN:1368
@SQ SN:KI270335.1   LN:1048
@SQ SN:KI270336.1   LN:1026
@SQ SN:KI270337.1   LN:1121
@SQ SN:KI270338.1   LN:1428
@SQ SN:KI270340.1   LN:1428
@SQ SN:KI270362.1   LN:3530
@SQ SN:KI270363.1   LN:1803
@SQ SN:KI270364.1   LN:2855
@SQ SN:KI270366.1   LN:8320
@SQ SN:KI270371.1   LN:2805
@SQ SN:KI270372.1   LN:1650
@SQ SN:KI270373.1   LN:1451
@SQ SN:KI270374.1   LN:2656
@SQ SN:KI270375.1   LN:2378
@SQ SN:KI270376.1   LN:1136
@SQ SN:KI270378.1   LN:1048
@SQ SN:KI270379.1   LN:1045
@SQ SN:KI270381.1   LN:1930
@SQ SN:KI270382.1   LN:4215
@SQ SN:KI270383.1   LN:1750
@SQ SN:KI270384.1   LN:1658
@SQ SN:KI270385.1   LN:990
@SQ SN:KI270386.1   LN:1788
@SQ SN:KI270387.1   LN:1537
@SQ SN:KI270388.1   LN:1216
@SQ SN:KI270389.1   LN:1298
@SQ SN:KI270390.1   LN:2387
@SQ SN:KI270391.1   LN:1484
@SQ SN:KI270392.1   LN:971
@SQ SN:KI270393.1   LN:1308
@SQ SN:KI270394.1   LN:970
@SQ SN:KI270395.1   LN:1143
@SQ SN:KI270396.1   LN:1880
@SQ SN:KI270411.1   LN:2646
@SQ SN:KI270412.1   LN:1179
@SQ SN:KI270414.1   LN:2489
@SQ SN:KI270417.1   LN:2043
@SQ SN:KI270418.1   LN:2145
@SQ SN:KI270419.1   LN:1029
@SQ SN:KI270420.1   LN:2321
@SQ SN:KI270422.1   LN:1445
@SQ SN:KI270423.1   LN:981
@SQ SN:KI270424.1   LN:2140
@SQ SN:KI270425.1   LN:1884
@SQ SN:KI270429.1   LN:1361
@SQ SN:KI270435.1   LN:92983
@SQ SN:KI270438.1   LN:112505
@SQ SN:KI270442.1   LN:392061
@SQ SN:KI270448.1   LN:7992
@SQ SN:KI270465.1   LN:1774
@SQ SN:KI270466.1   LN:1233
@SQ SN:KI270467.1   LN:3920
@SQ SN:KI270468.1   LN:4055
@SQ SN:KI270507.1   LN:5353
@SQ SN:KI270508.1   LN:1951
@SQ SN:KI270509.1   LN:2318
@SQ SN:KI270510.1   LN:2415
@SQ SN:KI270511.1   LN:8127
@SQ SN:KI270512.1   LN:22689
@SQ SN:KI270515.1   LN:6361
@SQ SN:KI270516.1   LN:1300
@SQ SN:KI270517.1   LN:3253
@SQ SN:KI270518.1   LN:2186
@SQ SN:KI270519.1   LN:138126
@SQ SN:KI270521.1   LN:7642
@SQ SN:KI270522.1   LN:5674
@SQ SN:KI270528.1   LN:2983
@SQ SN:KI270529.1   LN:1899
@SQ SN:KI270530.1   LN:2168
@SQ SN:KI270538.1   LN:91309
@SQ SN:KI270539.1   LN:993
@SQ SN:KI270544.1   LN:1202
@SQ SN:KI270548.1   LN:1599
@SQ SN:KI270579.1   LN:31033
@SQ SN:KI270580.1   LN:1553
@SQ SN:KI270581.1   LN:7046
@SQ SN:KI270582.1   LN:6504
@SQ SN:KI270583.1   LN:1400
@SQ SN:KI270584.1   LN:4513
@SQ SN:KI270587.1   LN:2969
@SQ SN:KI270588.1   LN:6158
@SQ SN:KI270589.1   LN:44474
@SQ SN:KI270590.1   LN:4685
@SQ SN:KI270591.1   LN:5796
@SQ SN:KI270593.1   LN:3041
@SQ SN:KI270706.1   LN:175055
@SQ SN:KI270707.1   LN:32032
@SQ SN:KI270708.1   LN:127682
@SQ SN:KI270709.1   LN:66860
@SQ SN:KI270710.1   LN:40176
@SQ SN:KI270711.1   LN:42210
@SQ SN:KI270712.1   LN:176043
@SQ SN:KI270713.1   LN:40745
@SQ SN:KI270714.1   LN:41717
@SQ SN:KI270715.1   LN:161471
@SQ SN:KI270716.1   LN:153799
@SQ SN:KI270717.1   LN:40062
@SQ SN:KI270718.1   LN:38054
@SQ SN:KI270719.1   LN:176845
@SQ SN:KI270720.1   LN:39050
@SQ SN:KI270721.1   LN:100316
@SQ SN:KI270722.1   LN:194050
@SQ SN:KI270723.1   LN:38115
@SQ SN:KI270724.1   LN:39555
@SQ SN:KI270725.1   LN:172810
@SQ SN:KI270726.1   LN:43739
@SQ SN:KI270727.1   LN:448248
@SQ SN:KI270728.1   LN:1872759
@SQ SN:KI270729.1   LN:280839
@SQ SN:KI270730.1   LN:112551
@SQ SN:KI270731.1   LN:150754
@SQ SN:KI270732.1   LN:41543
@SQ SN:KI270733.1   LN:179772
@SQ SN:KI270734.1   LN:165050
@SQ SN:KI270735.1   LN:42811
@SQ SN:KI270736.1   LN:181920
@SQ SN:KI270737.1   LN:103838
@SQ SN:KI270738.1   LN:99375
@SQ SN:KI270739.1   LN:73985
@SQ SN:KI270740.1   LN:37240
@SQ SN:KI270741.1   LN:157432
@SQ SN:KI270742.1   LN:186739
@SQ SN:KI270743.1   LN:210658
@SQ SN:KI270744.1   LN:168472
@SQ SN:KI270745.1   LN:41891
@SQ SN:KI270746.1   LN:66486
@SQ SN:KI270747.1   LN:198735
@SQ SN:KI270748.1   LN:93321
@SQ SN:KI270749.1   LN:158759
@SQ SN:KI270750.1   LN:148850
@SQ SN:KI270751.1   LN:150742
@SQ SN:KI270752.1   LN:27745
@SQ SN:KI270753.1   LN:62944
@SQ SN:KI270754.1   LN:40191
@SQ SN:KI270755.1   LN:36723
@SQ SN:KI270756.1   LN:79590
@SQ SN:KI270757.1   LN:71251
@SQ SN:J02459.1_spikein LN:48502

Best, Hanrong

hanrong498 commented 9 months ago

The other issue you mention - invalid 'times' argument is because you're running only one sample, and PCA cannot be computed. The fix will be available in the next release. Can you edit this temporary file tmpxprjy2ka.WGBS_QC_report_template.Rmd and add eval=FALSE to the PCA Rmd section? It now reads: {r warning=FALSE, message=FALSE} If you change it to: {r warning=FALSE, message=FALSE,eval=FALSE} the code chunk won't be executed and the report creation should run through.

Hi Katarzyna,

Thank you for the tip! And sorry for my late reply regarding this issue. I cannot find this tmpxprjy2ka.WGBS_QC_report_template.Rmd file.

I tried here /scratch/hhu/20220330_P1886_FIDES_DNA_METHYLATION/merged_processed/.snakemake/scripts/tmpxprjy2ka.WGBS_QC_report_template.Rmd but this /scripts/ folder is empty. Maybe I can change from the source code?

Thanks! Hanrong

katsikora commented 9 months ago

Hi Hanrong,

sure, you can also modify the source file. The updated Rmd is now available on the develop branch of snakePipes .

You should be able to find the "source" Rmd under:

$github_repo_dir/snakePipes/shared/rscripts/WGBS_QC_report_template.Rmd

and replace it e.g. with the one from the develop branch.

Hope this helps, Best,

Katarzyna

katsikora commented 5 months ago

This should now be fixed in snakePipes 2.8.0 and 2.8.1.