maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

continuation from #924, warnings.warn("fragmentSize.metric.tsv is empty, this sets #926

Open sunta3iouxos opened 1 year ago

sunta3iouxos commented 1 year ago

This is the error I encountered:

ChIP-seq -d /mnt/c/AP01/bamSpikes -j 8 --local --useSpikeInForNorm --getSizeFactorsFrom genome --spikeinExt Chromosome --sampleSheet /mnt/c/AP01/bamSpikes/H3K36me3.tsv --windowSize 500 mm10_gencodeM19_spikes /mnt/c/AP01/bamSpikes/H3K36me3_chip_type.yaml

mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/ChIP-seq/internals.snakefile:71: UserWarning: fragmentSize.metric.tsv is empty, this sets --extsize of MACS2 to an empty string. Fix this and run MACS2 again!
  warnings.warn("fragmentSize.metric.tsv is empty, this sets "

But the file is there, as expected, and it is not empty. Maybe I need to change the path "filtered_bam/A006200317_201074_S18_L000.filtered.bam"?

This is the entry in **fragmentSize.metric.tsv** in folder /mnt/c/AP01/bamSpikes/deepTools_qc/bamPEFragmentSize:

```
	Frag. Sampled	Frag. Len. Min.	Frag. Len. 1st. Qu.	Frag. Len. Mean	Frag. Len. Median	Frag. Len. 3rd Qu.	Frag. Len. Max	Frag. Len. Std.	Frag. Med. Abs. Dev.	Frag. Len. 10%	Frag. Len. 20%	Frag. Len. 30%	Frag. Len. 40%	Frag. Len. 60%	Frag. Len. 70%	Frag. Len. 80%	Frag. Len. 90%	Frag. Len. 99%	Reads Sampled	Read Len. Min.	Read Len. 1st. Qu.	Read Len. Mean	Read Len. Median	Read Len. 3rd Qu.	Read Len. Max	Read Len. Std.	Read Med. Abs. Dev.	Read Len. 10%	Read Len. 20%	Read Len. 30%	Read Len. 40%	Read Len. 60%	Read Len. 70%	Read Len. 80%	Read Len. 90%	Read Len. 99%
filtered_bam/A006200317_201074_S18_L000.filtered.bam	5161291	31.0	232.0	383.59622621549534	347.0	503.0	1000.0	180.41123400583055	134.0	174.0	204.0	279.0	324.0	375.0	462.0	535.0	660.0	879.0	5161291	6.0	101.0	100.82193892962052	101.0	101.0	101.0	2.2590399841527473	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201076_S19_L000.filtered.bam	4998218	31.0	244.0	388.9016327419092	350.0	508.0	1000.0	181.67450743477957	133.0	175.0	210.0	289.0	326.0	381.0	471.0	540.0	666.0	885.0	4998218	5.0	101.0	100.8087454368737	101.0	101.0	101.0	2.36251289170861	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201078_S20_L000.filtered.bam	4842611	31.0	234.0	377.8803847758988	343.0	491.0	1000.0	176.66208868323568	126.0	175.0	206.0	275.0	319.0	370.0	444.0	525.0	646.0	875.0	4842611	6.0	101.0	100.79993437424562	101.0	101.0	101.0	2.450860570721792	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201080_S21_L000.filtered.bam	4231126	31.0	215.0	363.7680144717978	336.0	467.0	1000.0	172.56381732262406	124.0	172.0	196.0	244.0	309.0	361.0	408.0	509.0	617.0	864.0	4231126	9.0	101.0	100.84629339802218	101.0	101.0	101.0	2.0855938850499705	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201082_S22_L000.filtered.bam	4603157	30.0	231.0	381.17343032184215	343.0	498.0	1000.0	181.16802233137997	130.0	174.0	205.0	267.0	316.0	376.0	457.0	533.0	657.0	880.0	4603157	6.0	101.0	100.8004684611018	101.0	101.0	101.0	2.440956246140944	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201084_S23_L000.filtered.bam	5281531	0.0	197.0	357.5763482217562	333.0	474.0	16124.0	176.92660091806692	137.0	166.0	184.0	215.0	300.0	358.0	409.0	513.0	622.0	849.0	5281531	4.0	101.0	100.8138689330802	101.0	101.0	101.0	2.2821954751618945	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201086_S24_L000.filtered.bam	4401065	31.0	212.0	374.26831914547955	351.0	483.0	1000.0	179.37549730915046	138.0	179.0	199.0	231.0	325.0	373.0	406.0	534.0	637.0	890.0	4401065	6.0	101.0	100.82557017449186	101.0	101.0	101.0	2.301871383600218	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201088_S25_L000.filtered.bam	3847192	0.0	207.0	365.88952487944454	346.0	452.0	16170.0	177.7775288643752	136.0	176.0	196.0	222.0	315.0	368.0	398.0	524.0	616.0	883.0	3847192	5.0	101.0	100.81410103784786	101.0	101.0	101.0	2.40888097693432	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201090_S26_L000.filtered.bam	4093720	31.0	232.0	389.4095922535005	361.0	512.0	1000.0	178.40632084526274	137.0	183.0	210.0	289.0	339.0	382.0	433.0	546.0	656.0	892.0	4093720	7.0	101.0	100.81159629872097	101.0	101.0	101.0	2.4316634404585766	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201092_S27_L000.filtered.bam	3741218	0.0	231.0	381.22604536811275	355.0	487.0	1000.0	171.89024706022872	126.0	182.0	209.0	298.0	335.0	375.0	409.0	532.0	626.0	876.0	3741218	7.0	101.0	100.82066749384826	101.0	101.0	101.0	2.336261347282812	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201094_S28_L000.filtered.bam	3698962	31.0	258.0	381.6722956332074	355.0	475.0	1000.0	166.62108808100896	107.0	184.0	218.0	307.0	336.0	375.0	408.0	525.0	612.0	871.0	3698962	7.0	101.0	100.80471115950907	101.0	101.0	101.0	2.4712627909165152	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
filtered_bam/A006200317_201096_S29_L000.filtered.bam	3480613	30.0	229.0	381.9273995701332	358.0	491.0	1000.0	171.77850031757265	130.0	184.0	209.0	289.0	337.0	377.0	413.0	535.0	620.0	874.0	3480613	6.0	101.0	100.8180673921519	101.0	101.0	101.0	2.3709500630069402	0.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0	101.0
```

katsikora commented 1 year ago

Do you have a latency wait value set for snakemake in your cluster_config.yaml? You can check with:

grep snakemake_latency_wait ChIP-seq.cluster_config.yaml

sunta3iouxos commented 1 year ago

snakemake_latency_wait: 300

katsikora commented 1 year ago

Can you check the split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv file?

sunta3iouxos commented 1 year ago

> Can you check the split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv file?

There you go:


head /mnt/c/AP01/bamSpikes/split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv

        Frag. Sampled   Frag. Len. Min. Frag. Len. 1st. Qu.     Frag. Len. Mean Frag. Len. Median       Frag. Len. 3rd Qu.  Frag. Len. Max  Frag. Len. Std. Frag. Med. Abs. Dev.    Frag. Len. 10%  Frag. Len. 20%  Frag. Len. 30%      Frag. Len. 40%  Frag. Len. 60%  Frag. Len. 70%  Frag. Len. 80%  Frag. Len. 90%  Frag. Len. 99%      Reads Sampled   Read Len. Min.  Read Len. 1st. Qu.      Read Len. Mean  Read Len. Median        Read Len. 3rd Qu.   Read Len. Max   Read Len. Std.  Read Med. Abs. Dev.     Read Len. 10%   Read Len. 20%   Read Len. 30%       Read Len. 40%   Read Len. 60%   Read Len. 70%   Read Len. 80%   Read Len. 90%   Read Len. 99%
split_bam/A006200317_201074_S18_L000_host.bam   5160663 31.0    232.0   383.611620638666        347.0   503.0       1000.0  180.41513854231857      134.0   174.0   204.0   280.0   324.0   375.0   462.0   535.0   660.0       879.0   5160663 6.0     101.0   100.82191726140614      101.0   101.0   101.0   2.2591765770093377 0.0      101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0
split_bam/A006200317_201076_S19_L000_host.bam   4997644 31.0    244.0   388.91535851693317      350.0   508.0       1000.0  181.6785399018827       133.0   175.0   210.0   289.0   326.0   381.0   471.0   540.0   666.0       885.0   4997644 5.0     101.0   100.80873227464781      101.0   101.0   101.0   2.3626184875566327 0.0      101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0
split_bam/A006200317_201078_S20_L000_host.bam   4841874 31.0    234.0   377.89708922619633      343.0   491.0       1000.0  176.66795904582912      126.0   175.0   206.0   275.0   319.0   370.0   444.0   525.0   646.0       875.0   4841874 6.0     101.0   100.79990392149817      101.0   101.0   101.0   2.4510458479789414 0.0      101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0
split_bam/A006200317_201080_S21_L000_host.bam   4230783 31.0    215.0   363.7763369097399       336.0   467.0       1000.0  172.56697938041304      124.0   172.0   196.0   244.0   309.0   361.0   408.0   509.0   617.0       864.0   4230783 9.0     101.0   100.84628093664932      101.0   101.0   101.0   2.0856779662382343 0.0      101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0
split_bam/A006200317_201082_S22_L000_host.bam   4602454 30.0    231.0   381.1905059778979       343.0   498.0       1000.0  181.17386564276273      130.0   174.0   205.0   267.0   316.0   376.0   457.0   533.0   657.0       880.0   4602454 6.0     101.0   100.8004379837365       101.0   101.0   101.0   2.4411414147128365 0.0      101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0
split_bam/A006200317_201084_S23_L000_host.bam   5281209 0.0     197.0   357.58239921957266      333.0   474.0       16124.0 176.9292762319957       137.0   166.0   184.0   215.0   300.0   358.0   409.0   513.0   622.0       849.0   5281209 4.0     101.0   100.81385985671083      101.0   101.0   101.0   2.2822587968008254 0.0      101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0   101.0
katsikora commented 1 year ago

Hmm, very strange. Can it be that the latency on your system is longer than 5 mins? If you rerun the workflow from where it stopped, does it complete successfully?

sunta3iouxos commented 1 year ago

It finished successfully; the pipeline as a whole did not fail. It just threw those warnings at the beginning. I noticed that I had not pasted the complete error, so I am attaching the whole log file here:

ChIP-seq_run-2.log

katsikora commented 1 year ago

I see. I'm just wondering if the fragment length from deepTools was passed to MACS2 or not - the warning rather suggests the latter.

sunta3iouxos commented 1 year ago

> I see. I'm just wondering if the fragment length from deepTools was passed to MACS2 or not - the warning rather suggests the latter.

Should I increase the latency wait 10-fold?

sunta3iouxos commented 11 months ago

This error persists:

/mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/ChIP-seq/internals.snakefile:71: UserWarning: fragmentSize.metric.tsv is empty, this sets --extsize of MACS2 to an empty string. Fix this and run MACS2 again!
  warnings.warn("fragmentSize.metric.tsv is empty, this sets "

I am wondering whether the pipeline searches for the file in a different location than: /mnt/c/AP04/deepTools_qc/bamPEFragmentSize/fragmentSize.metric.tsv

sunta3iouxos commented 11 months ago

In addition, can I set the --extsize that MACS2 expects manually, or let MACS2 calculate it, by using the --peakCallerOptions "" string?

katsikora commented 11 months ago

> this error persists:
>
> /mambaforge/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/ChIP-seq/internals.snakefile:71: UserWarning: fragmentSize.metric.tsv is empty, this sets --extsize of MACS2 to an empty string. Fix this and run MACS2 again!
>   warnings.warn("fragmentSize.metric.tsv is empty, this sets "
>
> I am wondering if the pipeline searches for the file in a different position than: /mnt/c/AP04/deepTools_qc/bamPEFragmentSize/fragmentSize.metric.tsv

Yes, when --useSpikeInForNorm is passed, split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv is the file that is passed to MACS2 to look up the calculated fragment size.

After looking into the code, I found that the check for the fragment size tsv occurs in two places. One of them handles the --useSpikeInForNorm argument, and the other doesn't. It looks like the warning is raised erroneously here, because the file that the check looks for is not the one that MACS2 requires in this case. I'll fix that.
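To illustrate the idea (this is not the actual snakePipes code), here is a minimal Python sketch of a check that picks the table according to the spike-in mode before testing whether it is empty. The two relative paths are the ones mentioned in this thread; the function names are hypothetical, and reading the "Frag. Len. Median" column (with an unnamed first column holding the sample name) is an assumption about which value would be looked up.

```python
import csv
from pathlib import Path

# Relative paths as reported in this thread (relative to the output directory).
DEFAULT_TSV = "deepTools_qc/bamPEFragmentSize/fragmentSize.metric.tsv"
SPIKEIN_TSV = "split_deepTools_qc/bamPEFragmentSize/host.fragmentSize.metric.tsv"


def metrics_path(outdir, use_spikein_for_norm):
    """Hypothetical helper: choose the table that matches the run mode."""
    rel = SPIKEIN_TSV if use_spikein_for_norm else DEFAULT_TSV
    return Path(outdir) / rel


def median_fragment_lengths(tsv_path):
    """Map sample name -> median fragment length.

    Returns an empty dict if the file is missing, empty, or has no data rows.
    Assumes the header's first cell is empty and that the column of interest
    is named 'Frag. Len. Median'.
    """
    path = Path(tsv_path)
    if not path.is_file() or path.stat().st_size == 0:
        return {}
    with path.open(newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if len(rows) < 2:
        return {}
    col = rows[0].index("Frag. Len. Median")
    return {row[0]: float(row[col]) for row in rows[1:] if len(row) > col}
```

With such a helper, a warning would only be warranted when `median_fragment_lengths(metrics_path(outdir, use_spikein_for_norm))` comes back empty for the mode that is actually in use.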

I think that you can ignore the warning. Could you look at the head of the .err files in the MACS2/logs folder? The MACS2 command line should be printed there. Could you confirm that --extsize was passed together with a sensible numerical value?
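If there are many samples, a small script along these lines could do the check for all logs at once. It is only a sketch: the MACS2/logs location and the .err suffix come from this thread, the function names are made up for illustration, and it assumes the command line appears within the first lines of each file as whitespace-separated tokens.

```python
from itertools import islice
from pathlib import Path


def extsize_values(text):
    """Return the value given to every --extsize occurrence in `text`.

    An occurrence followed by another option (or by nothing) is reported
    as an empty string, i.e. the flag was passed without a value.
    """
    tokens = text.split()
    values = []
    for i, tok in enumerate(tokens):
        if tok == "--extsize":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            values.append("" if nxt.startswith("-") else nxt)
        elif tok.startswith("--extsize="):
            values.append(tok.split("=", 1)[1])
    return values


def is_sensible(value):
    """True if `value` parses as a positive number."""
    try:
        return float(value) > 0
    except ValueError:
        return False


def report(log_dir, head_lines=20):
    """Map each .err file name to the --extsize values found in its head."""
    result = {}
    for err in sorted(Path(log_dir).glob("*.err")):
        with err.open(errors="replace") as handle:
            head = "".join(islice(handle, head_lines))
        result[err.name] = extsize_values(head)
    return result
```

Calling `report("MACS2/logs")` from the output directory and checking each returned value with `is_sensible` would show at a glance which samples, if any, got an empty --extsize.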

Best wishes,

Katarzyna

katsikora commented 11 months ago

> In addition, can I add the --extsize that is expected from MACS2 manually or let MACS2 calculated by using the --peakCallerOptions "" string?

That's a bit tricky, as --extsize would be passed once (potentially with an empty value) explicitly through the rule parameters, and then a second time through --peakCallerOptions. You could try it and see what happens - it may be that MACS2 errors out (same argument passed twice) or that one value overrides the other. In the latter case it would be good to confirm in the logs which value was actually used.
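To make the two possible outcomes concrete, here is a toy parser built with Python's standard argparse module. It is not MACS2 and says nothing definite about how MACS2 itself handles repeated options; the program name, the default of 200, and the helper name are invented for the illustration.

```python
import argparse

# Toy stand-in for a peak caller's command line; NOT the MACS2 parser.
parser = argparse.ArgumentParser(prog="toy-peakcaller")
parser.add_argument("--extsize", type=int, default=200)


def parse_extsize(argv):
    """Parse `argv` with the toy parser and return the resulting extsize."""
    return parser.parse_args(argv).extsize


# Same option given twice: with argparse's default action the last value wins.
print(parse_extsize(["--extsize", "150", "--extsize", "347"]))  # prints 347

# An empty value cannot be converted to int, so argparse exits with an error.
try:
    parse_extsize(["--extsize", ""])
except SystemExit:
    print("empty --extsize was rejected")
```

In this toy case the second value overrides the first, while an empty value on its own is rejected, so either outcome described above is plausible; the MACS2 logs remain the place to confirm what really happened.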

Hope this helps, Best, Katarzyna