loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

TOBIAS ATACorrect fails on some datasets, without any error message #235

Open liz-is opened 9 months ago

liz-is commented 9 months ago

Hello,

I'm running TOBIAS on aggregated data from different single-cell ATAC-seq clusters. The ATACorrect step fails on a subset of clusters without any error message, but works on others. It consistently fails at the same step when re-run. I was initially using version 0.13.3 but upgraded to 0.16.0, and I'm still getting the same issue.

Your help figuring out the issue would be much appreciated! Please let me know if there's any other info I can provide to help debug this. I've included a portion of the log file (with --verbosity 4) below.

Thanks in advance for your help!

# TOBIAS 0.16.0 ATACorrect (run started 2023-09-29 16:26:47.467766)
# Working directory: /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias
# Command line call: TOBIAS ATACorrect --bam data/bam/Normal_integrated_with_ambRNAremoval_cluster_18_merged.bam --peaks data/Normal_integrated_with_ambRNAremoval_unionpeaks.bed --genome /home/research/vaquerizas/jbhaska/Sperm_Project/multiome/pre-processing/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa --blacklist data/blacklist_hg38.bed --outdir data/ATACorrect/ --cores 8 --verbosity 4

# ----- Input parameters -----
# bam:  data/bam/Normal_integrated_with_ambRNAremoval_cluster_18_merged.bam
# genome:       /home/research/vaquerizas/jbhaska/Sperm_Project/multiome/pre-processing/refdata-cellranger-arc-GRCh38-2020-A-2.0.0/fasta/genome.fa
# peaks:        data/Normal_integrated_with_ambRNAremoval_unionpeaks.bed
# regions_in:   None
# regions_out:  None
# blacklist:    data/blacklist_hg38.bed
# extend:       100
# split_strands:        False
# norm_off:     False
# track_off:    []
# drop_chroms:  ['chrM', 'chrMT', 'M', 'MT', 'Mito']
# k_flank:      12
# read_shift:   [4, -5]
# bg_shift:     100
# window:       100
# score_mat:    DWM
# bias_pkl:     None
# prefix:       Normal_integrated_with_ambRNAremoval_cluster_18_merged
# outdir:       /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect
# cores:        8
# split:        100
# verbosity:    4

# ----- Output files -----
# /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect/Normal_integrated_with_ambRNAremoval_cluster_18_merged_uncorrected.bw
# /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect/Normal_integrated_with_ambRNAremoval_cluster_18_merged_bias.bw
# /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect/Normal_integrated_with_ambRNAremoval_cluster_18_merged_expected.bw
# /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect/Normal_integrated_with_ambRNAremoval_cluster_18_merged_corrected.bw
# /home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect/Normal_integrated_with_ambRNAremoval_cluster_18_merged_atacorrect.pdf

2023-09-29 16:26:47 (366250) [INFO]     ----- Processing input data -----

[snipped]

2023-09-29 16:29:41 (366250) [DEBUG]    Saving bias object to pickle (/home/research/vaquerizas/liz/analysis-3rd-rep/grn/tobias/data/ATACorrect/Normal_integrated_with_ambRNAremoval_cluster_18_merged_AtacBias.pickle)

2023-09-29 16:29:41 (366250) [INFO]     ----- Correcting reads from .bam within output regions -----
2023-09-29 16:29:42 (366250) [DEBUG]    All regions chunked: 203130 ([2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 203
2, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 2032, 1962])
2023-09-29 16:29:42 (366250) [DEBUG]    Worker cores: 7
2023-09-29 16:29:42 (366250) [DEBUG]    Writer cores: 1
2023-09-29 16:29:42 (366250) [DEBUG]    Creating writer queue for ['uncorrected:both', 'bias:both', 'expected:both', 'corrected:both']

msbentsen commented 8 months ago

Hi @liz-is,

Sorry for the delayed response. Can you provide some more information about what system you are on, e.g. macOS, linux? Also, can you describe what it means that it "fails" without error message? Is the program killed or does it keep running without doing anything?

That will help to debug the situation, thanks!

liz-is commented 8 months ago

Hi, sorry for not providing more information! I'm running on Linux (CentOS 7). I'm running TOBIAS through Snakemake, and the Snakemake jobs end without producing any output files apart from the logs shown above, so the process seems to be getting killed.

msbentsen commented 8 months ago

Hi @liz-is, can you try to run the ATACorrect command outside of Snakemake, but with the same data? If you run Snakemake with the "-p" option, it should print the exact shell commands that are executed, which you can then copy and run separately. That might help us see whether the problem is with Snakemake or with TOBIAS.
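
For anyone following the suggestion above, here is a minimal Python sketch (not part of TOBIAS or Snakemake) for running the copied command by hand and seeing how it ended. The helper name `run_and_report` is hypothetical. It assumes a Unix system, since it relies on the standard-library `resource` module, and it assumes the command contains no shell features such as pipes or redirects. Note that `ru_maxrss` for child processes reports the largest single child, not the total across all worker processes, so with `--cores 8` the real footprint may be considerably higher than the number printed.

```python
import resource
import shlex
import signal
import subprocess
import sys


def run_and_report(command):
    """Run a command string (no pipes or redirects) and summarise how it ended."""
    result = subprocess.run(shlex.split(command))
    # Peak resident set size of the largest waited-for child (kilobytes on Linux).
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if result.returncode < 0:
        # subprocess reports termination by signal N as return code -N.
        try:
            name = signal.Signals(-result.returncode).name
        except ValueError:
            name = "signal {}".format(-result.returncode)
        status = "killed by " + name
    else:
        status = "exited with code {}".format(result.returncode)
    return status, peak_kb


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the command printed by `snakemake -p` as one quoted argument.
    status, peak_kb = run_and_report(sys.argv[1])
    print("{} (largest child peak RSS: {} kB)".format(status, peak_kb))
```

A status of "killed by SIGKILL" would point towards something outside the program (for example a memory limit) stopping it, whereas a non-zero exit code would point towards an error inside the program.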

liz-is commented 8 months ago

Hi, after some further investigation, I think I figured it out. The processes ran fine outside of Snakemake/the job submission system, or if I gave the jobs more memory. I didn't think to try giving them more memory at first, as these are actually the samples with the lowest number of reads, so I didn't expect memory to be an issue. I'm also surprised there was nothing in the logs, as Python would usually give a "can't allocate memory" error message. Is it possible the error is being thrown internally but not logged?
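
One possible explanation for the missing error message, which is an assumption and is not confirmed anywhere in this thread: if the job scheduler or the kernel's out-of-memory killer stops the process with SIGKILL once it exceeds its memory limit, Python never gets the chance to raise an exception, because SIGKILL cannot be caught or handled. The sketch below illustrates the difference using two throwaway child processes. It does not involve TOBIAS at all, and the names `RAISES`, `KILLED`, and `run_child` are made up for the illustration.

```python
import signal
import subprocess
import sys

# Child 1: raises MemoryError inside Python, so a traceback reaches stderr.
RAISES = "raise MemoryError('allocation failed')"
# Child 2: sends itself SIGKILL, standing in for an external memory-limit kill.
KILLED = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"


def run_child(code):
    """Run a Python snippet in a subprocess; return (returncode, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return proc.returncode, proc.stderr


raised_code, raised_err = run_child(RAISES)
killed_code, killed_err = run_child(KILLED)

print("MemoryError child: traceback logged =", "MemoryError" in raised_err)
print("SIGKILL child: killed by SIGKILL =", killed_code == -signal.SIGKILL)
print("SIGKILL child: stderr empty =", killed_err == "")
```

If that is what happened here, the scheduler's own job accounting or the system log would be the place to look for the kill, rather than the TOBIAS log.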