loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

multiple conditional xx_footprints.bw data to differential motifs #178

Closed archanabhardwaj closed 1 year ago

archanabhardwaj commented 1 year ago

Hello

I am working with data from multiple conditions. I ran ATACorrect and ScoreBigwig for each sample using its peak and BAM files. Now I would like to compare the two conditions in order to find differential motifs.

From the manual, I found that BINDetect takes two input footprint files:

TOBIAS BINDetect --motifs test_data/motifs.jaspar --signals test_data/Bcell_footprints.bw test_data/Tcell_footprints.bw

But in my case I have multiple files for each condition (such as 1_T_footprints.bw; 2_T_footprints.bw; 3_T_footprints.bw; 1_C_footprints.bw; 2_C_footprints.bw; 3_C_footprints.bw).

How should I specify the six footprints.bw files for the differential motif analysis between groups C and T at the BINDetect step?

I downloaded the JASPAR motifs from https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_non-redundant_pfms_meme.txt (link given in https://github.com/loosolab/TOBIAS/issues/8).

I would really appreciate your help !!

maxdudek commented 1 year ago

Hi!

I'm not associated with TOBIAS but it looks like this issue might help. Basically, you could combine your bam files for each condition, then run ATACorrect and ScoreBigWig on each condition so that you only have two footprint files.

Hope this helps!

msbentsen commented 1 year ago

Hi @archanabhardwaj ,

Indeed, I would do it exactly like @maxdudek says! TOBIAS unfortunately does not deal with replicates so you would have to merge them beforehand.

If you want the footprint quantification for all samples, you can just give all the bigwigs to TOBIAS BINDetect, as it can take more than two files, like:

TOBIAS BINDetect --signals 1_T_footprints.bw 2_T_footprints.bw 3_T_footprints.bw 1_C_footprints.bw (...)

It will then automatically create pairwise comparisons of all inputs, e.g. 1_T vs. 2_T, 1_T vs. 3_T, etc.
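
To get a feel for how many comparisons this produces, the pairing can be sketched in Python with itertools.combinations. This only illustrates "all pairwise comparisons" using the file names from the question. It is not TOBIAS's own code, and both the condition_name helper and the order of the pairs are assumptions made for the example.

```python
from itertools import combinations

# File names taken from the question above; any list of bigwigs works the same way.
signals = [
    "1_T_footprints.bw", "2_T_footprints.bw", "3_T_footprints.bw",
    "1_C_footprints.bw", "2_C_footprints.bw", "3_C_footprints.bw",
]


def condition_name(path):
    """Hypothetical helper: strip the '_footprints.bw' suffix to get a label."""
    return path[: -len("_footprints.bw")]


# Every unordered pair of inputs, i.e. n * (n - 1) / 2 comparisons.
pairs = [(condition_name(a), condition_name(b)) for a, b in combinations(signals, 2)]

for first, second in pairs:
    print(f"{first} vs. {second}")
```

With six inputs this gives 15 pairs, six of which compare replicates of the same group rather than C against T, which is why merging per condition first is the simpler route for a C vs. T comparison.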

archanabhardwaj commented 1 year ago

Hello

I merged all BAM files according to their group and reran the ATACorrect step.

Steps followed for BAM merge :

samtools merge group1_srt_f.bam S1_R1_001.trim.srt.bam S2_R1_001.trim.srt.bam ... and so on
samtools merge group2_srt_f.bam S3_R1_001.trim.srt.bam S4_R1_001.trim.srt.bam ... and so on

Steps used for peak merge :

cat group1_peak1_sam1 group1_peak1_sam2 | sort -k1,1 -k2,2n | bedtools merge -i - > group1_merged.bed

cat group2_peak1_sam1 group2_peak1_sam2 | sort -k1,1 -k2,2n | bedtools merge -i - > group2_merged.bed

and so on....
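
For readers without bedtools at hand, the merge step above can be sketched in plain Python. This is a minimal sketch, assuming the inputs are BED-like lines with chromosome, start and end in the first three columns. The function name merge_peaks and the sample coordinates are made up for illustration, and extra columns are dropped.

```python
from collections import defaultdict


def merge_peaks(bed_lines):
    """Merge overlapping or directly adjacent intervals per chromosome.

    Mirrors `sort -k1,1 -k2,2n | bedtools merge -i -` for the first three
    BED columns only; any further columns are ignored.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:  # overlapping or book-ended
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        if current_start is not None:
            merged.append((chrom, current_start, current_end))
    return merged


# Made-up peaks standing in for two samples of one group.
sample1 = ["chr1\t100\t200", "chr1\t500\t600", "chr2\t10\t50"]
sample2 = ["chr1\t150\t250", "chr1\t600\t700", "chr2\t80\t90"]
merged = merge_peaks(sample1 + sample2)

for chrom, start, end in merged:
    print(chrom, start, end, sep="\t")
```

The same function can be fed the lines of all samples from both groups at once, which yields a single peak set instead of one per group.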

Running step for ATACorrect:

TOBIAS ATACorrect --bam /work_beegfs/usr/usr/B/TOBIAS/P/group1_srt_f.bam --peaks /work/usr/usr/B/TOBIAS/P/group1_merged.bed --genome /workusr/references/iGenomes/references/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa --prefix P_merged_bam_peakf --cores 8

I have submitted multiple jobs. All of them get stuck at the "Counting reads in nonpeak regions" step after reaching a certain percentage.

First job (4 BAM files merged into combined1.bam and running in ATACorrect):

2023-01-16 21:41:50 (109234) [INFO] Progress: 93%
2023-01-16 21:42:07 (109234) [INFO] Progress: 94%
2023-01-16 21:43:08 (109234) [INFO] Progress: 95%

Second job (4 BAM files merged into combined2.bam and running in ATACorrect):

2023-01-16 19:41:33 (244318) [INFO] Progress: 87%
2023-01-16 19:42:20 (244318) [INFO] Progress: 88%
2023-01-16 19:42:40 (244318) [INFO] Progress: 89%
2023-01-16 19:42:51 (244318) [INFO] Progress: 90%
2023-01-16 19:43:14 (244318) [INFO] Progress: 91%
2023-01-16 19:46:21 (244318) [INFO] Progress: 92%
2023-01-16 19:49:44 (244318) [INFO] Progress: 93%
2023-01-16 19:49:53 (244318) [INFO] Progress: 94%

Third job (26 BAM files merged into combined3.bam and running in ATACorrect):

2023-01-16 17:22:16 (159592) [INFO] Progress: 3%
2023-01-16 17:36:05 (159592) [INFO] Progress: 4%
2023-01-16 17:51:42 (159592) [INFO] Progress: 5%
2023-01-16 18:07:26 (159592) [INFO] Progress: 6%
2023-01-16 18:28:31 (159592) [INFO] Progress: 7%
2023-01-16 18:51:05 (159592) [INFO] Progress: 8%
2023-01-16 18:59:29 (159592) [INFO] Progress: 9%
2023-01-16 19:00:13 (159592) [INFO] Progress: 10%
2023-01-16 19:00:48 (159592) [INFO] Progress: 11%
2023-01-16 19:01:12 (159592) [INFO] Progress: 12%

I would appreciate any suggestions. If I am doing something wrong, please correct me.

Thanks a lot in advance!

archanabhardwaj commented 1 year ago

@msbentsen Thanks for your suggestions. I followed both the options.

Option 2: To get the footprint quantification for all samples, I gave both the .bw files and a list of peak files, but it gives an error. I assume that it does not handle a list of peak files, only a list of .bw files.

TOBIAS BINDetect --motifs JASPAR2022_CORE_non-redundant_pfms_meme.txt --signals p1_footprints.bw p3_footprints.bw p2_footprints.bw --genome /work/references/iGenomes/references/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa --peaks p1.bfilt.narrowPeak p2.bfilt.narrowPeak p3.bfilt.narrowPeak --outdir BINDtest --cores 8

Should I merge all peaks into a single BED file for this analysis? Would that be correct?

Thanks in advance

msbentsen commented 1 year ago

Should I merge all peaks into a single BED file for this analysis? Would that be correct?

Yes exactly, the --peaks option only takes one file, so you should merge p1.bfilt.narrowPeak p2.bfilt.narrowPeak p3.bfilt.narrowPeak into a "merged_peaks.bed" file. Be aware that this is also the file you need to use when running ATACorrect and ScoreBigwig, as this ensures that you have scores for all regions.
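
The point about reusing one peak file across all three steps can be sketched as a small Python script that only assembles and prints the command lines. It uses just the flags that already appear in this thread. The BAM names, the genome path and the assumption that ATACorrect writes its output as <prefix>_corrected.bw are placeholders for illustration, not confirmed here.

```python
import shlex

MERGED_PEAKS = "merged_peaks.bed"  # the single merged file named above
GENOME = "genome.fa"               # placeholder path to the genome FASTA
MOTIFS = "JASPAR2022_CORE_non-redundant_pfms_meme.txt"

# Hypothetical per-condition BAM files; replace with the merged BAMs.
conditions = {"p1": "p1.bam", "p2": "p2.bam", "p3": "p3.bam"}

commands = []
for name, bam in conditions.items():
    # Bias correction, restricted to the shared peak set.
    commands.append([
        "TOBIAS", "ATACorrect", "--bam", bam, "--genome", GENOME,
        "--peaks", MERGED_PEAKS, "--prefix", name, "--cores", "8",
    ])
    # Footprint scores over the same regions (assumes <prefix>_corrected.bw).
    commands.append([
        "TOBIAS", "ScoreBigwig", "--signal", f"{name}_corrected.bw",
        "--regions", MERGED_PEAKS, "--output", f"{name}_footprints.bw",
        "--cores", "8",
    ])

# One BINDetect call over all footprint tracks, again with the same peaks.
commands.append([
    "TOBIAS", "BINDetect", "--motifs", MOTIFS,
    "--signals", *[f"{name}_footprints.bw" for name in conditions],
    "--genome", GENOME, "--peaks", MERGED_PEAKS,
    "--outdir", "BINDtest", "--cores", "8",
])

for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands rather than running them makes it easy to check that every step points at the same merged_peaks.bed before submitting any jobs.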

archanabhardwaj commented 1 year ago

@msbentsen
Thanks. I have merged the peaks and the BAM files as well. I also tried to run the merged BAM files with the peak files in ATACorrect. For the last two days, it has been stuck at:

2023-01-20 11:19:28 (51947) [INFO] Progress: 89%
2023-01-20 11:19:33 (51947) [INFO] Progress: 90%
2023-01-20 11:19:56 (51947) [INFO] Progress: 91%
2023-01-20 11:20:43 (51947) [INFO] Progress: 92%
2023-01-20 11:22:43 (51947) [INFO] Progress: 93%
2023-01-20 11:22:47 (51947) [INFO] Progress: 94%
2023-01-20 11:23:14 (51947) [INFO] Progress: 95%
2023-01-20 11:23:20 (51947) [INFO] Progress: 96%
2023-01-20 11:24:18 (51947) [INFO] Progress: 97%
2023-01-20 11:25:01 (51947) [INFO] Progress: 98%

Could you please suggest what we can do to speed up the analysis? I am facing this problem at every step of the TOBIAS pipeline.

It would be a huge help if you could help me fix this issue. I would appreciate any suggestions.

msbentsen commented 1 year ago

Could you please suggest what we can do to speed up the analysis? I am facing this problem at every step of the TOBIAS pipeline.

Sorry, I missed this in the comment above. One reason might be a large number of counts within blacklisted regions. These are regions which are known to have high signal or to be otherwise error-prone. I see that you are working with human data, so I would add --blacklist human_blacklist.bed to the ATACorrect call.

You can obtain the lists here: https://github.com/Boyle-Lab/Blacklist/tree/master/lists
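
Before passing a downloaded list to --blacklist, it can be worth confirming that it uses the same chromosome naming as the peak file, since a list written with "1" instead of "chr1" would not match anything. The following is a minimal Python sketch of such a check. The function names are hypothetical and the blacklist lines are made-up examples, not entries from the real lists.

```python
def chromosomes(bed_lines):
    """Collect the chromosome names used in the first column of BED lines."""
    names = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def shared_chromosomes(peak_lines, blacklist_lines):
    """Return the chromosome names that occur in both inputs."""
    return chromosomes(peak_lines) & chromosomes(blacklist_lines)


# Two peak regions in UCSC naming, plus made-up blacklist entries.
peaks = ["chr1\t10275\t10718", "chr2\t21266251\t21266601"]
blacklist_ucsc = ["chr1\t100\t200\tHigh Signal Region"]
blacklist_other = ["1\t100\t200\tHigh Signal Region"]

print(shared_chromosomes(peaks, blacklist_ucsc))   # names in common
print(shared_chromosomes(peaks, blacklist_other))  # empty set: naming mismatch
```

An empty result means the blacklist and the peaks do not share any chromosome names, in which case the list is probably for a different naming scheme or genome build.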

archanabhardwaj commented 1 year ago

Hello,
Based on your suggestion, I used the --blacklist option in the ATACorrect step, and everything worked fine. But I am getting an error at the FootprintScores step.

TOBIAS FootprintScores --signal test_corrected.bw --regions merged.bed --output test_footprints.bw --cores 8

(test_corrected.bw comes from the ATACorrect step run with --blacklist.)

2023-01-27 10:43:23 (126630) [INFO] Processing input files
2023-01-27 10:43:23 (126630) [INFO] - Opening input cutsite bigwig
2023-01-27 10:43:23 (126630) [INFO] - Getting output regions ready
2023-01-27 10:43:29 (126630) [INFO] Calculating footprints in regions...
2023-01-27 10:43:30 (126630) [INFO] Progress 0%
2023-01-27 10:43:30 (126630) [INFO] Progress 1.0%
2023-01-27 10:43:30 (126630) [INFO] Progress 3.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 4.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 9.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 11.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 14.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 18.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 22.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 25.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 29.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 32.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 36.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 40.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 43.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 67.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 71.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 75.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 78.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 81.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 85.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 89.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 92.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 96.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 100.0%
2023-01-27 10:43:33 (126630) [INFO] Progress done!
Problem in main logger process: Traceback (most recent call last):
  File "/home/.conda/envs/tobias/lib/python3.7/site-packages/tobias/utils/logger.py", line 147, in main_logger_process
    record = self.queue.get()
  File "<string>", line 2, in get
  File "/home/.conda/envs/tobias/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/home/.conda/envs/tobias/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/.conda/envs/tobias/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/.conda/envs/tobias/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home.conda/envs/tobias/lib/python3.7/site-packages/tobias/footprinting/scorebigwig.py", line 116, in calculate_scores
    scores = tobias_footprint_array(signal, args.flank_min, args.flank_max, args.fp_min, args.fp_max) #numpy array
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)

I double-checked all my inputs and did not find any issue with the input files. I would appreciate any suggestions to fix this issue.

Thanks in advance

msbentsen commented 1 year ago

Hi @archanabhardwaj ,

Can you provide me with the full log file of the run (which also states the TOBIAS version at the top)? Also, can you try running the TOBIAS example command (https://github.com/loosolab/TOBIAS/wiki/ScoreBigwig#example-command) for the tool? Then we can determine whether it is a data-related or an installation-related issue, thanks!

archanabhardwaj commented 1 year ago

Hello,
Here is the complete log:

TOBIAS 0.8.0 ScoreBigwig (run started 2023-01-27 10:43:23.536742)
Working directory: /work_beegfsusrsukmb516/B-CELL/TOBIAS
Command line call: /homeusr.conda/envs/tobias/bin/TOBIAS FootprintScores --signal test_merged_bam_peak_26jan_corrected.bw --regions /work_beegfsusrsukmb516/B-CELL/TOBIAS/test/test_merged.bed --output test_merged_bam_peakf_corrected_footprints.bw --cores 8

----- Input parameters -----
signal: test_merged_bam_peak_26jan_corrected.bw
output: test_merged_bam_peakf_corrected_footprints.bw
regions: /work_beegfsusrsukmb516/B-CELL/TOBIAS/test/test_merged.bed
score: footprint
absolute: False
extend: 100
smooth: 1
min_limit: None
max_limit: None
fp_min: 20
fp_max: 50
flank_min: 10
flank_max: 30
window: 100
cores: 8
split: 100
verbosity: 3

----- Output files -----
test_merged_bam_peakf_corrected_footprints.bw

2023-01-27 10:43:23 (126630) [INFO] Processing input files
2023-01-27 10:43:23 (126630) [INFO] - Opening input cutsite bigwig
2023-01-27 10:43:23 (126630) [INFO] - Getting output regions ready
2023-01-27 10:43:29 (126630) [INFO] Calculating footprints in regions...
2023-01-27 10:43:30 (126630) [INFO] Progress 0%
2023-01-27 10:43:30 (126630) [INFO] Progress 1.0%
2023-01-27 10:43:30 (126630) [INFO] Progress 3.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 4.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 9.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 11.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 14.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 18.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 22.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 25.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 29.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 32.0%
2023-01-27 10:43:31 (126630) [INFO] Progress 36.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 40.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 43.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 46.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 50.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 53.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 57.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 61.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 64.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 67.0%
2023-01-27 10:43:32 (126630) [INFO] Progress 71.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 75.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 78.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 81.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 85.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 89.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 92.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 96.0%
2023-01-27 10:43:33 (126630) [INFO] Progress 100.0%
2023-01-27 10:43:33 (126630) [INFO] Progress done!
Problem in main logger process: Traceback (most recent call last):
  File "/homeusr.conda/envs/tobias/lib/python3.7/site-packages/tobias/utils/logger.py", line 147, in main_logger_process
    record = self.queue.get()
  File "<string>", line 2, in get
  File "/homeusr.conda/envs/tobias/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/homeusr.conda/envs/tobias/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/homeusr.conda/envs/tobias/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/homeusr.conda/envs/tobias/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/homeusr.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/homeusr.conda/envs/tobias/lib/python3.7/site-packages/tobias/footprinting/scorebigwig.py", line 116, in calculate_scores
    scores = tobias_footprint_array(signal, args.flank_min, args.flank_max, args.fp_min, args.fp_max) #numpy array
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/homeusr.conda/envs/tobias/bin/TOBIAS", line 11, in <module>
    load_entry_point('tobias==0.8.0', 'console_scripts', 'TOBIAS')()
  File "/homeusr.conda/envs/tobias/lib/python3.7/site-packages/tobias/TOBIAS.py", line 152, in main
    args.func(args) #run specified function with arguments
  File "/homeusr.conda/envs/tobias/lib/python3.7/site-packages/tobias/footprinting/scorebigwig.py", line 239, in run_scorebigwig
    results = [task.get() for task in task_list]
  File "/homeusr.conda/envs/tobias/lib/python3.7/site-packages/tobias/footprinting/scorebigwig.py", line 239, in <listcomp>
    results = [task.get() for task in task_list]
  File "/homeusr.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)

I ran the FootprintScores step on one of the samples, and it worked without any issue. Based on your suggestion, I added --blacklist human_blacklist.bed to the ATACorrect step and used its output as input for FootprintScores, which results in the above error. I checked the paths of all my input files, and they are all correct. I would appreciate any suggestions.

archanabhardwaj commented 1 year ago

Hello,
I found that it is a problem with the input file. For the same peak file, I get output from the ATACorrect step without any error, but FootprintScores gives an error for one of the samples.

msbentsen commented 1 year ago

Hi @archanabhardwaj ,

If it is only with one of the samples, can you make sure that nothing went wrong when calculating the ATACorrect step? I am wondering if the input file got corrupted somehow.

Another point is that you can update your TOBIAS using pip install tobias==0.15.1. That might also solve the issue.
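
The thread does not establish what causes this TypeError. One inexpensive thing to rule out, on the assumption that a malformed interval in the regions file could be involved, is whether every region is well-formed and lies within the chromosome lengths reported by the bigwig. Below is a minimal Python sketch of such a check. The function name check_regions and the test lines are hypothetical, and the two chromosome lengths are the hg38 values for chr1 and chr2.

```python
def check_regions(bed_lines, chrom_lengths):
    """Return (line_number, problem) tuples for suspicious BED regions."""
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            problems.append((number, "fewer than 3 columns"))
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in chrom_lengths:
            problems.append((number, f"unknown chromosome {chrom}"))
        elif start < 0 or end > chrom_lengths[chrom]:
            problems.append((number, "outside chromosome bounds"))
        elif start >= end:
            problems.append((number, "empty or inverted interval"))
    return problems


# hg38 lengths for two chromosomes; extend with the full set as needed.
chrom_lengths = {"chr1": 248956422, "chr2": 242193529}

# Made-up regions: one valid, three deliberately broken.
regions = [
    "chr1\t10275\t10718",
    "chr1\t500\t500",
    "chr2\t242193000\t242194000",
    "chrZ\t1\t2",
]
problems = check_regions(regions, chrom_lengths)

for number, problem in problems:
    print(f"line {number}: {problem}")
```

If this reports nothing for the real regions file, the regions are at least structurally sound and the corrected bigwig of the failing sample becomes the more likely place to look.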

archanabhardwaj commented 1 year ago

Hello @msbentsen

I did not get any error at the ATACorrect step. I also updated TOBIAS to version 0.15.1, as you can see below.

For troubleshooting, I recreated my BAM and BED files to be sure that nothing had gone wrong with either of them. I still did not get any error at the ATACorrect step, but FootprintScores still gives the same error for one sample.

TOBIAS 0.15.1 ScoreBigwig (run started 2023-02-07 09:36:32.515102)
Working directory: /work_beegfs/usr/usr/B-CELL/TOBIAS
Command line call: TOBIAS FootprintScores --signal test_corrected.bw --regions /work_beegfs/usr/usr/B-CELL/TOBIAS/group/test_merged2.bed --output test_corrected_footprints.bw --cores 8

----- Input parameters -----
signal: test_corrected.bw
output: test_corrected_footprints.bw
regions: /work_beegfs/usr/usr/B-CELL/TOBIAS/group/test_merged2.bed
score: footprint
absolute: False
extend: 100
smooth: 1
min_limit: None
max_limit: None
fp_min: 20
fp_max: 50
flank_min: 10
flank_max: 30
window: 100
cores: 8
split: 100
verbosity: 3

----- Output files -----
test_corrected_footprints.bw

2023-02-07 09:36:32 (25346) [INFO] Processing input files
2023-02-07 09:36:32 (25346) [INFO] - Opening input cutsite bigwig
2023-02-07 09:36:32 (25346) [INFO] - Getting output regions ready
2023-02-07 09:36:45 (25346) [INFO] Calculating footprints in regions...
2023-02-07 09:36:47 (25346) [INFO] Progress 0%
2023-02-07 09:36:47 (25346) [INFO] Progress 12.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 14.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 25.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 38.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 50.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 61.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 62.0%
2023-02-07 09:36:47 (25346) [INFO] Progress 71.0%
2023-02-07 09:36:48 (25346) [INFO] Progress 82.0%
2023-02-07 09:36:48 (25346) [INFO] Progress 83.0%
2023-02-07 09:36:48 (25346) [INFO] Progress 95.0%
2023-02-07 09:36:48 (25346) [INFO] Progress 96.0%
2023-02-07 09:36:48 (25346) [INFO] Progress 100.0%
2023-02-07 09:36:48 (25346) [INFO] Progress done!
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/usr/.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/tools/score_bigwig.py", line 74, in calculate_scores
    scores = tobias_footprint_array(signal, args.flank_min, args.flank_max, args.fp_min, args.fp_max) #numpy array
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/usr/.conda/envs/tobias/bin/TOBIAS", line 8, in <module>
    sys.exit(main())
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/TOBIAS.py", line 154, in main
    args.func(args)
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/tools/score_bigwig.py", line 204, in run_scorebigwig
    results = [task.get() for task in task_list]
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/tools/score_bigwig.py", line 204, in <listcomp>
    results = [task.get() for task in task_list]
  File "/home/usr/.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)
2023-02-07 09:36:48 (25361) [ERROR] Multiprocessing logger lost connection to queue - probably due to an error raised from a child process.

I am sharing the output with verbosity 4 as well.

TOBIAS 0.15.1 ScoreBigwig (run started 2023-02-07 09:41:57.617088)
Working directory: /work_beegfs/usr/usr/B-CELL/TOBIAS
Command line call: TOBIAS FootprintScores --signal test_corrected.bw --regions /work_beegfs/usr/usr/B-CELL/TOBIAS/group/test_merged2.bed --output test_corrected_footprints.bw --cores 8 --verbosity 4

----- Input parameters -----
signal: test_corrected.bw
output: test_corrected_footprints.bw
regions: /work_beegfs/usr/usr/B-CELL/TOBIAS/group/test_merged2.bed
score: footprint
absolute: False
extend: 100
smooth: 1
min_limit: None
max_limit: None
fp_min: 20
fp_max: 50
flank_min: 10
flank_max: 30
window: 100
cores: 8
split: 100
verbosity: 4

----- Output files -----
test_corrected_footprints.bw

2023-02-07 09:41:57 (25861) [DEBUG] Setting up listener for log 2023-02-07 09:41:57 (25861) [DEBUG] Starting logger queue for multiprocessing 2023-02-07 09:41:57 (25861) [INFO] Processing input files 2023-02-07 09:41:57 (25861) [INFO] - Opening input cutsite bigwig 2023-02-07 09:41:57 (25861) [DEBUG] Chromosome lengths from input bigwig: {'chr1': 248956422, 'chr2': 242193529, 'chr3': 198295559, 'chr4': 190214555, 'chr5': 181538259, 'chr6': 170805979, 'chr7': 159345973, 'chr8': 145138636, 'chr9': 138394717, 'chr10': 133797422, 'chr11': 135086622, 'chr12': 133275309, 'chr13': 114364328, 'chr14': 107043718, 'chr15': 101991189, 'chr16': 90338345, 'chr17': 83257441, 'chr18': 80373285, 'chr19': 58617616, 'chr20': 64444167, 'chr21': 46709983, 'chr22': 50818468, 'chrX': 156040895, 'chrY': 57227415, 'chr1_KI270706v1_random': 175055, 'chr1_KI270707v1_random': 32032, 'chr1_KI270708v1_random': 127682, 'chr1_KI270709v1_random': 66860, 'chr1_KI270710v1_random': 40176, 'chr1_KI270711v1_random': 42210, 'chr1_KI270712v1_random': 176043, 'chr1_KI270713v1_random': 40745, 'chr1_KI270714v1_random': 41717, 'chr2_KI270715v1_random': 161471, 'chr2_KI270716v1_random': 153799, 'chr3_GL000221v1_random': 155397, 'chr4_GL000008v2_random': 209709, 'chr5_GL000208v1_random': 92689, 'chr9_KI270717v1_random': 40062, 'chr9_KI270718v1_random': 38054, 'chr9_KI270719v1_random': 176845, 'chr9_KI270720v1_random': 39050, 'chr11_KI270721v1_random': 100316, 'chr14_GL000009v2_random': 201709, 'chr14_GL000225v1_random': 211173, 'chr14_KI270722v1_random': 194050, 'chr14_GL000194v1_random': 191469, 'chr14_KI270723v1_random': 38115, 'chr14_KI270724v1_random': 39555, 'chr14_KI270725v1_random': 172810, 'chr14_KI270726v1_random': 43739, 'chr15_KI270727v1_random': 448248, 'chr16_KI270728v1_random': 1872759, 'chr17_GL000205v2_random': 185591, 'chr17_KI270729v1_random': 280839, 'chr17_KI270730v1_random': 112551, 'chr22_KI270731v1_random': 150754, 'chr22_KI270732v1_random': 41543, 'chr22_KI270733v1_random': 179772, 
'chr22_KI270734v1_random': 165050, 'chr22_KI270735v1_random': 42811, 'chr22_KI270736v1_random': 181920, 'chr22_KI270737v1_random': 103838, 'chr22_KI270738v1_random': 99375, 'chr22_KI270739v1_random': 73985, 'chrY_KI270740v1_random': 37240, 'chrUn_KI270302v1': 2274, 'chrUn_KI270304v1': 2165, 'chrUn_KI270303v1': 1942, 'chrUn_KI270305v1': 1472, 'chrUn_KI270322v1': 21476, 'chrUn_KI270320v1': 4416, 'chrUn_KI270310v1': 1201, 'chrUn_KI270316v1': 1444, 'chrUn_KI270315v1': 2276, 'chrUn_KI270312v1': 998, 'chrUn_KI270311v1': 12399, 'chrUn_KI270317v1': 37690, 'chrUn_KI270412v1': 1179, 'chrUn_KI270411v1': 2646, 'chrUn_KI270414v1': 2489, 'chrUn_KI270419v1': 1029, 'chrUn_KI270418v1': 2145, 'chrUn_KI270420v1': 2321, 'chrUn_KI270424v1': 2140, 'chrUn_KI270417v1': 2043, 'chrUn_KI270422v1': 1445, 'chrUn_KI270423v1': 981, 'chrUn_KI270425v1': 1884, 'chrUn_KI270429v1': 1361, 'chrUn_KI270442v1': 392061, 'chrUn_KI270466v1': 1233, 'chrUn_KI270465v1': 1774, 'chrUn_KI270467v1': 3920, 'chrUn_KI270435v1': 92983, 'chrUn_KI270438v1': 112505, 'chrUn_KI270468v1': 4055, 'chrUn_KI270510v1': 2415, 'chrUn_KI270509v1': 2318, 'chrUn_KI270518v1': 2186, 'chrUn_KI270508v1': 1951, 'chrUn_KI270516v1': 1300, 'chrUn_KI270512v1': 22689, 'chrUn_KI270519v1': 138126, 'chrUn_KI270522v1': 5674, 'chrUn_KI270511v1': 8127, 'chrUn_KI270515v1': 6361, 'chrUn_KI270507v1': 5353, 'chrUn_KI270517v1': 3253, 'chrUn_KI270529v1': 1899, 'chrUn_KI270528v1': 2983, 'chrUn_KI270530v1': 2168, 'chrUn_KI270539v1': 993, 'chrUn_KI270538v1': 91309, 'chrUn_KI270544v1': 1202, 'chrUn_KI270548v1': 1599, 'chrUn_KI270583v1': 1400, 'chrUn_KI270587v1': 2969, 'chrUn_KI270580v1': 1553, 'chrUn_KI270581v1': 7046, 'chrUn_KI270579v1': 31033, 'chrUn_KI270589v1': 44474, 'chrUn_KI270590v1': 4685, 'chrUn_KI270584v1': 4513, 'chrUn_KI270582v1': 6504, 'chrUn_KI270588v1': 6158, 'chrUn_KI270593v1': 3041, 'chrUn_KI270591v1': 5796, 'chrUn_KI270330v1': 1652, 'chrUn_KI270329v1': 1040, 'chrUn_KI270334v1': 1368, 'chrUn_KI270333v1': 2699, 'chrUn_KI270335v1': 1048, 
'chrUn_KI270338v1': 1428, 'chrUn_KI270340v1': 1428, 'chrUn_KI270336v1': 1026, 'chrUn_KI270337v1': 1121, 'chrUn_KI270363v1': 1803, 'chrUn_KI270364v1': 2855, 'chrUn_KI270362v1': 3530, 'chrUn_KI270366v1': 8320, 'chrUn_KI270378v1': 1048, 'chrUn_KI270379v1': 1045, 'chrUn_KI270389v1': 1298, 'chrUn_KI270390v1': 2387, 'chrUn_KI270387v1': 1537, 'chrUn_KI270395v1': 1143, 'chrUn_KI270396v1': 1880, 'chrUn_KI270388v1': 1216, 'chrUn_KI270394v1': 970, 'chrUn_KI270386v1': 1788, 'chrUn_KI270391v1': 1484, 'chrUn_KI270383v1': 1750, 'chrUn_KI270393v1': 1308, 'chrUn_KI270384v1': 1658, 'chrUn_KI270392v1': 971, 'chrUn_KI270381v1': 1930, 'chrUn_KI270385v1': 990, 'chrUn_KI270382v1': 4215, 'chrUn_KI270376v1': 1136, 'chrUn_KI270374v1': 2656, 'chrUn_KI270372v1': 1650, 'chrUn_KI270373v1': 1451, 'chrUn_KI270375v1': 2378, 'chrUn_KI270371v1': 2805, 'chrUn_KI270448v1': 7992, 'chrUn_KI270521v1': 7642, 'chrUn_GL000195v1': 182896, 'chrUn_GL000219v1': 179198, 'chrUn_GL000220v1': 161802, 'chrUn_GL000224v1': 179693, 'chrUn_KI270741v1': 157432, 'chrUn_GL000226v1': 15008, 'chrUn_GL000213v1': 164239, 'chrUn_KI270743v1': 210658, 'chrUn_KI270744v1': 168472, 'chrUn_KI270745v1': 41891, 'chrUn_KI270746v1': 66486, 'chrUn_KI270747v1': 198735, 'chrUn_KI270748v1': 93321, 'chrUn_KI270749v1': 158759, 'chrUn_KI270750v1': 148850, 'chrUn_KI270751v1': 150742, 'chrUn_KI270752v1': 27745, 'chrUn_KI270753v1': 62944, 'chrUn_KI270754v1': 40191, 'chrUn_KI270755v1': 36723, 'chrUn_KI270756v1': 79590, 'chrUn_KI270757v1': 71251, 'chrUn_GL000214v1': 137718, 'chrUn_KI270742v1': 186739, 'chrUn_GL000216v2': 176608, 'chrUn_GL000218v1': 161147, 'chrEBV': 171823} 2023-02-07 09:41:57 (25861) [INFO] - Getting output regions ready 2023-02-07 09:41:57 (25876) [DEBUG] Started main logger process 2023-02-07 09:42:10 (25861) [INFO] Calculating footprints in regions... 
2023-02-07 09:42:10 (25861) [DEBUG] Worker cores: 7 2023-02-07 09:42:10 (25861) [DEBUG] Writer cores: 1 2023-02-07 09:42:12 (25861) [INFO] Progress 0% 2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr1 10275 10718 2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr1 19374111 19374803 2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr1 34339323 34339673 2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr1 75205971 75206321 2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr1 173085872 173086222 2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr1 157739688 157740511 2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr1 118660794 118661144 2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr1 54194673 54195023 2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr1 105802749 105803284 2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr1 187720520 187720870 2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr1 199924965 199925315 2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr1 217029941 217030293 2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr1 233245244 233245709 2023-02-07 09:42:12 (25861) [INFO] Progress 13.0% 2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr1 248105286 248105636 2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr10 11144579 11145599 2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr10 22948934 22949284 2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr10 35053564 35053914 2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr10 52487157 52487507 2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr10 65124551 65124901 2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: 
chr10 78561408 78561758
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr10 91746094 91746444
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr10 104851583 104851933
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr10 116888540 116888890
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr10 128964004 128964354
2023-02-07 09:42:12 (25861) [INFO] Progress 24.0%
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr11 9672647 9672997
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr11 22612101 22612584
2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr11 36163535 36164297
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr11 49387095 49387694
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr11 67797642 67797992
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr11 81290089 81290439
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr11 94152216 94152919
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr11 107413061 107413411
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr11 122989291 122989642
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr12 4450412 4450762
2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr12 18172075 18172654
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr12 30040414 30040764
2023-02-07 09:42:12 (25861) [INFO] Progress 35.0%
2023-02-07 09:42:12 (25861) [INFO] Progress 36.0%
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr12 45350597 45351234
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr12 60428261 60428611
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr12 72963062 72963412
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr12 86549290 86549640
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr12 99799906 99800256
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr12 112745544 112745894
2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr12 125252747 125253097
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr13 23336392 23336742
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr13 37326793 37327149
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr13 51322842 51323192
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr13 66539927 66540277
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr13 80785950 80786483
2023-02-07 09:42:12 (25861) [INFO] Progress 47.0%
2023-02-07 09:42:12 (25861) [INFO] Progress 48.0%
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr13 100869077 100869427
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr13 114128638 114128988
2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr14 42128436 42128786
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr14 64526257 64526607
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr14 85246423 85246846
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr15 19990746 19991096
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr15 41429418 41429915
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr15 61940433 61940874
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr15 84922519 84922869
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr16 4134286 4134636
2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr16 25298184 25298534
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr16 58959335 58959685
2023-02-07 09:42:12 (25861) [INFO] Progress 59.0%
2023-02-07 09:42:12 (25861) [INFO] Progress 60.0%
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr16 78636324 78636674
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr17 10176289 10176821
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr17 35237362 35237778
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr17 56666504 56667126
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr18 21256009 21256493
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr19 4939427 4939777
2023-02-07 09:42:12 (25913) [DEBUG] Calculating scores for region: chr19 46147792 46148276
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr2 21266251 21266601
2023-02-07 09:42:12 (25907) [DEBUG] Calculating scores for region: chr2 56292556 56292906
2023-02-07 09:42:12 (25928) [DEBUG] Calculating scores for region: chr2 102452831 102453181
2023-02-07 09:42:12 (25925) [DEBUG] Calculating scores for region: chr2 144234131 144234481
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr2 189187173 189187588
2023-02-07 09:42:12 (25861) [INFO] Progress 72.0%
2023-02-07 09:42:12 (25916) [DEBUG] Calculating scores for region: chr2 231957952 231958331
2023-02-07 09:42:12 (25910) [DEBUG] Calculating scores for region: chr20 34294913 34295263
2023-02-07 09:42:12 (25922) [DEBUG] Calculating scores for region: chr21 22032827 22033256
2023-02-07 09:42:12 (25919) [DEBUG] Calculating scores for region: chr22 37492822 37493172
2023-02-07 09:42:13 (25907) [DEBUG] Calculating scores for region: chr3 30211564 30212060
2023-02-07 09:42:13 (25928) [DEBUG] Calculating scores for region: chr3 77444932 77445334
2023-02-07 09:42:13 (25925) [DEBUG] Calculating scores for region: chr3 137738676 137739155
2023-02-07 09:42:13 (25913) [DEBUG] Calculating scores for region: chr3 187705154 187705504
2023-02-07 09:42:13 (25916) [DEBUG] Calculating scores for region: chr4 68355849 68356199
2023-02-07 09:42:13 (25910) [DEBUG] Calculating scores for region: chr4 141348186 141348654
2023-02-07 09:42:13 (25922) [DEBUG] Calculating scores for region: chr5 16294607 16295049
2023-02-07 09:42:13 (25919) [DEBUG] Calculating scores for region: chr5 83927170 83927531
2023-02-07 09:42:13 (25861) [INFO] Progress 84.0%
2023-02-07 09:42:13 (25907) [DEBUG] Calculating scores for region: chr5 143455964 143456314
2023-02-07 09:42:13 (25928) [DEBUG] Calculating scores for region: chr6 2637060 2637618
2023-02-07 09:42:13 (25925) [DEBUG] Calculating scores for region: chr6 44289637 44290002
2023-02-07 09:42:13 (25913) [DEBUG] Calculating scores for region: chr6 93262735 93263190
2023-02-07 09:42:13 (25916) [DEBUG] Calculating scores for region: chr6 142506708 142507656
2023-02-07 09:42:13 (25910) [DEBUG] Calculating scores for region: chr7 13678253 13678713
2023-02-07 09:42:13 (25922) [DEBUG] Calculating scores for region: chr7 68868894 68869244
2023-02-07 09:42:13 (25919) [DEBUG] Calculating scores for region: chr7 110931464 110932097
2023-02-07 09:42:13 (25907) [DEBUG] Calculating scores for region: chr7 153770543 153770941
2023-02-07 09:42:13 (25928) [DEBUG] Calculating scores for region: chr8 41956501 41957580
2023-02-07 09:42:13 (25925) [DEBUG] Calculating scores for region: chr8 98363910 98364263
2023-02-07 09:42:13 (25913) [DEBUG] Calculating scores for region: chr8 136071170 136071522
2023-02-07 09:42:13 (25916) [DEBUG] Calculating scores for region: chr9 33751957 33752524
2023-02-07 09:42:13 (25861) [INFO] Progress 96.0%
2023-02-07 09:42:13 (25861) [INFO] Progress 97.0%
2023-02-07 09:42:13 (25910) [DEBUG] Calculating scores for region: chr9 113806543 113806913
2023-02-07 09:42:13 (25922) [DEBUG] Calculating scores for region: chrX 13487708 13488545
2023-02-07 09:42:13 (25919) [DEBUG] Calculating scores for region: chrX 84099199 84099691
2023-02-07 09:42:13 (25861) [INFO] Progress 100.0%
2023-02-07 09:42:13 (25861) [INFO] Progress done!
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/usr/.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/tools/score_bigwig.py", line 74, in calculate_scores
    scores = tobias_footprint_array(signal, args.flank_min, args.flank_max, args.fp_min, args.fp_max)
numpy array TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/usr/.conda/envs/tobias/bin/TOBIAS", line 8, in <module>
    sys.exit(main())
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/TOBIAS.py", line 154, in main
    args.func(args)
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/tools/score_bigwig.py", line 204, in run_scorebigwig
    results = [task.get() for task in task_list]
  File "/home/usr/.conda/envs/tobias/lib/python3.7/site-packages/tobias/tools/score_bigwig.py", line 204, in <listcomp>
    results = [task.get() for task in task_list]
  File "/home/usr/.conda/envs/tobias/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: Argument 'arr' has incorrect type (expected numpy.ndarray, got numpy.float64)
2023-02-07 09:42:13 (25876) [ERROR] Multiprocessing logger lost connection to queue - probably due to an error raised from a child process.

I would really appreciate your suggestions and help to fix this issue.

Thanks in advance

msbentsen commented 1 year ago

Hi @archanabhardwaj ,

Thank you for the log. Is it possible for you to share the test_corrected.bw and /work_beegfs/usr/usr/B-CELL/TOBIAS/group/test_merged2.bed files with me somehow? Maybe over a filesharing service?

Since it is only for one sample, it is difficult to debug without seeing the data. Thank you.
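In the meantime, a quick sanity check of the peak file can rule out malformed regions before rerunning ScoreBigwig. The sketch below is only a generic BED check; the thread does not establish what exactly triggers the TypeError, so the specific checks (at least three tab-separated columns, integer coordinates, end greater than start) are assumptions, and `check_bed_lines` is a hypothetical helper, not part of TOBIAS.

```python
def check_bed_lines(lines):
    """Return (line_number, problem) tuples for BED lines that look malformed.

    Assumption: a usable region has at least 3 tab-separated columns
    (chrom, start, end) with integer coordinates and end > start.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and common header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            problems.append((number, "fewer than 3 tab-separated columns"))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((number, "start/end are not integers"))
            continue
        if start < 0:
            problems.append((number, "negative start"))
        elif end <= start:
            problems.append((number, "end is not greater than start"))
    return problems
```

To use it on a real file, pass the open file handle, e.g. `with open("peaks.bed") as handle: print(check_bed_lines(handle))`, where `peaks.bed` stands in for your own peak file.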

archanabhardwaj commented 1 year ago

Hello, I managed to run the samples and now have output from the BINDetect step. The issue was in my peak list. For CreateNetwork, I need to create the motif2gene mapping file (motif2genemapping.txt). I see two options:

I found a list of TFs in the http://humantfs.ccbr.utoronto.ca/download.php database. Would it be fine to use this TF list for my network analysis?

I will format the above-mentioned file into two columns, as required.

TFs -> geneids

Another option might be to create it from the JASPAR motif data?

I am really confused at this step. Please let me know which option would be accurate.

If I am doing anything wrong, please correct me.

Thanks in advance

msbentsen commented 1 year ago

Hi @archanabhardwaj ,

Yes, any TF list is fine as long as it is in JASPAR/MEME format so that TOBIAS can read it. After you have run BINDetect to get the TFBS predictions, you can go on with CreateNetwork. For this, you will need the motif2gene-mapping file, which should contain two columns: the motif names exactly as they appear in the motif file, and the corresponding gene ids for the motifs/TFs.
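As a rough illustration of putting such a two-column file together, the following Python sketch collects TF names from motif header lines and pairs them with gene ids from a lookup table. Several things here are assumptions rather than TOBIAS behaviour: the helper names `motif_names` and `build_mapping` are hypothetical, the headers are assumed to look like `>ID NAME` (JASPAR) or `MOTIF ID NAME` (MEME), names are matched case-insensitively, and the gene ids are placeholders. Which name field and column separator CreateNetwork expects should be checked against its help text.

```python
def motif_names(motif_text):
    """Collect TF names from JASPAR ('>ID NAME') or MEME ('MOTIF ID NAME') headers."""
    names = []
    for line in motif_text.splitlines():
        parts = line.split()
        if line.startswith(">") and len(parts) >= 2:
            names.append(parts[1])
        elif line.startswith("MOTIF") and len(parts) >= 3:
            names.append(parts[2])
    return names


def build_mapping(names, tf_to_gene):
    """Pair each motif name with a gene id; also report names without a match.

    tf_to_gene maps upper-cased TF names to gene ids (assumed lookup table,
    e.g. built from a downloaded TF list).
    """
    rows, missing = [], []
    for name in names:
        gene_id = tf_to_gene.get(name.upper())
        if gene_id is None:
            missing.append(name)
        else:
            rows.append((name, gene_id))
    return rows, missing


# Placeholder gene ids for illustration only.
example = ">MA0004.1 Arnt\n>MA0139.1 CTCF\nMOTIF MA9999.1 FAKE1\n"
rows, missing = build_mapping(motif_names(example), {"ARNT": "GENE_ID_1", "CTCF": "GENE_ID_2"})
for name, gene_id in rows:
    print(f"{name}\t{gene_id}")
```

The `missing` list is useful for spotting motifs that have no entry in the TF list, which then need to be added by hand or left out.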

I see that the link you sent contains some files of TF names and gene ids, maybe these are useful?

Let me know in case you can't get it working.

archanabhardwaj commented 1 year ago

Hello

I have created the network based on the TF list from http://humantfs.ccbr.utoronto.ca/download.php. However, some of my top-ranked motifs are missing from that list, such as CAMTA3, even though it is one of the significant TFs downregulated in one of my comparisons. How should I handle those missing TFs?

Looking forward to hearing from you.

Thanks in advance

msbentsen commented 1 year ago

Hi @archanabhardwaj,

You can add additional TFs to the motif2gene-mapping file by finding out the gene id for each TF. So for example for CAMTA3, you would add a line with CAMTA3 and the corresponding gene id. However, beware that not all TF motifs in JASPAR come from human TFs; some come from other organisms. In that case, it is expected that CAMTA3 is not included in the motif2gene mapping.
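A minimal sketch of adding such lines without duplicating existing entries could look like this. The function `add_missing_tfs` is a hypothetical helper, the file is assumed to be tab-separated with the motif name in the first column, and the gene id shown is a placeholder that has to be replaced with the real id you looked up.

```python
def add_missing_tfs(mapping_lines, extra):
    """Append 'name<TAB>gene_id' lines for TFs not already in the mapping.

    mapping_lines: existing lines of the mapping file (without newlines).
    extra: dict of motif name -> gene id to add (assumed looked up by hand).
    """
    present = {line.split("\t")[0] for line in mapping_lines if line.strip()}
    added = [
        f"{name}\t{gene_id}"
        for name, gene_id in extra.items()
        if name not in present
    ]
    return list(mapping_lines) + added


# "GENE_ID_PLACEHOLDER" is not a real identifier.
existing = ["CTCF\tGENE_ID_2"]
updated = add_missing_tfs(existing, {"CAMTA3": "GENE_ID_PLACEHOLDER"})
print("\n".join(updated))
```

Entries already present are left untouched, so rerunning the helper with the same additions does not create duplicate rows.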