loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

errors from Tobias #9

Closed xiaohanhe2020 closed 4 years ago

xiaohanhe2020 commented 4 years ago

Hi Mette, I ran the following command:

    TOBIAS BINDetect --motifs /Users/xiaohan/Desktop/diffbind/jaspar.txt --signals /Users/xiaohan/Desktop/diffbind/atacorrect_test/ZR_corrected.bw /Users/xiaohan/Desktop/diffbind/atacorrect_test/CR_corrected.bw --genome /Users/xiaohan/Desktop/peaks/hg19.fa --peaks /Users/xiaohan/Desktop/RNA-seq/final_merge.bed --peak_header /Users/xiaohan/Desktop/RNA-seq/annotated_peaks_header.txt --outdir bindetect_output --cond_names Z R --core 8

I get the following error:

    2020-03-25 15:37:10 (26384) [INFO] Merging results from subsets
    2020-03-25 15:37:27 (26384) [INFO] Estimating score distribution per condition
    2020-03-25 15:37:29 (26384) [INFO] Normalizing scores
    2020-03-25 15:37:29 (26384) [INFO] Estimating bound/unbound threshold
    Traceback (most recent call last):
      File "/Users/xiaohan/miniconda3/envs/mypython3/bin/TOBIAS", line 11, in <module>
        load_entry_point('tobias==0.9.0', 'console_scripts', 'TOBIAS')()
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/tobias/TOBIAS.py", line 154, in main
        args.func(args) #run specified function with arguments
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/tobias/footprinting/bindetect.py", line 396, in run_bindetect
        gmm.fit(np.log(bg_values).reshape(-1, 1))
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/sklearn/mixture/_base.py", line 192, in fit
        self.fit_predict(X, y)
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/sklearn/mixture/_base.py", line 219, in fit_predict
        X = _check_X(X, self.n_components, ensure_min_samples=2)
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/sklearn/mixture/_base.py", line 53, in _check_X
        ensure_min_samples=ensure_min_samples)
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 578, in check_array
        allow_nan=force_all_finite == 'allow-nan')
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 60, in _assert_all_finite
        msg_dtype if msg_dtype is not None else X.dtype)
    ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
    Problem in main logger process: Traceback (most recent call last):
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/site-packages/tobias/utils/logger.py", line 147, in main_logger_process
        record = self.queue.get()
      File "<string>", line 2, in get
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
        kind, result = conn.recv()
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/Users/xiaohan/miniconda3/envs/mypython3/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
        raise EOFError
    EOFError

Do you know how to fix it?

xiaohanhe2020 commented 4 years ago

Hi Mette, I ran TOBIAS ClusterMotifs, but it fails with this error:

    TOBIAS: error: argument : invalid choice: 'ClusterMotifs' (choose from 'ATACorrect', 'ScoreBigwig', 'FootprintScores', 'BINDetect', 'TFBScan', 'FormatMotifs', 'ScoreBed', 'PlotAggregate', 'PlotHeatmap', 'PlotChanges', 'PlotTracks', 'MergePDF', 'MaxPos', 'SubsampleBam', 'CreateNetwork', 'Log2Table')

Could you please tell me the reason?

Best wishes Xiaohan

msbentsen commented 4 years ago

Dear Xiaohan,

ClusterMotifs is only available from TOBIAS version >=0.10.0 (as stated in CHANGES). Please install this version or higher to use the tool.
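As a rough illustration of the version requirement, the sketch below compares version numbers numerically. It is only a sketch: `parse_version` and `supports_cluster_motifs` are hypothetical helpers, not part of TOBIAS, and they assume plain `X.Y.Z` version strings. The point is that 0.9.0 is older than 0.10.0, even though a plain string comparison would rank it higher.

```python
# Hypothetical helpers for illustration only; not part of the TOBIAS API.
# Assumes plain "X.Y.Z" version strings without suffixes such as "rc1".

# Minimum version stated in this thread for the ClusterMotifs tool.
MIN_CLUSTERMOTIFS_VERSION = (0, 10, 0)


def parse_version(version):
    """Turn an 'X.Y.Z' string into a tuple of ints, e.g. '0.9.0' -> (0, 9, 0)."""
    return tuple(int(part) for part in version.split("."))


def supports_cluster_motifs(version):
    """Return True if the given version is at least 0.10.0."""
    return parse_version(version) >= MIN_CLUSTERMOTIFS_VERSION


# The traceback in this thread shows tobias==0.9.0.
for version in ("0.9.0", "0.10.0"):
    print(version, supports_cluster_motifs(version))

# Pitfall: comparing the raw strings orders them character by character,
# so "0.9.0" wrongly appears to be the newer version.
print("0.9.0" >= "0.10.0")
```

Comparing integer tuples avoids the character-by-character ordering that makes "0.9.0" look newer than "0.10.0".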

Best regards, Mette

msbentsen commented 4 years ago

For your first question, I see that you are using the "_corrected.bw" signals from ATACorrect as input. The BINDetect tool is intended to be used with footprint scores (not the Tn5 signal), which you can obtain using TOBIAS ScoreBigwig.

The Tn5 signal is very sparse, so the error is due to BINDetect not being able to properly estimate the background score distribution. Sorry about that - I will build in a better error message in a future release.

Until then, please try again with --signals <footprintscores.bw>.
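To illustrate why a sparse signal can end in the `ValueError` from the traceback: the failing call is `gmm.fit(np.log(bg_values)...)`, and a logarithm is only finite for positive inputs. The sketch below is an assumption about the failure mode, not the BINDetect implementation. It uses only the standard library, and `log_like_numpy`, `summarize_log_inputs` and the example values are hypothetical.

```python
import math


def log_like_numpy(value):
    """Mimic np.log on a scalar: 0 gives -inf, negative values give nan."""
    if value > 0:
        return math.log(value)
    return float("-inf") if value == 0 else float("nan")


def summarize_log_inputs(values):
    """Count how many values become non-finite after taking the log."""
    logs = [log_like_numpy(v) for v in values]
    non_finite = sum(1 for x in logs if not math.isfinite(x))
    return {"total": len(logs), "non_finite": non_finite}


# Made-up example of a sparse signal: mostly zeros, one negative value.
sparse_scores = [0.0, 0.0, 1.5, 0.0, -0.2, 3.0]
summary = summarize_log_inputs(sparse_scores)
print(summary)
```

Any non-finite entry is enough for scikit-learn's input validation to reject the array, which matches the "Input contains NaN, infinity or a value too large" message above.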

Best Mette

xiaohanhe2020 commented 4 years ago

Thank you very much for your reply. It works nicely.

May I ask how to do the motif clustering and how to build the transcription factor network?

Best wishes Xiaohan

msbentsen commented 4 years ago

Please check out the wiki for explanations and examples on the individual tools.

xiaohanhe2020 commented 4 years ago

Hi Mette, thank you for your reply. I ran the command and got this:

    (mypython3) xiaohans-MacBook-Pro:~ xiaohan$ TOBIAS ClusterMotifs
    TOBIAS: error: argument : invalid choice: 'ClusterMotifs' (choose from 'ATACorrect', 'ScoreBigwig', 'FootprintScores', 'BINDetect', 'TFBScan', 'FormatMotifs', 'ScoreBed', 'PlotAggregate', 'PlotHeatmap', 'PlotChanges', 'PlotTracks', 'MergePDF', 'MaxPos', 'SubsampleBam', 'CreateNetwork', 'Log2Table')

Do you know the reason?

Best wishes Xh

msbentsen commented 4 years ago

Please refer to my previous answer regarding the version of TOBIAS.

xiaohanhe2020 commented 4 years ago

Hi Mette, may I ask how transcription factor footprints, transcription factor activity and the transcription regulatory network relate to each other?

Best wishes Xiaohan

msbentsen commented 4 years ago

I would recommend reading the preprint presenting TOBIAS (https://www.biorxiv.org/content/10.1101/869560v2); it explains some of the relationships between the different terms. But generally I would say:

Transcription factor footprints are the effects of protein binding, as calculated from the ATAC-seq experiment. They relate to transcription factor activity in that it is the activity (meaning the active binding by a transcription factor) that creates the footprints. Transcription factors that are inactive (either not expressed or not bound) leave minimal footprints.

A transcription regulatory network brings together the information on transcription factor binding and target genes, to give a view of the global influence of transcription factor binding on the transcriptional program.
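As a toy picture of the last point, a regulatory network can be thought of as a directed graph with edges from a transcription factor to the genes it binds near. The sketch below is an assumption for illustration and not how TOBIAS CreateNetwork is implemented; the function names and all TF and gene names are placeholders.

```python
from collections import defaultdict

# Hypothetical input: (transcription factor, target gene) pairs, as might be
# derived from bound sites annotated with nearby genes. Names are placeholders.
bound_sites = [
    ("TF_A", "gene_1"),
    ("TF_A", "TF_B"),  # TF_A binds near the gene encoding TF_B
    ("TF_B", "gene_2"),
    ("TF_B", "gene_3"),
]


def build_network(edges):
    """Collect the edges into a mapping: TF -> set of direct targets."""
    network = defaultdict(set)
    for tf, target in edges:
        network[tf].add(target)
    return network


def downstream_targets(network, tf):
    """Return every gene reachable from tf by following TF -> target edges."""
    seen = set()
    stack = [tf]
    while stack:
        current = stack.pop()
        for target in network.get(current, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


network = build_network(bound_sites)
print(sorted(downstream_targets(network, "TF_A")))
```

Following the edges shows how binding by one factor can influence genes it never binds directly, here through the placeholder intermediate `TF_B`.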

Hope this answers your question.

xiaohanhe2020 commented 4 years ago

Hi Mette, thank you so much for answering my questions. I understand now.

Best wishes Xiaohan
