kundajelab / bpnet

Toolkit to train base-resolution deep neural networks on functional genomics data and to interpret them
http://bit.ly/bpnet-colab
MIT License
138 stars 32 forks source link

Run BPnet with one input bw track #18

Open yanwang271 opened 3 years ago

yanwang271 commented 3 years ago

Hi,

I am trying to use bpnet on ATAC-seq data and there is one track for each tasks. When I run bpnet chip-nexus-analysis, I got errors:

0%|          | 0/11 [00:00<?, ?it/s]/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py:130: UserWarning: region counts not present. Returning the default contribution scores
  warnings.warn("region counts not present. Returning the default contribution scores")
100%|██████████| 11/11 [00:35<00:00,  3.20s/it]
  0%|          | 0/11 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/bpnet/bin/bpnet", line 8, in <module>
    sys.exit(main())
  File "/bpnet/lib/python3.6/site-packages/bpnet/__main__.py", line 38, in main
    argh.dispatch(parser)
  File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 905, in chip_nexus_analysis
    footprint_width=footprint_width)
  File "/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 722, in modisco_table
    df = modisco_table(data)
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 199, in modisco_table
    for pattern in tqdm(data.mf.pattern_names())])
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 199, in <listcomp>
    for pattern in tqdm(data.mf.pattern_names())])
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 220, in pattern_features
    for res in pattern_task_features(pattern, task, data)] +
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 220, in <listcomp>
    for res in pattern_task_features(pattern, task, data)] +
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 278, in pattern_task_features
    ("footprint", task_footprint(pattern, task, data)),
  File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 322, in task_footprint
    ("strandcor", correlate(agg_profile_norm[:, 0], agg_profile_norm[::-1, 1]).max()),
IndexError: index 1 is out of bounds for axis 1 with size 1

I thought this is because that there is only one track in agg_profile_norm. To run the programme, I changed the line 322 in bpnet/modisco/table.py to ("strandcor", correlate(agg_profile_norm[:], agg_profile_norm[::-1]).max()). I know doing this will change the "strandcor" value. So, I am wondering whether it is okay to do changes like this, and how much this will affect the motif discovery and the further CWM scan step? Or, is there another way to run the programme with only one bw track?

Thanks in advance for your help!

Best, Yan

akundaje commented 3 years ago

Yan,

Ziga might be able to give you some insights on how to adapt the current repo to support unstranded inputs. But I wanted to give you some additional information.

The ChIP-exo/nexus/seq BPNet architecture is not optimized for ATAC-seq/DNase-seq data. The model needs to be expanded and tweaked a bit to get the best out of it for ATAC-seq/DNase-seq. Also bias correction using enzyme bias tracks is quite important.

Also this particular repo is not being extended to support these additional data types. While the current repo is and will continue to be the primary repo associated with the BPNet paper, we are transitioning to this repo https://github.com/kundajelab/basepairmodels for longer term support and extensions to ATAC-seq, DNase-seq, histone ChIP-seq and many additional post-processing frameworks which are coming soon (within 2 months).

Thanks, Anshul.

On Mon, Feb 1, 2021 at 2:16 PM yanwang271 notifications@github.com wrote:

Hi,

I am trying to use bpnet on ATAC-seq data and there is one track for each tasks. When I run bpnet chip-nexus-analysis, I got errors:

0%| | 0/11 [00:00<?, ?it/s]/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py:130: UserWarning: region counts not present. Returning the default contribution scores

warnings.warn("region counts not present. Returning the default contribution scores")

100%|██████████| 11/11 [00:35<00:00, 3.20s/it]

0%| | 0/11 [00:00<?, ?it/s]Traceback (most recent call last):

File "/bpnet/bin/bpnet", line 8, in

sys.exit(main())

File "/bpnet/lib/python3.6/site-packages/bpnet/main.py", line 38, in main

argh.dispatch(parser)

File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch

for line in lines:

File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command

for line in result:

File "/bpnet/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call

result = function(*positional, **keywords)

File "/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 905, in chip_nexus_analysis

footprint_width=footprint_width)

File "/bpnet/lib/python3.6/site-packages/bpnet/cli/modisco.py", line 722, in modisco_table

df = modisco_table(data)

File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 199, in modisco_table

for pattern in tqdm(data.mf.pattern_names())])

File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 199, in

for pattern in tqdm(data.mf.pattern_names())])

File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 220, in pattern_features

for res in pattern_task_features(pattern, task, data)] +

File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 220, in

for res in pattern_task_features(pattern, task, data)] +

File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 278, in pattern_task_features

("footprint", task_footprint(pattern, task, data)),

File "/bpnet/lib/python3.6/site-packages/bpnet/modisco/table.py", line 322, in task_footprint

("strandcor", correlate(agg_profile_norm[:, 0], agg_profile_norm[::-1, 1]).max()),

IndexError: index 1 is out of bounds for axis 1 with size 1

I thought this is because that there is only one track in agg_profile_norm. To run the programme, I changed the line 322 in bpnet/modisco/table.py to ("strandcor", correlate(agg_profile_norm[:], agg_profile_norm[::-1]).max()). I know doing this will change the "strandcor" value. So, I am wondering whether it is okay to do changes like this, and how much this will affect the motif discovery and the further CWM scan step? Or, is there another way to run the programme with only one bw track?

Thanks in advance for your help!

Best, Yan

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kundajelab/bpnet/issues/18, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABDWEPGWMIL4IGEBVMWA3LS44R5LANCNFSM4W5R4EEA .

Avsecz commented 3 years ago

Hi Yan,

+1 to what Anshul wrote.

It's the plotting code that breaks with single-stranded tracks when trying to run chip-nexus-analysis, not modisco. Doing the fix you are proposing could probably work to generate some plots.

Note that bpnet modisco-run already generates the modisco.html plot containing the motifs. You can use bpnet cwm-scan after modisco-run to get the motif instances. Since the analysis of atac-seq data is pretty custom, i'd recommend starting from the colab: http://bit.ly/bpnet-colab to see how to visualize the motifs from the modisco output files.

yanwang271 commented 3 years ago

Hi @akundaje ,

Thanks for your replay. I really look forward to trying the new version!

yanwang271 commented 3 years ago

Hi @Avsecz ,

Thanks for the remarks! Now I know I can use the cwm-scan directly after modisco-run.