kundajelab / tfmodisco

TF MOtif Discovery from Importance SCOres
MIT License

AssertionError when running tf-modisco #117

Open Jie-Lii opened 1 month ago

Jie-Lii commented 1 month ago

Hi, I am trying to use tf-modisco to find motifs, but I encountered the following error during execution:

MEMORY 3.652018176
On task task0
Computing windowed sums on original
Generating null dist
peak(mu)= 0.0071754540205001835
Computing threshold
Subsampling!
For increasing = True , the minimum IR precision was 0.40944084378429907 occurring at 0.0 implying a frac_neg of 0.6933104659793835
To be conservative, adjusted frac neg is 0.95
Thresholds from null dist were -inf  and  8.0625 with frac passing 2e-06
Passing windows frac was 2e-06 , which is below  0.03 ; adjusting
Final raw thresholds are -3.90625  and  3.90625
Final transformed thresholds are -0.9697025572005383  and  0.9697025572005383
saving plot to figures/scoredist_0.png
Got 9863 coords
After resolving overlaps, got 9863 seqlets
Across all tasks, the weakest transformed threshold used was: 0.9696025572005383
MEMORY 3.787350016
9863 identified in total
Traceback (most recent call last):
  File "C:\Users\11435\Desktop\code\2024-07\2024-07-19\tf-modisco玉米5套数据测试\Code\utils.py", line 160, in <module>
    run_modisco(onehot_data=dna_arr[:2000], gradient_data=gradient_arr[:2000])
  File "C:\Users\11435\Desktop\code\2024-07\2024-07-19\tf-modisco玉米5套数据测试\Code\utils.py", line 112, in run_modisco
    tfmodisco_results = modisco.tfmodisco_workflow.workflow.TfModiscoWorkflow(
  File "C:\Users\11435\miniconda3\envs\tf24\lib\site-packages\modisco\tfmodisco_workflow\workflow.py", line 335, in __call__
    metaclustering_results = metaclusterer.fit_transform(seqlets)
  File "C:\Users\11435\miniconda3\envs\tf24\lib\site-packages\modisco\metaclusterers.py", line 100, in fit_transform
    self.fit(seqlets)
  File "C:\Users\11435\miniconda3\envs\tf24\lib\site-packages\modisco\metaclusterers.py", line 107, in fit
    self._fit(attribute_vectors)
  File "C:\Users\11435\miniconda3\envs\tf24\lib\site-packages\modisco\metaclusterers.py", line 306, in _fit
    vector_activity_pattern = self.vector_to_pattern(vector)
  File "C:\Users\11435\miniconda3\envs\tf24\lib\site-packages\modisco\metaclusterers.py", line 149, in vector_to_pattern
    assert False
AssertionError
[0.9666605]
[0]

I don't know what caused this issue. Here is the code I ran.

import modisco
import modisco.coordproducers
import modisco.tfmodisco_workflow.workflow
import modisco.tfmodisco_workflow.seqlets_to_patterns

# onehot_data and gradient_data are the arrays passed to run_modisco
contrib_scores = {"task0": onehot_data * gradient_data}
hypothetical_contribs_scores = {"task0": gradient_data}

null_per_pos_scores = modisco.coordproducers.LaplaceNullDist(num_to_samp=1000)
tfmodisco_results = modisco.tfmodisco_workflow.workflow.TfModiscoWorkflow(
    # Slight modifications from the default settings
    sliding_window_size=15,
    flank_size=5,
    target_seqlet_fdr=0.15,
    seqlets_to_patterns_factory=
    modisco.tfmodisco_workflow.seqlets_to_patterns.TfModiscoSeqletsToPatternsFactory(
        # Note: as of version 0.5.6.0, it's possible to use the results of a motif discovery
        # software like MEME to improve the TF-MoDISco clustering. To use the meme-based
        # initialization, you would specify the initclusterer_factory as shown in the
        # commented-out code below:
        # initclusterer_factory=modisco.clusterinit.memeinit.MemeInitClustererFactory(
        #    meme_command="meme", base_outdir="meme_out",
        #    max_num_seqlets_to_use=10000, nmotifs=10, n_jobs=1),
        trim_to_window_size=15,
        initial_flank_to_add=5,
        final_flank_to_add=5,
        final_min_cluster_size=20,
        # use_pynnd=True can be used for faster nn comp at coarse grained step
        # (it will use pynndescent), but note that pynndescent may crash
        # use_pynnd=True,
        n_cores=10)
)(
    task_names=["task0"],
    contrib_scores=contrib_scores,
    hypothetical_contribs=hypothetical_contribs_scores,
    one_hot=onehot_data,
    null_per_pos_scores=null_per_pos_scores)
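
Since the workflow call takes `onehot_data` and `gradient_data` directly, it may help to rule out malformed inputs before looking at the metaclustering step. The following is only a minimal sketch and not part of tf-modisco: the helper name `check_modisco_inputs` is hypothetical, the arrays are random stand-ins for the real data, and the `(N, L, 4)` layout is an assumption about how the inputs are arranged. It does not explain the AssertionError by itself; it only checks basic consistency.

```python
import numpy as np


def check_modisco_inputs(one_hot, hyp_contribs):
    """Return a list of problems found in the inputs (empty list if none).

    Hypothetical helper; assumes both arrays have shape (N, L, 4).
    """
    problems = []
    one_hot = np.asarray(one_hot)
    hyp_contribs = np.asarray(hyp_contribs)

    # The element-wise product onehot_data * gradient_data needs equal shapes.
    if one_hot.shape != hyp_contribs.shape:
        problems.append(
            f"shape mismatch: {one_hot.shape} vs {hyp_contribs.shape}")
        return problems

    if one_hot.ndim != 3 or one_hot.shape[-1] != 4:
        problems.append(f"expected shape (N, L, 4), got {one_hot.shape}")

    # NaN or inf in the gradients would propagate into every windowed sum.
    if not np.isfinite(hyp_contribs).all():
        problems.append("gradient_data contains NaN or inf")

    # A one-hot row holds only 0/1 and at most one set base per position
    # (an all-zero row is tolerated here, e.g. for an ambiguous base).
    if not np.isin(one_hot, (0, 1)).all():
        problems.append("one-hot array has values other than 0 and 1")
    elif (one_hot.sum(axis=-1) > 1).any():
        problems.append("some positions have more than one base set")

    return problems


# Random stand-in data, NOT the data from this issue.
rng = np.random.default_rng(0)
n_seqs, seq_len = 8, 50
onehot_data = np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=(n_seqs, seq_len))]
gradient_data = rng.normal(size=(n_seqs, seq_len, 4)).astype(np.float32)

print(check_modisco_inputs(onehot_data, gradient_data))
```

An empty list means none of these basic checks failed; any message returned points at something to fix before rerunning the workflow.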

I look forward to your response.

AvantiShri commented 1 month ago

Hi Jie,

Thanks for bringing this to my attention. I have left the field but will try to make some time to look into this. Unfortunately it would be hard for me to debug this error without access to the input data - any chance you can provide the input data?

By the way, do you still get the error with tfmodisco-lite (linked from the README page)? That version of tfmodisco is more actively maintained. If you get the error with tfmodisco-lite as well, I will prioritize looking into it. In the worst case, we can bypass the metaclustering altogether, since it is legacy functionality from when tfmodisco was being run on data from multiple tasks.
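
For sharing the input data, one option is to write the same slice that was passed to `run_modisco` into a single compressed NumPy archive. This is a minimal sketch under assumptions: the function name `export_inputs` and the file name `modisco_inputs.npz` are hypothetical, and the arrays below are random stand-ins rather than the real one-hot and gradient arrays.

```python
import os
import tempfile

import numpy as np


def export_inputs(path, one_hot, gradients, n_seqs=2000):
    """Save the first n_seqs sequences of both arrays to one .npz file.

    Hypothetical helper; n_seqs=2000 mirrors the [:2000] slice in the traceback.
    """
    one_hot = np.asarray(one_hot)[:n_seqs]
    gradients = np.asarray(gradients)[:n_seqs]
    np.savez_compressed(
        path,
        # int8 is enough for 0/1 values and keeps the archive small
        one_hot=one_hot.astype(np.int8),
        gradients=gradients.astype(np.float32),
    )


# Random stand-in data, NOT the data from this issue.
rng = np.random.default_rng(0)
onehot_demo = np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=(5, 20))]
gradient_demo = rng.normal(size=(5, 20, 4)).astype(np.float32)

out_path = os.path.join(tempfile.mkdtemp(), "modisco_inputs.npz")
export_inputs(out_path, onehot_demo, gradient_demo, n_seqs=3)

# Reload to confirm the archive round-trips before uploading it.
with np.load(out_path) as archive:
    saved_one_hot = archive["one_hot"]
    saved_gradients = archive["gradients"]
print(saved_one_hot.shape, saved_gradients.shape)
```

Saving both arrays in one file keeps the one-hot sequences and the gradients aligned by index, so whoever debugs the issue can pass them straight back into the same call.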
