kfuku52 / csubst

Molecular convergence detection
BSD 3-Clause "New" or "Revised" License

Error during instantaneous rate matrix generation #38

Closed agneeshbarua closed 1 year ago

agneeshbarua commented 1 year ago

Dear Fukushima-sensei,

Thank you so much for writing the CSUBST program and providing such detailed documentation as a part of the paper. I have been trying to run CSUBST on my own data and have encountered an issue.

I initially tested the program on an orthogroup dataset with one gene per species. It worked fine and produced the output. But when I tried it with larger orthogroups (multiple paralogs per species), I got the attached error. The problem seems to involve the state matrix during the instantaneous substitution rate matrix generation step. It's strange because the matrix does get produced, but the program crashes right afterwards.

I have attached the slurm error file, the conda environment list, and slurm script (test with single array) I'm using.

Hoping you can provide some advice.

Thank you.

array_csubst.txt conda_env.txt error_slurm_out.txt

kfuku52 commented 1 year ago

Thank you for using CSUBST! I am not 100% sure, but the error implies that simultaneous CSUBST runs or remnant files from previous runs are interfering with the new analysis. If the problem persists in a freshly created directory, could you provide a set of CSUBST input files so I can check them on my end?

Traceback (most recent call last):
  File "/users/abarua/myproject_envs/Pigmentation_env/bin/csubst", line 324, in <module>
    args.handler(args)
  File "/users/abarua/myproject_envs/Pigmentation_env/bin/csubst", line 43, in command_analyze
    main_analyze(g)
  File "/users/abarua/myproject_envs/Pigmentation_env/lib/python3.11/site-packages/csubst/main_analyze.py", line 127, in main_analyze
    g = parser_misc.prep_state(g)
        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/abarua/myproject_envs/Pigmentation_env/lib/python3.11/site-packages/csubst/parser_misc.py", line 219, in prep_state
    state_cdn = parser_iqtree.get_state_tensor(g)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/users/abarua/myproject_envs/Pigmentation_env/lib/python3.11/site-packages/csubst/parser_iqtree.py", line 190, in get_state_tensor
    state_tensor[node.numerical_label,:,:] = state_matrix
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not broadcast input array from shape (1731,61) into shape (577,61)

agneeshbarua commented 1 year ago

Dear sensei,

I went back and reran the analysis in a fresh directory and still got the same error. I've attached the alignment, gene trees, and foreground file. I didn't include the intermediate IQ-TREE files, as they are regenerated quickly.

The analysis seems to work when there is one gene per species, but with multiple paralogs, it fails.

foreground.txt OG0002198_cds_hammer.txt OG0002198_generax.txt

kfuku52 commented 1 year ago

Internal node and tip labels must be unique, but your tree seems to have non-unique internal node names: species_22 appears three times, and other names may be duplicated too. The error should disappear if you rerun CSUBST with unique names, or after deleting all internal node names if you do not need them. nwkit drop may be useful for removing the labels. https://github.com/kfuku52/nwkit/wiki/nwkit-drop
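
For anyone who hits the same traceback, a quick way to check a tree for repeated internal node names before running CSUBST might look like the sketch below. It is not part of CSUBST or NWKIT. The function name `duplicated_internal_labels` and the toy tree are hypothetical, and the parsing assumes plain Newick with unquoted labels and no bracketed comments, so that an internal label is whatever directly follows a closing parenthesis. Numeric support values stored as internal labels would also be reported if they repeat.

```python
import re
from collections import Counter


def duplicated_internal_labels(newick: str) -> dict[str, int]:
    """Return internal node labels that occur more than once.

    Assumption: unquoted labels, no bracketed comments. An internal
    label is the text right after ')' up to the next delimiter.
    """
    labels = re.findall(r"\)([^():,;\[\]\s]+)", newick)
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical toy tree, not the tree attached to this issue.
toy_tree = (
    "(((a1:1,a2:1)species_22:1,(b1:1,b2:1)species_22:1)species_22:1,c1:1)root;"
)
duplicates = duplicated_internal_labels(toy_tree)
print(duplicates)
```

An empty result means no internal label is repeated under these parsing assumptions. As a side note, the shapes in the traceback satisfy 1731 = 3 × 577, which would be consistent with the rows of three same-named nodes being read together, although the thread does not confirm that mechanism.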

agneeshbarua commented 1 year ago

Thank you for pointing out the error. It works perfectly now!

kfuku52 commented 1 year ago

I'm glad to hear that it worked :)