schneebergerlab / syri

Synteny and Rearrangement Identifier
https://schneebergerlab.github.io/syri/
MIT License

syri process stopped #199

Closed. mrwangyz closed this issue 1 year ago.

mrwangyz commented 1 year ago

Thank you for developing such a great tool; it has really helped me a lot.

I am using syri to analyse maize genomes, but I have a problem: most of my genomes produce results normally, but the remaining ones never produce any results.

I watched the processes and found that they use system resources only at the beginning; after a while they all hang. From that point on, the log file is no longer written, the processes no longer consume resources, and no results are produced. I am confused about this and would like to know why it happens.
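One way to confirm the "log file is no longer written" symptom without watching it by hand is to compare the log's last-modification time with the current time. The sketch below assumes a filesystem that updates modification times on every write; the log path and the idle threshold are placeholders, not syri settings. A stale log alone does not prove a hang, because syri may spend a long time in a step that logs nothing, so it is worth pairing this with a CPU-usage check.

```python
import os
import time
from typing import Optional


def log_is_stale(log_path: str, max_idle_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if log_path has not been modified for more than max_idle_seconds.

    `now` can be passed explicitly (e.g. for testing); by default the current time is used.
    """
    if now is None:
        now = time.time()
    # Seconds since the last write to the log file.
    idle = now - os.path.getmtime(log_path)
    return idle > max_idle_seconds
```

For example, `log_is_stale("path/to/syri.log", 3600)` (hypothetical path) returns `True` once the log has been silent for more than an hour.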

The following is the log file:

2023-05-29 09:49:33,801 - numexpr.utils - INFO - _init_num_threads:145 - Note: detected 112 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2023-05-29 09:49:33,801 - numexpr.utils - INFO - _init_num_threads:148 - Note: NumExpr detected 112 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-05-29 09:49:33,801 - numexpr.utils - INFO - _init_num_threads:160 - NumExpr defaulting to 8 threads.
2023-05-29 09:49:34,023 - Reading Coords - INFO - syri:134 - Reading input from .tsv file
2023-05-29 09:49:34,450 - Reading Coords - INFO - syri:134 - Filtering alignments
2023-05-29 09:50:12,758 - syri - INFO - syri:213 - starting
2023-05-29 09:50:12,936 - syri - INFO - syri:213 - Analysing chromosomes: ['chr1', 'chr10', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9']
2023-05-29 09:50:13,260 - syri.chr1 - INFO - mapstar:48 - chr1 (8777, 11)
2023-05-29 09:50:13,261 - syri.chr1 - INFO - mapstar:48 - Identifying Synteny for chromosome chr1
2023-05-29 09:50:13,366 - syri.chr10 - INFO - mapstar:48 - chr10 (4076, 11)
2023-05-29 09:50:13,367 - syri.chr10 - INFO - mapstar:48 - Identifying Synteny for chromosome chr10
2023-05-29 09:50:13,470 - syri.chr2 - INFO - mapstar:48 - chr2 (6818, 11)
2023-05-29 09:50:13,470 - syri.chr2 - INFO - mapstar:48 - Identifying Synteny for chromosome chr2
2023-05-29 09:50:13,575 - syri.chr3 - INFO - mapstar:48 - chr3 (6274, 11)
2023-05-29 09:50:13,575 - syri.chr3 - INFO - mapstar:48 - Identifying Synteny for chromosome chr3
2023-05-29 09:50:13,681 - syri.chr4 - INFO - mapstar:48 - chr4 (6291, 11)
2023-05-29 09:50:13,682 - syri.chr4 - INFO - mapstar:48 - Identifying Synteny for chromosome chr4
2023-05-29 09:50:13,791 - syri.chr5 - INFO - mapstar:48 - chr5 (5790, 11)
2023-05-29 09:50:13,792 - syri.chr5 - INFO - mapstar:48 - Identifying Synteny for chromosome chr5
2023-05-29 09:50:13,895 - syri.chr6 - INFO - mapstar:48 - chr6 (4944, 11)
2023-05-29 09:50:13,896 - syri.chr6 - INFO - mapstar:48 - Identifying Synteny for chromosome chr6
2023-05-29 09:50:14,005 - syri.chr7 - INFO - mapstar:48 - chr7 (5039, 11)
2023-05-29 09:50:14,006 - syri.chr7 - INFO - mapstar:48 - Identifying Synteny for chromosome chr7
2023-05-29 09:50:14,107 - syri.chr8 - INFO - mapstar:48 - chr8 (4875, 11)
2023-05-29 09:50:14,107 - syri.chr8 - INFO - mapstar:48 - Identifying Synteny for chromosome chr8
2023-05-29 09:50:14,219 - syri.chr9 - INFO - mapstar:48 - chr9 (4438, 11)
2023-05-29 09:50:14,219 - syri.chr9 - INFO - mapstar:48 - Identifying Synteny for chromosome chr9
2023-05-29 09:50:16,852 - syri.chr10 - INFO - mapstar:48 - Identifying Inversions for chromosome chr10
2023-05-29 09:50:18,924 - syri.chr8 - INFO - mapstar:48 - Identifying Inversions for chromosome chr8
2023-05-29 09:50:19,043 - syri.chr9 - INFO - mapstar:48 - Identifying Inversions for chromosome chr9
2023-05-29 09:50:19,344 - syri.chr6 - INFO - mapstar:48 - Identifying Inversions for chromosome chr6
2023-05-29 09:50:20,330 - syri.chr7 - INFO - mapstar:48 - Identifying Inversions for chromosome chr7
2023-05-29 09:50:20,574 - syri.chr5 - INFO - mapstar:48 - Identifying Inversions for chromosome chr5
2023-05-29 09:50:20,678 - syri.chr10 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr10
2023-05-29 09:50:21,531 - syri.chr3 - INFO - mapstar:48 - Identifying Inversions for chromosome chr3
2023-05-29 09:50:22,394 - syri.chr4 - INFO - mapstar:48 - Identifying Inversions for chromosome chr4
2023-05-29 09:50:23,376 - syri.chr2 - INFO - mapstar:48 - Identifying Inversions for chromosome chr2
2023-05-29 09:50:24,537 - syri.chr9 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr9
2023-05-29 09:50:26,905 - syri.chr8 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr8
2023-05-29 09:50:30,185 - syri.chr1 - INFO - mapstar:48 - Identifying Inversions for chromosome chr1
2023-05-29 09:50:31,872 - syri.chr6 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr6
2023-05-29 09:50:33,870 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 0.14508938789367676. iterations remaining 16
2023-05-29 09:50:35,365 - syri.chr5 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr5
2023-05-29 09:50:37,157 - syri.chr3 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr3
2023-05-29 09:50:38,453 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 3.62396240234375e-05. iterations remaining 37
2023-05-29 09:50:39,506 - syri.chr4 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr4
2023-05-29 09:50:45,625 - syri.chr2 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr2
2023-05-29 09:50:51,240 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 9.5367431640625e-06. iterations remaining 41
2023-05-29 09:50:51,892 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 9.965896606445312e-05. iterations remaining 35
2023-05-29 09:51:05,386 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 7.995157718658447. iterations remaining 5
2023-05-29 09:51:05,564 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 1.0728836059570312e-05. iterations remaining 44
2023-05-29 09:51:05,662 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 0.00023698806762695312. iterations remaining 32
2023-05-29 09:51:05,777 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 0.004466533660888672. iterations remaining 24
2023-05-29 09:51:12,390 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 2.1457672119140625e-06. iterations remaining 46
2023-05-29 09:51:24,916 - syri.chr1 - INFO - mapstar:48 - Identifying translocation and duplication for chromosome chr1
2023-05-29 09:51:38,319 - Brute-force TD identification - INFO - mapstar:48 - Cluster is too big for Brute Force, using randomized-greedy approach Time taken for last iteration 0.0001323223114013672. iterations remaining 36
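To see how far each chromosome got before the log went quiet, the records can be reduced to the last "Identifying ..." stage logged by each `syri.<chromosome>` logger. This is a minimal sketch, assuming one record per line (the default for Python `logging` file handlers) and the record layout shown in the log above; it reports only the last stage that was logged, not whether that stage finished.

```python
import re
from typing import Dict

# Matches per-chromosome records such as:
# 2023-05-29 09:50:13,261 - syri.chr1 - INFO - mapstar:48 - Identifying Synteny for chromosome chr1
RECORD = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"syri\.(?P<chrom>\S+) - \w+ - \S+ - (?P<msg>.*)$"
)
# Extracts the stage name from "Identifying <stage> for chromosome <chrom>".
STAGE = re.compile(r"^Identifying (?P<stage>.+?) for chromosome ")


def last_stage_per_chromosome(log_text: str) -> Dict[str, str]:
    """Map each chromosome to the last 'Identifying ...' stage seen in the log."""
    stages: Dict[str, str] = {}
    for line in log_text.splitlines():
        record = RECORD.match(line)
        if not record:
            continue  # skips numexpr, "Reading Coords", "Brute-force TD identification", etc.
        stage = STAGE.match(record.group("msg"))
        if stage:
            # Later records overwrite earlier ones, so the last stage wins.
            stages[record.group("chrom")] = stage.group("stage")
    return stages
```

Applied to the log above, every chromosome's last logged stage is "translocation and duplication", with chr1 reaching it last (09:51:24). The "Brute-force TD identification" records use a logger name without a chromosome, so this approach cannot tell which chromosome a randomized-greedy cluster belongs to.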

mnshgl0110 commented 1 year ago

Hi @mrwangyz. Maize is a complicated genome to analyse and can indeed result in long CPU runtime (check README).

How long has your job been running? Did it actually finish? When you say that "it doesn't take resources anymore", did you also check the CPU usage?

It is possible that the genomes contain many overlapping TE (transposable element) alignments. Syri tries to find the optimal set of representative alignments, and that might be why it is stuck. But I would guess that if the CPU usage is not 0, it is still working and will produce output at some point.
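The "TIME+" value mentioned later in this thread is a process's cumulative CPU time, so sampling it twice is a simple way to tell "slow but working" from "idle". This is a minimal Linux-only sketch that reads `/proc/<pid>/stat` directly (fields 14 and 15 are user and system time in clock ticks); the PID and the progress threshold are hypothetical. Since the log suggests chromosomes are processed concurrently, the work may be spread over several processes, each of which would need checking.

```python
import os


def cpu_seconds(pid: int) -> float:
    """Cumulative user + system CPU time of a process, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        stat = fh.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so only split the text after the last ')'.
    fields = stat[stat.rindex(")") + 1:].split()
    # fields[0] is field 3 (state), so utime (field 14) and stime (field 15)
    # are at indices 11 and 12.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def is_making_progress(before: float, after: float, min_delta: float = 0.5) -> bool:
    """True if at least min_delta CPU seconds were used between two samples."""
    return after - before >= min_delta
```

For example, take `t0 = cpu_seconds(pid)`, wait a minute, take `t1 = cpu_seconds(pid)`, and call `is_making_progress(t0, t1)`. A result of `False` across several samples matches the "no resources" symptom described in the original report.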

mrwangyz commented 1 year ago

Thank you for your reply!

I need to identify SVs for more than 40 maize genomes, and I have completed more than 30 so far.

The completed runs took only about 12 hours each, and while they were running I could clearly see the CPU usage and the "TIME+" value increasing. For the remaining data I do not see this, even though those runs have been going for at least 24 hours.

However, all of these data were generated by nucmer alignment using the same script. This is very strange, so I plan to repeat the runs on other servers. I'll let you know the results.

mrwangyz commented 1 year ago

Thank you for your pointers. I switched to another server, and surprisingly it ran normally and produced the results.

However, during installation I ran into a problem: I could not install syri with conda, apparently because of conflicts between Python modules. I tried on several servers and could not install it correctly on any of them.

Fortunately, I found a way to install it:

1. Create a conda environment with a specific Python version, such as 3.8.16.
2. Install the dependency modules with pip (installing them with conda may fail).
3. Install syri from the source code via setup.py.

This works. At runtime, activate the environment and then specify the path to syri. (syri needs MUMmer, so remember to add it to your PATH. Note: I used MUMmer to align the genomes.)
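Before running syri from such an environment, a quick check that the required executables resolve on PATH can catch the missing-MUMmer case mentioned above. This is a minimal sketch using `shutil.which`; it assumes the MUMmer aligner is invoked as `nucmer` and the syri entry point as `syri`, so adjust the names (or pass full paths) to match your installation.

```python
import shutil
import sys
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the names (or paths) that shutil.which cannot resolve to an executable."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; replace them with full paths if they are not on PATH.
    required = ["nucmer", "syri"]
    missing = missing_executables(required)
    # Show which interpreter (and therefore which environment) is active.
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables found.")
```

Running it inside the activated environment also prints the interpreter path, which helps confirm that the environment you created (and not a system Python) is the one in use.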

Hope that helps.

mnshgl0110 commented 1 year ago

Hi @mrwangyz. Good that you could make it work. Indeed, conda environments sometimes become inconsistent, which results in conflicts. Hopefully, Anaconda will find a strategy to mitigate this issue.

But I think this issue is solved now, so I will close it.