logsdon-lab / CenMAP

Centromere mapping and annotation pipeline
MIT License

CenMap running stopped at installing pip dependencies step (ModDotPlot) #74

Open chunlinxiao opened 2 weeks ago

chunlinxiao commented 2 weeks ago

Hi,

I was testing CenMAP, but it stopped at the step shown below. I also tried to install ModDotPlot myself, and it seemed to fail at the same point. Following a suggestion from others to install PIL with "pip install Pillow", I verified that PIL was installed in lib/python3.11/site-packages/PIL, but re-running CenMAP or reinstalling ModDotPlot still failed at the same step. Do you have any suggestions? Thanks.

Installing pip dependencies: ...working...
Pip subprocess error:
Running command git clone --filter=blob:none --quiet https://github.com/marbl/ModDotPlot.git /tmp/pip-req-build-a2z6p7l1
ERROR: Ignored the following versions that require a different python version: 1.10.0 Requires-Python <3.12,>=3.8; 1.10.0rc1 Requires-Python <3.12,>=3.8; 1.10.0rc2 Requires-Python <3.12,>=3.8; 1.10.1 Requires-Python <3.12,>=3.8; 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 1.7.2 Requires-Python >=3.7,<3.11; 1.7.3 Requires-Python >=3.7,<3.11; 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11; 1.9.0 Requires-Python >=3.8,<3.12; 1.9.0rc1 Requires-Python >=3.8,<3.12; 1.9.0rc2 Requires-Python >=3.8,<3.12; 1.9.0rc3 Requires-Python >=3.8,<3.12; 1.9.1 Requires-Python >=3.8,<3.12
ERROR: Could not find a version that satisfies the requirement PIL (from moddotplot) (from versions: none)
ERROR: No matching distribution found for PIL

failed

CondaEnvException: Pip failed

koisland commented 2 weeks ago

Hi chunlinxiao,

I believe PR #73 fixes this issue. Can you try git pull to get the most recent pipeline version and try rerunning?

chunlinxiao commented 1 week ago

Thanks @koisland - I got CenMAP running after git pull. However, the process stopped with the following runtime error. I'd appreciate any suggestions.

Activating conda environment: .snakemake/conda/31344745a58e016f0f90cf008b6424c4_
[Sat Sep 14 11:25:39 2024]
Error in rule check_asm_nucflag:
    jobid: 222
    input: results/nucflag/HG002_hifi.bam, results/cens_new/bed/HG002_ALR_regions.bed, config/nucflag.toml, results/nucflag/HG002_correct_ALR_regions.rm.simple.bed, results/nucflag/HG002_correct_ALR_regions.rm.bed
    output: results/nucflag/HG002, results/nucflag/HG002_cen_misassemblies.bed, results/nucflag/HG002_cen_status.bed
    log: logs/nucflag/run_nucflag_HG002.log (check log file(s) for error details)
    conda-env: CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4
    shell:

    nucflag         -i results/nucflag/HG002_hifi.bam         -d results/nucflag/HG002         -o results/nucflag/HG002_cen_misassemblies.bed         -t 8         -p 8         -s results/nucflag/HG002_cen_status.bed         -c config/nucflag.toml         -b results/cens_new/bed/HG002_ALR_regions.bed         --ignore_regions results/nucflag/HG002_correct_ALR_regions.rm.simple.bed         --overlay_regions results/nucflag/HG002_correct_ALR_regions.rm.bed          &> logs/nucflag/run_nucflag_HG002.log

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Logfile logs/nucflag/run_nucflag_HG002.log:

Traceback (most recent call last):
  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4_/lib/python3.12/multiprocessing/pool.py", line 215, in __init__
    self._repopulate_pool()
  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4/lib/python3.12/multiprocessing/pool.py", line 306, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4/lib/python3.12/multiprocessing/pool.py", line 329, in _repopulate_pool_static
    w.start()
  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4/lib/python3.12/multiprocessing/dummy/__init__.py", line 51, in start
    threading.Thread.start(self)
  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4_/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4/lib/python3.12/multiprocessing/dummy/__init__.py", line 51, in start
    threading.Thread.start(self)
  File "CenMAP/.snakemake/conda/31344745a58e016f0f90cf008b6424c4/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

koisland commented 1 week ago

Hi chunlinxiao,

This is OS-related: nucflag can't start a new thread because all threads are in use.

I suggest rerunning with fewer processes and threads. You can do that by reducing these values. Let me know if that works for you.

nucflag:
  threads_aln: 4
  processes_nucflag: 4
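
Before picking new values, it can help to check whether a system limit is behind the "can't start new thread" error. The sketch below is only a diagnostic, not part of CenMAP, and assumes a Linux host with bash. It prints the per-user process limit (on Linux, threads count toward it) and the CPU count. What values suit threads_aln and processes_nucflag still depends on your machine.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (not part of CenMAP): print limits relevant to thread creation.
# Assumes Linux with bash; falls back gracefully if a command is unavailable.

# Per-user cap on processes; on Linux, threads count toward this limit.
max_user_procs="$(ulimit -u 2>/dev/null || echo unknown)"

# Number of CPUs available (nproc is GNU coreutils; getconf is the fallback).
cpu_count="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"

echo "max user processes/threads (ulimit -u): ${max_user_procs}"
echo "available CPUs: ${cpu_count}"
```

If the limit is low or the machine is shared, that would be consistent with needing fewer processes and threads, as suggested above.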

chunlinxiao commented 1 week ago

Hi @koisland, thanks for the suggestion to reduce the number of threads. It did get CenMAP going until the following error. I'd appreciate any suggestions. Many thanks.

Activating conda environment: .snakemake/conda/d929071de8a7c5c417717672309902e4_
Touching output file results/humas_hmmer/split_cens_for_humas_hmmer_chr21.done.
[Wed Sep 18 14:47:46 2024]
Finished job 365.
321 of 562 steps (57%) done
MissingInputException in rule get_live_hor in file CenMAP/workflow/rules/plot_hor_stv.smk, line 5:
Missing input files for rule get_live_hor:
    output: results/hor_stv/bed/chr21/HG002_chr21_chr21_MATERNAL:12248025-14043023_renamed.bed, results/hor_stv/bed/chr21/HG002_chr21_chr21_MATERNAL:12248025-14043023_liveHORs.bed
    wildcards: chr=chr21, fname=HG002_chr21_chr21_MATERNAL:12248025-14043023
    affected files:
        workflow/scripts/stv_fix/scripts/live_HORs_filter.py

koisland commented 1 week ago

Ah okay. This is an issue with the initial cloning of the repo. In the future, the full clone command should be:

git clone git@github.com:logsdon-lab/CenMAP.git --recurse-submodules

Without that flag, it's missing workflow/scripts/stv_fix/scripts/live_HORs_filter.py, a file from a submodule.
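
One way to confirm that a submodule is missing is to look at the output of git submodule status, which marks uninitialized submodules with a leading "-". The helper below is a hypothetical sketch (the function name is not from CenMAP) that filters that output down to the uninitialized paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CenMAP): read `git submodule status` output
# on stdin and print the paths of submodules that are not initialized.
# Git prefixes uninitialized entries with "-", so the first field is "-<sha>".
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Run only inside a Git work tree, e.g. from the CenMAP clone.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git submodule status | uninitialized_submodules
fi
```

If workflow/scripts/stv_fix shows up in the output, the submodule has not been fetched yet.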

You should be able to fix this by running:

make update_submodules

Let me know if that works, Keith

chunlinxiao commented 1 week ago

I remember there was a problem when using "--recurse-submodules", so I cloned without it.

Now, after running your "make update_submodules", I still have the issue below:

Please make sure you have the correct access rights and the repository exists.
fatal: clone of 'git@github.com:koisland/stv.git' into submodule path 'CenMAP/workflow/scripts/stv_fix' failed
Failed to clone 'workflow/scripts/stv_fix' a second time, aborting
make: *** [Makefile:15: update_submodules] Error 1

koisland commented 1 week ago

Hi @chunlinxiao,

Can you run git pull && make submodules again? With PR #76, I updated the submodule to use https instead of ssh, which should hopefully fix this issue.

Thanks, Keith

chunlinxiao commented 1 week ago

After running "git pull && make submodules", I got:

koisland commented 1 week ago

Sorry, my bad. The command is make update_submodules.

chunlinxiao commented 1 week ago

Still an error:

CenMAP>git pull && make update_submodules

Please make sure you have the correct access rights and the repository exists.
fatal: clone of 'git@github.com:koisland/stv.git' into submodule path 'CenMAP/workflow/scripts/stv_fix' failed
Failed to clone 'workflow/scripts/stv_fix' a second time, aborting
make: *** [Makefile:15: update_submodules] Error 1

koisland commented 1 week ago

Hi @chunlinxiao,

It looks like the submodule URL wasn't updated. Try git submodule sync && make update_submodules. This will point the submodule configuration in your local repo at the new URL and then update the submodule. Hopefully, that should work.
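
For background, Git keeps a submodule's URL in two places: the tracked .gitmodules file, which git pull updates, and the local .git/config, which keeps the old value until git submodule sync copies the new one over. The sketch below lists both so a stale SSH URL is easy to spot. The is_ssh_url and report_urls helpers are hypothetical names, not part of CenMAP or Git, and the script assumes it is run from the top of the clone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CenMAP or Git): succeed if a remote URL uses SSH.
is_ssh_url() {
  case "$1" in
    git@*|ssh://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print each submodule URL found by `git config` with the given extra arguments,
# flagging the ones that use SSH.
report_urls() {
  git config "$@" --get-regexp '^submodule\..*\.url$' 2>/dev/null |
    while read -r key url; do
      if is_ssh_url "$url"; then
        echo "$key = $url (ssh)"
      else
        echo "$key = $url"
      fi
    done
}

# Run only inside a Git work tree, from the top of the clone.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  echo "Tracked in .gitmodules:"
  report_urls --file .gitmodules
  echo "Local in .git/config:"
  report_urls
fi
```

If the second list still shows git@github.com:koisland/stv.git while the first shows an https URL, the local configuration is stale and git submodule sync is the step that refreshes it.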

Thanks, Keith

chunlinxiao commented 1 week ago

It's running again after "git submodule sync && make update_submodules" - hope it goes to the end this time :-)

Many thanks @koisland

koisland commented 1 week ago

Awesome! Thanks for bearing with the debugging.

chunlinxiao commented 6 days ago

The test run seemed to hang here:

Activating conda environment: .snakemake/conda/d929071de8a7c5c417717672309902e4_
Touching output file results/humas_hmmer/split_cens_for_humas_hmmer_chr16.done.
[Fri Sep 20 11:49:39 2024]
Finished job 350.
8 of 285 steps (3%) done
MissingInputException in rule aggregate_format_all_stv_row in file CenMAP/workflow/rules/plot_hor_stv.smk, line 80:
Missing input files for rule aggregate_format_all_stv_row:
    output: results/hor_stv/bed/chr16_AS-HOR_stv_row.all.bed
    wildcards: chr=chr16
    affected files:
        results/hor_stv/bed/chr16/HG002_chr16_chr16_MATERNAL:35413923-38574407_stv_row.bed
        results/hor_stv/bed/chr16/AS-HOR_HG002_chr16_chr16_PATERNAL:34212836-37771735_stv_row.bed
        results/hor_stv/bed/chr16/HG002_chr16_chr16_PATERNAL:34212836-37771735_stv_row.bed
        results/hor_stv/bed/chr16/AS-HOR_HG002_chr16_chr16_MATERNAL:35413923-38574407_stv_row.bed

So I stopped it. But when I re-ran it, I got the following error:

RuntimeError: can't create new thread at interpreter shutdown
Exception in thread Thread-32:
Traceback (most recent call last):
  File "miniconda3/envs/cenmap/lib/python3.12/threading.py", line 1052, in _bootstrap_inner
    self.run()
  File "miniconda3/envs/cenmap/lib/python3.12/site-packages/snakemake/benchmark.py", line 242, in run
    self.function(*self.args, **self.kwargs)
  File "miniconda3/envs/cenmap/lib/python3.12/site-packages/snakemake/benchmark.py", line 278, in _action
    self._timer.start()
  File "miniconda3/envs/cenmap/lib/python3.12/threading.py", line 971, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't create new thread at interpreter shutdown

koisland commented 6 days ago

Hi @chunlinxiao,

I haven't encountered this error before so I'll need to run CenMAP on HG002 and see if I can reproduce it. I'll get back to you on this.

Thanks, Keith

chunlinxiao commented 4 days ago

After reinstalling CenMAP, I re-ran my test from the beginning and got the following error (halfway through!). See if this helps:

Activating conda environment: .snakemake/conda/d929071de8a7c5c417717672309902e4_
[Wed Sep 25 06:04:31 2024]
Finished job 339.
358 of 630 steps (57%) done
Touching output file results/humas_hmmer/split_cens_for_humas_hmmer_chr8.done.
[Wed Sep 25 06:04:31 2024]
Finished job 326.
359 of 630 steps (57%) done
MissingInputException in rule aggregate_format_all_stv_row in file CenMAP/workflow/rules/plot_hor_stv.smk, line 80:
Missing input files for rule aggregate_format_all_stv_row:
    output: results/hor_stv/bed/chr8_AS-HOR_stv_row.all.bed
    wildcards: chr=chr8
    affected files:
        results/hor_stv/bed/chr8/AS-HOR_HG002_chr8_chr8_MATERNAL:43008335-47438217_stv_row.bed
        results/hor_stv/bed/chr8/HG002_chr8_chr8_MATERNAL:43008335-47438217_stv_row.bed

===================

Also, cloning with "--recurse-submodules" still gave the same error as previously reported, so I cloned CenMAP without it and then ran "git submodule sync && make update_submodules".

koisland commented 3 days ago

This is helpful. This is an issue with Snakemake checkpoints. I'm looking into this.

I'm not sure about the second issue.

koisland commented 3 days ago

Hey @chunlinxiao,

I fixed the issue in PR #77. I recommend saving your config/config.yaml to another location before you git pull. Then, when you rerun, use --rerun-triggers mtime in the Snakemake command to avoid rerunning unnecessary steps. Let me know if it works for you.
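
The steps above can be sketched as a small shell helper. The backup_config function and the backup directory are hypothetical, not part of CenMAP. The snakemake line is left as a comment because the rest of the command (cores, profile, and so on) is whatever you normally use. Only --rerun-triggers mtime comes from the advice above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CenMAP): copy a config file to a backup
# directory with a timestamp suffix so `git pull` cannot overwrite your edits.
backup_config() {
  local src="${1:-config/config.yaml}"
  local dest_dir="${2:-$HOME/cenmap_config_backup}"   # assumed location
  local stamp
  stamp="$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest_dir" && cp -p "$src" "$dest_dir/$(basename "$src").$stamp"
}

# Typical sequence, run from the CenMAP directory:
#   backup_config
#   git pull
#   snakemake <your usual options> --rerun-triggers mtime
```

Restricting the rerun triggers to mtime means Snakemake decides what to rerun from file modification times only, so code or parameter changes pulled in by the update do not by themselves invalidate finished steps.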

Thanks, Keith