EddieLv closed this issue 2 years ago.
Thank you for your interest in Higashi! When you run it in CLI mode, does it just hang like this (stuck at 0% without any error), or does it quit with an error message? If it's the former, could you attach the log you get when you kill the process (Ctrl+C), so that I can try to figure out which process is hanging? Thanks!
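If Ctrl+C only shows where the main process was interrupted, it may not reveal which worker is stuck. As a possible alternative (not something used in this thread), Python's standard faulthandler module can print the stack of every thread in a running process on demand. A minimal sketch, assuming a Unix system; install_stack_dump_handler and dump_stacks_now are hypothetical helper names, not part of Higashi:

```python
import faulthandler
import signal
import sys


def install_stack_dump_handler(signum=None):
    """Dump the Python stack of every thread when `signum` is received.

    Hypothetical debugging helper, not part of Higashi. On Unix the default
    signal is SIGUSR1, so `kill -USR1 <pid>` prints the stacks to stderr
    without terminating the process.
    """
    if not hasattr(faulthandler, "register"):
        # faulthandler.register is not available on Windows.
        return False
    if signum is None:
        signum = signal.SIGUSR1
    faulthandler.register(signum, file=sys.stderr, all_threads=True)
    return True


def dump_stacks_now(path):
    """Write the current stacks of all threads to the file at `path`."""
    with open(path, "w") as fh:
        faulthandler.dump_traceback(file=fh, all_threads=True)
```

Calling install_stack_dump_handler() early in the script and then running kill -USR1 <pid> against the suspected process should print its stacks without stopping it. For worker processes the handler has to be active inside the worker itself, which may mean calling it from the worker's initializer.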
This is the error. Or do you need the complete log file?
Hi, ruochi. Could the bug be related to the PyTorch version? Or maybe I did not actually install Higashi from git successfully?
I don't think it has to do with the torch version, since 1.11.0 is a version I have tested on. The deadlock seems to be triggered by the multiprocessing part. I will run some tests on my end. Meanwhile, could you share the config JSON file you created for this run? Thx.
{
  "config_name": "Cere-24-20220416",
  "data_dir": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/7_higashi_input",
  "input_format": "higashi_v1",
  "structured": "true",
  "temp_dir": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/8_higashi_out",
  "genome_reference_path": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/GRCm39.chr.sizes.txt",
  "cytoband_path": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/GRCm39_cytoband.txt",
  "chrom_list": ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19"],
  "resolution": 1000000,
  "resolution_cell": 1000000,
  "local_transfer_range": 1,
  "dimensions": 64,
  "loss_mode": "zinb",
  "rank_thres": 1,
  "embedding_epoch": 80,
  "no_nbr_epoch": 80,
  "with_nbr_epoch": 60,
  "embedding_name": "Cere-24-20220416_zinb",
  "impute_list": ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19"],
  "minimum_distance": 1000000,
  "maximum_distance": -1,
  "neighbor_num": 5,
  "cpu_num": -1,
  "gpu_num": 0,
  "UMAP_params": {"n_neighbors": 20}
}
And my Python version is 3.9.0. :)
Hi, I just updated the code base (specifically the main_cell.py file). Could you set cpu_num to 1 and run Higashi through the CLI (python higashi/main_cell.py -c ../...JSON -s 2)? The -s 2 option makes the program start at the training-for-imputation step, and setting cpu_num = 1 in the JSON file disables multiprocessing. Let's see whether any error appears without multiprocessing. If it hangs again, interrupt it and attach the logs, please. Thx.
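The debugging step above changes only one key in the config. A small hypothetical helper (write_debug_config is an illustrative name, not part of Higashi) can write a copy of the JSON with cpu_num overridden, leaving the original file untouched:

```python
import json
from pathlib import Path


def write_debug_config(src, dst, cpu_num=1):
    """Copy a Higashi JSON config to `dst`, overriding only "cpu_num".

    Hypothetical helper: with cpu_num = 1 the run avoids multiprocessing,
    which makes a hang easier to debug. All other keys are copied as-is.
    """
    config = json.loads(Path(src).read_text())
    config["cpu_num"] = cpu_num
    Path(dst).write_text(json.dumps(config, indent=2))
    return config
```

For example, write_debug_config("config.JSON", "config_debug.JSON") followed by python higashi/main_cell.py -c config_debug.JSON -s 2 (both file names are placeholders). Because config_name, temp_dir and every other key are kept, the debug run differs from the original only in its CPU setting.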
It seems to work, ruochi.
0%| | 0/24 [00:00<?, ?it/s] 100%|██████████| 24/24 [00:00<00:00, 412554.49it/s]
0%| | 0/24 [00:00<?, ?it/s] 100%|██████████| 24/24 [00:00<00:00, 521571.48it/s]
0%| | 0/24 [00:00<?, ?it/s] 25%|██▌ | 6/24 [00:00<00:00, 57.71it/s] 100%|██████████| 24/24 [00:00<00:00, 123.71it/s]
0%| | 0/24 [00:00<?, ?it/s] 100%|██████████| 24/24 [00:00<00:00, 759.27it/s]
But what if I want to use multiple CPUs?
And when I test it with cpu_num: -1, the same error occurs.
That's... unexpected. cpu_num = 1 was only meant for debugging; I thought the error would persist, and it's just easier to debug without multiprocessing. What if you set cpu_num to 2 or 3? Would that trigger the error?
Yep. I tried cpu_num = 2 and 8, and both trigger the same error, but cpu_num = 1 works.
Let me try running the code on my CPU server and get back to you. If cpu_num = 1 works, then the problem has nothing to do with the data itself. I do have something I suspect might be the cause, though. I will get back with more details.
I found that it did create multiple processes, but the processes seemed to be sleeping.
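To check what the worker processes are doing on Linux, their scheduler state can be read from /proc. The sketch below (child_process_states is a hypothetical helper, Linux only) lists the direct children of a given PID with their State field. A worker that stays in S (sleeping) while the progress bar never advances is blocked waiting on something, which fits a deadlock better than slow computation.

```python
from pathlib import Path


def child_process_states(parent_pid):
    """Return {pid: state} for the direct children of `parent_pid`.

    Linux only: reads the "PPid:" and "State:" fields from
    /proc/<pid>/status for every process currently listed in /proc.
    """
    states = {}
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            text = (entry / "status").read_text()
        except OSError:
            # The process exited during the scan, or access was denied.
            continue
        fields = dict(
            line.split(":", 1) for line in text.splitlines() if ":" in line
        )
        if int(fields.get("PPid", "-1").strip()) == parent_pid:
            states[int(entry.name)] = fields.get("State", "?").strip()
    return states
```

It can be run from a separate Python session with the PID of the main Higashi process (for example, taken from ps or top).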
Hi, ruochi. Has this issue been solved?
Sorry for the late reply; I was on a trip. I tested it on my CPU machine (Linux), and multiprocessing seems to work fine there. I am planning to test it on a Windows PC, but setting up the environment will take a while because I have never used that PC to run Python programs before. I will post an update later.
Hi, ruochi. My computer runs Linux as well. Could it be that I did not actually install Higashi successfully? Recently I have run into a few more problems:
1. When I set cpu_num = 1 and run the CLI, the .err file shows:
0%| | 0/19 [00:00<?, ?it/s] 100%|██████████| 19/19 [00:00<00:00, 520861.28it/s]
0%| | 0/19 [00:00<?, ?it/s]
100%|██████████| 19/19 [00:00<00:00, 664098.13it/s]
Traceback (most recent call last):
File "main_cell.py", line 1328, in
These two errors have different causes. The first one happens because no stage-1 model has been trained for that JSON config. If you haven't trained the model before in CLI mode, you should run python main_cell.py -c xxx -s 1 instead of -s 2.
For the second one, the error is triggered because the cytoband file you provided contains strings in the "start" column. Could you attach your cytoband file here for reference? I can push a fix soon to make the code handle strings in the "start" column, but it would help to see why there is a string there in the first place.
OK, here is my cytoband file. GRCm39_cytoband.txt
Ah, I see. It's because the first line (#chrom, chromStart, chromEnd) is interpreted as data rather than as a header. Delete the first line and the code should be fine. The cytoband file I downloaded from UCSC doesn't contain a header, which is why I assumed there would be no header by default. I can add some code to make sure the program ignores lines that start with #.
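The header-skipping fix described above can be sketched in plain Python. This is a hypothetical illustration, not Higashi's actual loader: it assumes a tab-separated file with the usual UCSC cytoband columns (chrom, chromStart, chromEnd, name, gieStain) and drops blank lines and lines starting with #, so int() is only ever applied to real coordinates.

```python
def read_cytoband(handle):
    """Parse a UCSC-style cytoband table, skipping "#" header/comment lines.

    Hypothetical sketch; assumes tab-separated columns
    chrom, chromStart, chromEnd, name, gieStain.
    Returns a list of tuples with integer start/end positions.
    """
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "#chrom ..." header
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        rows.append((chrom, start, end, *fields[3:]))
    return rows
```

With the header line present, a file like GRCm39_cytoband.txt would then parse without the str-in-"start" error, since the header never reaches int().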
OK~thanks
I just added support for a new parameter in the JSON file. If you set "cpu_num_torch": -1 but "cpu_num": 1, the code should still use multiple CPUs for PyTorch training, but only one CPU process for generating training batches. This is a temporary solution and is not as optimized as the original version. Since I cannot replicate the error on my end, I have to guess what triggers it, which could take a while.
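The split between the two settings can be illustrated with a small hypothetical helper. It is not Higashi's code: it assumes that -1 means "use all available CPUs" (as cpu_num: -1 is used earlier in this thread) and that a missing cpu_num_torch falls back to cpu_num.

```python
import os


def resolve_cpu_settings(config):
    """Return (torch_threads, batch_workers) from a Higashi-style config.

    Hypothetical helper: "cpu_num" sizes the batch-generation processes,
    "cpu_num_torch" sizes PyTorch training. Negative values are assumed
    to mean "all available CPUs".
    """
    available = os.cpu_count() or 1

    def resolve(value):
        # -1 (or any negative value) -> all CPUs; otherwise at least 1.
        return available if value is None or value < 0 else max(1, int(value))

    batch_workers = resolve(config.get("cpu_num", -1))
    # Assumed fallback: without "cpu_num_torch", reuse the "cpu_num" setting.
    torch_threads = resolve(config.get("cpu_num_torch", config.get("cpu_num", -1)))
    return torch_threads, batch_workers
```

The first value is the kind of number one would pass to torch.set_num_threads(), and the second the size of a batch-generation process pool; whether Higashi wires them up exactly this way is not confirmed here.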
I will close this issue for now, but if there are more updates, I will post them here.
Hi, ruochi! My Higashi run works smoothly until it reaches the training-for-imputation step, and there is no error or warning. Here is the package list of my Higashi environment:
Package             Version
------------------- ----------
asciitree           0.3.3
asttokens           2.0.5
attrs               21.4.0
backcall            0.2.0
bleach              5.0.0
bokeh               3.0.0.dev5
brotlipy            0.7.0
certifi             2021.10.8
cffi                1.15.0
charset-normalizer  2.0.4
click               8.1.2
cooler              0.8.11
cryptography        36.0.0
cycler              0.11.0
Cython              3.0.0a10
cytoolz             0.10.1
debugpy             1.5.1
decorator           5.1.1
defusedxml          0.7.1
dill                0.3.4
entrypoints         0.4
executing           0.8.3
fastjsonschema      2.15.3
fbpca               1.0
fonttools           4.33.2
h5py                3.6.0
higashi             0.1.0a0
idna                3.3
importlib-metadata  4.11.3
importlib-resources 5.7.1
ipykernel           6.9.1
ipython             8.2.0
ipython-genutils    0.2.0
ipywidgets          7.7.0
jedi                0.18.1
Jinja2              3.1.1
joblib              1.1.0
jsonschema          4.4.0
jupyter-client      7.2.2
jupyter-core        4.9.2
jupyterlab-widgets  1.1.0
kiwisolver          1.4.2
llvmlite            0.38.0
MarkupSafe          2.0.1
matplotlib          3.5.1
matplotlib-inline   0.1.2
mistune             0.8.4
mkl-fft             1.3.1
mkl-random          1.2.2
mkl-service         2.4.0
multiprocess        0.70.12.2
nbconvert           5.6.1
nbformat            5.3.0
nest-asyncio        1.5.5
notebook            5.7.11
numba               0.55.1
numpy               1.21.5
packaging           21.3
pandas              1.3.4
pandocfilters       1.5.0
parso               0.8.3
pexpect             4.8.0
pickleshare         0.7.5
Pillow              9.0.1
pip                 21.2.4
prometheus-client   0.14.1
prompt-toolkit      3.0.20
ptyprocess          0.7.0
pure-eval           0.2.2
pycparser           2.21
pyfaidx             0.6.4
Pygments            2.11.2
pynndescent         0.5.6
pyOpenSSL           22.0.0
pypairix            0.3.7
pyparsing           3.0.8
pyrsistent          0.18.0
PySocks             1.7.1
python-dateutil     2.8.2
pytz                2022.1
PyYAML              6.0
pyzmq               22.3.0
requests            2.27.1
scikit-learn        1.0.2
scipy               1.7.3
seaborn             0.11.2
Send2Trash          1.8.0
setuptools          61.2.0
simplejson          3.17.6
six                 1.16.0
stack-data          0.2.0
terminado           0.13.3
testpath            0.6.0
threadpoolctl       3.1.0
toolz               0.11.2
torch               1.11.0
torchaudio          0.11.0
torchvision         0.12.0
tornado             6.1
tqdm                4.64.0
traitlets           5.1.1
typing_extensions   4.1.1
umap-learn          0.5.3
urllib3             1.26.9
wcwidth             0.2.5
webencodings        0.5.1
wheel               0.37.1
widgetsnbextension  3.6.0
xyzservices         2022.4.0
zipp                3.8.0