Open nickdeveaux opened 6 years ago
Merging #63 into master will decrease coverage by 0.09%. The diff coverage is 0%.
@@ Coverage Diff @@
## master #63 +/- ##
=========================================
- Coverage 70.54% 70.44% -0.1%
=========================================
Files 18 18
Lines 1480 1482 +2
=========================================
Hits 1044 1044
- Misses 436 438 +2
Impacted Files | Coverage Δ | |
---|---|---|
inferelator_ng/bbsr_tfa_workflow.py | 0% <0%> (ø) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3876f18...b4ea275.
👍
I tried this on NYU HPC. I submitted the following script to the scheduler using sbatch:
#!/bin/sh
#SBATCH --nodes=3
#SBATCH --tasks-per-node=4
#SBATCH --mem=10GB
#SBATCH --time=2:00:00
#SBATCH --job-name=Infer_Test
#SBATCH --output=Infer_Test_KVS_10GB_3_nodes_4_tasks_pull63.out
module purge
module load r/intel/3.4.2 python/intel/2.7.12 bedtools/intel/2.26.0
source /home/kmt331/inferelator_ng/py2.7/bin/activate
cd /home/kmt331/inferelator_ng
export PYTHONPATH=$PYTHONPATH:$(pwd)/kvsstcp
time python ~/inferelator_ng/kvsstcp/kvsstcp.py --execcmd 'srun -n '${SLURM_NTASKS}' python bsubtilis_bbsr_workflow_runner.py'
When I ran this script against the original code on the master branch, everything ran fine and the results looked fine. But when I switched to the nickdeveaux-ndv_dont_share_mi_clr_but_still_lock_per_bootstrap branch (with the code in this pull request), I got the following error (not going to paste the entire output here, just the part that looks relevant):
Creating design and response matrix ...
Setting up TFA specific response matrix ...
Computing Transcription Factor Activity ...
Bootstrap 1 of 2
Calculating MI, Background MI, and CLR Matrix
Traceback (most recent call last):
File "bsubtilis_bbsr_workflow_runner.py", line 10, in <module>
workflow.run()
File "/home/kmt331/inferelator_ng/inferelator_ng/bbsr_tfa_workflow.py", line 49, in run
(self.clr_matrix, self.mi_matrix) = self.mi_clr_driver.run(X, Y)
File "/home/kmt331/inferelator_ng/inferelator_ng/mi_R.py", line 83, in run
Creating design and response matrix ...
Setting up TFA specific response matrix ...
Computing Transcription Factor Activity ...
Bootstrap 1 of 2
Calculating MI, Background MI, and CLR Matrix
Creating design and response matrix ...
Setting up TFA specific response matrix ...
Computing Transcription Factor Activity ...
Bootstrap 1 of 2
Calculating MI, Background MI, and CLR Matrix
Traceback (most recent call last):
File "bsubtilis_bbsr_workflow_runner.py", line 10, in <module>
Traceback (most recent call last):
File "bsubtilis_bbsr_workflow_runner.py", line 10, in <module>
workflow.run()
workflow.run()
File "/home/kmt331/inferelator_ng/inferelator_ng/bbsr_tfa_workflow.py", line 49, in run
File "/home/kmt331/inferelator_ng/inferelator_ng/bbsr_tfa_workflow.py", line 49, in run
(self.clr_matrix, self.mi_matrix) = self.mi_clr_driver.run(X, Y)
(self.clr_matrix, self.mi_matrix) = self.mi_clr_driver.run(X, Y)
File "/home/kmt331/inferelator_ng/inferelator_ng/mi_R.py", line 83, in run
File "/home/kmt331/inferelator_ng/inferelator_ng/mi_R.py", line 83, in run
matrix_data_frame = pd.read_csv(matrix_path, sep='\t')
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 645, in parser_f
matrix_data_frame = pd.read_csv(matrix_path, sep='\t')
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 645, in parser_f
matrix_data_frame = pd.read_csv(matrix_path, sep='\t')
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 645, in parser_f
return _read(filepath_or_buffer, kwds)
return _read(filepath_or_buffer, kwds)
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 400, in _read
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 400, in _read
return _read(filepath_or_buffer, kwds)
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 400, in _read
data = parser.read()
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 938, in read
data = parser.read()
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 938, in read
data = parser.read()
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 938, in read
ret = self._engine.read(nrows)
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1507, in read
ret = self._engine.read(nrows)
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1507, in read
ret = self._engine.read(nrows)
File "/share/apps/python/2.7.12/intel/lib/python2.7/site-packages/pandas-0.19.1-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1507, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 846, in pandas.parser.TextReader.read (pandas/parser.c:9935)
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 846, in pandas.parser.TextReader.read (pandas/parser.c:9935)
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 846, in pandas.parser.TextReader.read (pandas/parser.c:9935)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10193)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10193)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10193)
File "pandas/parser.pyx", line 922, in pandas.parser.TextReader._read_rows (pandas/parser.c:10921)
File "pandas/parser.pyx", line 922, in pandas.parser.TextReader._read_rows (pandas/parser.c:10921)
File "pandas/parser.pyx", line 922, in pandas.parser.TextReader._read_rows (pandas/parser.c:10921)
File "pandas/parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:10792)
File "pandas/parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:10792)
File "pandas/parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:10792)
File "pandas/parser.pyx", line 2018, in pandas.parser.raise_parser_error (pandas/parser.c:25929)
File "pandas/parser.pyx", line 2018, in pandas.parser.raise_parser_error (pandas/parser.c:25929)
2018-05-29 18:44:01,710 INFO kvs : Closing connection from ('172.16.2.127', 55612)
2018-05-29 18:44:01,710 INFO kvs : Closing connection from ('172.16.2.127', 55614)
2018-05-29 18:44:01,710 INFO kvs : Closing connection from ('172.16.2.127', 55610)
File "pandas/parser.pyx", line 2018, in pandas.parser.raise_parser_error (pandas/parser.c:25929)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 240 fields in line 1529, saw 281
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 240 fields in line 1529, saw 281
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 240 fields in line 1529, saw 281
Creating design and response matrix ...
Setting up TFA specific response matrix ...
Computing Transcription Factor Activity ...
Bootstrap 1 of 2
Calculating MI, Background MI, and CLR Matrix
srun: error: c41-06: tasks 5-7: Exited with exit code 1
srun: Terminating job step 6497421.0
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** STEP 6497421.0 ON c41-04 CANCELLED AT 2018-05-29T18:44:01 ***
2018-05-29 18:44:01,881 INFO kvs : Closing connection from ('172.16.2.129', 55022)
2018-05-29 18:44:01,890 INFO kvs : Closing connection from ('172.16.2.127', 55616)
... etc... ...
srun: error: c41-04: tasks 0-3: Killed
srun: error: c41-12: tasks 8-11: Killed
Traceback (most recent call last):
2018-05-29 18:44:02,022 INFO kvs : Server shutting down
File "/home/kmt331/inferelator_ng/kvsstcp/kvsstcp.py", line 605, in <module>
subprocess.check_call(args.execcmd, shell=True, env=t.env())
File "/share/apps/python/2.7.12/intel/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'srun -n 12 python bsubtilis_bbsr_workflow_runner.py' returned non-zero exit status 1
real 0m15.581s
user 0m0.054s
sys 0m0.049s
@nickdeveaux any ideas why I'm getting that error?
Has anybody else tried this? Does it work for anyone else? I am still getting the same error on NYU HPC. This time I was working on the InfereCLaDR branch and manually put the same changes that you made into bbsr_tfa_runner.py, and I still got the same error.
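For anyone trying to narrow down the "Expected 240 fields in line 1529, saw 281" error, here is a minimal sketch for finding the rows of a tab-separated file whose field count differs from the first row. The function name `find_ragged_rows` and the demo file are hypothetical; this is a diagnostic aid under those assumptions, not part of inferelator_ng, and it does not by itself explain why the file is malformed.

```python
import csv
import os
import tempfile


def find_ragged_rows(path, delimiter="\t"):
    """Return (expected, ragged), where `expected` is the field count of the
    first row and `ragged` lists (line_number, field_count) for every later
    row whose field count differs from it."""
    ragged = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        expected = len(next(reader))
        for line_number, row in enumerate(reader, start=2):
            if len(row) != expected:
                ragged.append((line_number, len(row)))
    return expected, ragged


# Hypothetical demo file: a 3-field first row, one good row, one 5-field row.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("a\tb\tc\n1\t2\t3\n4\t5\t6\t7\t8\n")
    demo_path = tmp.name

expected, ragged = find_ragged_rows(demo_path)
os.remove(demo_path)
print(expected, ragged)  # → 3 [(3, 5)]
```

If the matrix file was written with row names, the first row may legitimately be one field shorter than the data rows, in which case every data row would be flagged and the comparison should be made against the most common field count instead.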
@kostyat @dayanne-castro
Calculating MI and CLR and sending the results to the workers meant transferring a large amount of data to each worker per bootstrap. For example, for a 60k gene by 150 sample input file, the MI and CLR matrices totaled 0.6 GB and grew to 1.6 GB once pickled. This was sent to 70 workers across 20 bootstraps on the cluster, leading to a massive (>10x) slowdown.
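To put rough numbers on that, here is a back-of-the-envelope calculation. It assumes that every one of the 70 workers receives a full 1.6 GB pickled copy on each of the 20 bootstraps, which is one reading of the figures above rather than a measured total.

```python
# Figures quoted in the description above.
pickled_gb = 1.6   # size of the MI + CLR matrices once pickled
workers = 70
bootstraps = 20

# Assumption: each worker gets its own full copy on every bootstrap.
per_bootstrap_gb = pickled_gb * workers
total_gb = per_bootstrap_gb * bootstraps

print(f"{per_bootstrap_gb:.0f} GB per bootstrap, {total_gb:.0f} GB in total")
# → 112 GB per bootstrap, 2240 GB in total
```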
Now each worker calculates MI and CLR independently and must wait for a new special key (bootstrap %idx) before moving forward.
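The pattern can be sketched roughly as follows. `TinyKVS` is a hypothetical in-memory stand-in built on `threading.Condition`, not the kvsstcp API, and `compute_mi_clr` is a placeholder for the real calculation. The sketch only illustrates the idea of computing locally and then blocking on a per-bootstrap key; it assumes a coordinator publishes one key per bootstrap.

```python
import threading


class TinyKVS:
    """Hypothetical in-memory key-value store with a blocking read."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def put(self, key, value):
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def view(self, key):
        """Block until `key` exists, then return its value without removing it."""
        with self._cond:
            self._cond.wait_for(lambda: key in self._data)
            return self._data[key]


def compute_mi_clr(worker_id, idx):
    # Placeholder for the per-worker MI/CLR computation (no data is shipped).
    return "mi_clr(worker=%d, bootstrap=%d)" % (worker_id, idx)


def worker(kvs, worker_id, n_bootstraps, log, log_lock):
    for idx in range(n_bootstraps):
        result = compute_mi_clr(worker_id, idx)  # computed locally
        kvs.view("bootstrap %d" % idx)           # wait for the special key
        with log_lock:
            log.append((idx, worker_id, result))


kvs = TinyKVS()
log, log_lock = [], threading.Lock()
threads = [
    threading.Thread(target=worker, args=(kvs, w, 2, log, log_lock))
    for w in range(3)
]
for t in threads:
    t.start()
for idx in range(2):
    kvs.put("bootstrap %d" % idx, True)  # coordinator releases each bootstrap
for t in threads:
    t.join()
print(len(log))  # → 6
```

Only a small key crosses the store per bootstrap here, instead of the pickled matrices, which is the trade-off the change is aiming for: repeated computation on each worker in exchange for far less data transfer.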