Closed shivang-22 closed 8 months ago
Hi @shivang-22, could you give us any more info? When you say it "gets stuck", what was the last output? `featurize` directly calls matminer's `featurize_many` under the hood by default, which has been known to be a bit iffy with parallelism (though I'm not sure why it would work the first time on the same data). You could try explicitly setting the number of "jobs" in the featurizer with e.g. `data.featurize(n_jobs=1)`.
Certainly! So this is the error log I get when I interrupt the kernel because it got 'stuck'. I'm not pasting the message in its entirety because that would be too long, but this might help maybe.
The top of the error log is:
Cell In[15], line 18, in GNN(df, target, extra_feat, batch, lr)
10 mod_df.reset_index(inplace=True, drop=True)
12 data = MODData(
13 materials=mod_df["Name"],
14 targets=mod_df[target],
15 target_names=[target]
16 )
---> 18 data.featurize()
20 for f in range(len(extra_feat)):
21 data.df_featurized[extra_feat[f]] = mod_df[extra_feat[f]].values
File /scratch/micromamba/envs/alembic/lib/python3.10/site-packages/modnet/preprocessing.py:783, in MODData.featurize(self, fast, db_file, n_jobs, drop_allnan)
779 df_final = df_done
781 # otherwise, no structures were loaded, so we need to compute all
782 else:
--> 783 df_final = self.featurizer.featurize(self.df_structure)
785 # replace infinite values by nan that are handled during the fit
786 df_final = clean_df(df_final, drop_allnan=drop_allnan)
File /scratch/micromamba/envs/alembic/lib/python3.10/site-packages/modnet/featurizers/featurizers.py:91, in MODFeaturizer.featurize(self, df)
89 df_composition = pd.DataFrame([])
90 if self.composition_featurizers or self.oxid_composition_featurizers:
---> 91 df_composition = self.featurize_composition(df)
93 df_structure = pd.DataFrame([])
94 if self.structure_featurizers:
This points to the fact that it's still computing the features. The bottom of the error log was more interesting to me, and reads as follows:
File /scratch/micromamba/envs/alembic/lib/python3.10/site-packages/matminer/featurizers/base.py:476, in BaseFeaturizer.featurize_many(self, entries, ignore_errors, return_errors, pbar)
470 with Pool(self.n_jobs, maxtasksperchild=1) as p:
471 func = partial(
472 self.featurize_wrapper,
473 return_errors=return_errors,
474 ignore_errors=ignore_errors,
475 )
--> 476 res = p.map(func, entries, chunksize=self.chunksize)
477 return res
File /scratch/micromamba/envs/alembic/lib/python3.10/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
362 def map(self, func, iterable, chunksize=None):
363 '''
364 Apply `func` to each element in `iterable`, collecting the results
365 in a list that is returned.
366 '''
--> 367 return self._map_async(func, iterable, mapstar, chunksize).get()
File /scratch/micromamba/envs/alembic/lib/python3.10/multiprocessing/pool.py:768, in ApplyResult.get(self, timeout)
767 def get(self, timeout=None):
--> 768 self.wait(timeout)
769 if not self.ready():
770 raise TimeoutError
File /scratch/micromamba/envs/alembic/lib/python3.10/multiprocessing/pool.py:765, in ApplyResult.wait(self, timeout)
764 def wait(self, timeout=None):
--> 765 self._event.wait(timeout)
File /scratch/micromamba/envs/alembic/lib/python3.10/threading.py:607, in Event.wait(self, timeout)
605 signaled = self._flag
606 if not signaled:
--> 607 signaled = self._cond.wait(timeout)
608 return signaled
File /scratch/micromamba/envs/alembic/lib/python3.10/threading.py:320, in Condition.wait(self, timeout)
318 try: # restore state no matter what (e.g., KeyboardInterrupt)
319 if timeout is None:
--> 320 waiter.acquire()
321 gotit = True
322 else:
KeyboardInterrupt:
It seems to me that the code is waiting indefinitely?
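The traceback ends in `ApplyResult.get(timeout=None)`, which is why the call never returns on its own. A minimal sketch of how the same pool interface can be made to give up after a deadline follows. The names `slow_featurize` and `map_with_timeout` are hypothetical, and `ThreadPool` (which shares the `map_async`/`get(timeout)` interface of `Pool`) is used only so the sketch runs anywhere; it is not what matminer does internally.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def slow_featurize(entry, delay=0.5):
    """Hypothetical stand-in for a featurizer call that takes a while."""
    time.sleep(delay)
    return entry.upper()


def map_with_timeout(func, entries, timeout, workers=2):
    """Map func over entries, but stop waiting after `timeout` seconds.

    Returns the list of results, or None if the workers did not finish in time.
    """
    with ThreadPool(workers) as pool:
        # pool.map(...) is map_async(...).get() with no timeout, so it blocks
        # until every task finishes. Passing a timeout to get() bounds the wait.
        pending = pool.map_async(func, entries)
        try:
            return pending.get(timeout=timeout)
        except TimeoutError:
            return None
```

A `None` return here only says that the deadline passed; it does not identify which entry was slow.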
So it gets stuck in the parallel internals of matminer (maybe; it depends on your luck when you actually interrupt). I would rerun with `n_jobs=1` as suggested above and see if you get the same problem. Otherwise you can also try changing the featurizer mode between `multi` and `single`, which will change the parallelism to be over structures rather than features.
e.g. add to the snippet above:
data.featurizer.featurizer_mode = "single"
This will either "just work" or it will give us better debug info on which featurizer is causing it to hang.
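For convenience, the two workarounds suggested so far could be wrapped in one small helper. This is only a sketch: `featurize_with_workaround` is a hypothetical name, `data` is assumed to be an already-constructed `MODData` object, and the helper relies solely on the `featurizer_mode` attribute and the `n_jobs` argument of `featurize` that appear in this thread.

```python
def featurize_with_workaround(data, n_jobs=None, mode=None):
    """Featurize a MODData-like object using the workarounds from this thread.

    `data` is assumed to expose `featurizer.featurizer_mode` and
    `featurize(n_jobs=...)`; nothing else about it is used.
    """
    if mode is not None:
        # e.g. "single", to parallelise over structures rather than features
        data.featurizer.featurizer_mode = mode

    if n_jobs is not None:
        # e.g. n_jobs=1, to request a single job from the featurizer
        data.featurize(n_jobs=n_jobs)
    else:
        # Leave the library's default job count untouched.
        data.featurize()
    return data
```

Trying `featurize_with_workaround(data, n_jobs=1)` first and `featurize_with_workaround(data, mode="single")` second keeps the two experiments separate, so it stays clear which change made the hang go away.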
Okay, so both `n_jobs=1` and `data.featurizer.featurizer_mode = "single"` work, but the speed is significantly slower than the default. The latter still (understandably) does better, but is there a way to make this method faster?
The speed is just a limitation of matminer unfortunately. Glad it is working now though. You can see https://github.com/hackingmaterials/matminer/issues/902 for the full description of the problem of parallelism in matminer.
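Since the two `GNN(...)` calls reported in this issue differ only in learning rate, one way to avoid paying the featurization cost twice might be to featurize once and reuse the result. The sketch below assumes that featurization depends only on the list of compositions; `make_cached_featurizer` and `featurize_fn` are hypothetical names and not part of MODNet or matminer.

```python
def make_cached_featurizer(featurize_fn):
    """Wrap featurize_fn so identical inputs are featurized once per session.

    `featurize_fn` is assumed to take a tuple of composition strings and
    return whatever featurized object the caller wants to reuse.
    """
    cache = {}

    def cached(compositions):
        # Tuples are hashable, so the exact sequence of compositions is the key.
        key = tuple(compositions)
        if key not in cache:
            cache[key] = featurize_fn(key)
        return cache[key]

    return cached
```

With such a wrapper, only the first call for a given dataset runs the slow step; later calls that change just the batch size or learning rate get the stored result back. The cache lives in memory, so it is lost when the kernel restarts.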
I am using the following function to use MODNet on a custom dataset with compositions only:
It runs fine the first time I use it, but if I change the inputs to the function and run it again in another cell, it gets stuck forever on the featurize step. So, `GNN(data_df, 'y', ['x1', 'x2'], 32, 0.02)` works fine, but then in the very next cell, `GNN(data_df, 'y', ['x1', 'x2'], 32, 0.04)` gets stuck. Am I missing something?