janosh opened this issue 4 years ago
Turns out that if I only use an `ElementProperty` featurizer (which generates the only features that are retained anyway), the problem disappears:
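```python
import automatminer as amm
from matminer.featurizers.composition import ElementProperty

# Restrict featurization to the Magpie ElementProperty features only
featurizers = {
    "composition": [ElementProperty.from_preset("magpie")],
    "structure": [],
}

pipe_config = {
    **amm.get_preset_config(),
    # Override the preset's AutoFeaturizer and skip oxidation-state guessing
    "autofeaturizer": amm.AutoFeaturizer(
        featurizers=featurizers,
        guess_oxistates=False,
    ),
}
pipe = amm.MatPipe(**pipe_config)
```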
Hey @janosh, thanks for the bug report. I've been aware of this problem for some time and am currently running some tests to try to pinpoint it.
I actually think this is a bug in matminer related to job parallelization with `multiprocessing`. For example, if you try using just `StructureToOxidStructure` etc. from matminer, I'd wager you'd see the same issues.
What I think is happening behind the scenes is that when `n_jobs` is high (relative to the compute capacity of the machine you are running on), the expensive chunks get very few CPU cycles and/or are not allocated sufficient memory. I don't think there is an infinite loop happening (AFAIK), but the CPU cannot run such a highly parallelized process efficiently.
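One way to act on this hypothesis is to keep `n_jobs` at or below the number of cores the process can actually use. Below is a minimal sketch; `available_cores` and `clamp_n_jobs` are hypothetical helpers (not part of matminer or automatminer), and it assumes the common convention that `n_jobs=-1` means "use all cores".

```python
import os
from typing import Optional


def available_cores() -> int:
    """Number of CPU cores this process is allowed to run on."""
    # sched_getaffinity respects CPU affinity masks (e.g. set via taskset),
    # but it only exists on some platforms such as Linux.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def clamp_n_jobs(n_jobs: Optional[int]) -> int:
    """Hypothetical helper: cap a requested n_jobs at the available cores.

    Assumes None or -1 means "use all available cores".
    """
    cores = available_cores()
    if n_jobs is None or n_jobs == -1:
        return cores
    if n_jobs < 1:
        raise ValueError(f"n_jobs must be -1 or a positive integer, got {n_jobs}")
    return min(n_jobs, cores)
```

The clamped value could then be passed to a matminer featurizer's `set_n_jobs` method or to `AutoFeaturizer(n_jobs=...)`, so a large requested value never oversubscribes the machine.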
Does running the bare featurizers (without automatminer) still have this problem? My guess is yes.
If so, does setting `n_jobs` for an individual featurizer change the halting behavior at all? My guess is that if you set `n_jobs=1`, the job will run very slowly but eventually finish, and the higher you set `n_jobs`, the more likely it is to hang indefinitely.
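The experiment suggested above (same featurizer, different `n_jobs`) can be scripted with a small timing harness. This is a sketch with a hypothetical `sweep_n_jobs` helper; `run` is any callable you supply that performs the featurization with the given `n_jobs`.

```python
import time
from typing import Callable, Dict, Iterable


def sweep_n_jobs(
    run: Callable[[int], object],
    n_jobs_values: Iterable[int],
) -> Dict[int, float]:
    """Call run(n_jobs) once per value and record wall-clock seconds."""
    timings: Dict[int, float] = {}
    for n_jobs in n_jobs_values:
        start = time.perf_counter()
        run(n_jobs)
        elapsed = time.perf_counter() - start
        timings[n_jobs] = elapsed
        # Print as we go so earlier results survive if a later value hangs
        print(f"n_jobs={n_jobs}: {elapsed:.2f} s", flush=True)
    return timings
```

For example, `run` might create a bare matminer `CompositionToOxidComposition` featurizer, call its `set_n_jobs(n_jobs)` method and then `featurize_dataframe` on a copy of the input. Because a run that hangs never returns, values suspected of hanging are best placed last in `n_jobs_values` or tried in a separate process with an external timeout.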
I've been trying to work around what might be a bug in (auto)matminer. Making predictions on a large dataframe (around 80,000 rows) never finishes. I think the culprit might be oxidation-state guessing, as that seems to take a long time, and its run time also increases rapidly from one prediction to the next when I slice the dataframe into chunks and predict on each chunk individually.
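The chunked prediction described above can be instrumented so that the growth in run time per chunk becomes visible. Below is a sketch assuming pandas and a hypothetical `predict_in_chunks` helper. `predict` would be something like the `predict` method of an already-fitted `MatPipe`.

```python
import time
from typing import Callable, List, Tuple

import pandas as pd


def predict_in_chunks(
    df: pd.DataFrame,
    predict: Callable[[pd.DataFrame], pd.DataFrame],
    chunk_size: int = 5000,
) -> Tuple[pd.DataFrame, List[float]]:
    """Run predict() on consecutive row chunks of df and time each call.

    Returns the concatenated predictions and the wall-clock seconds per
    chunk, so a slowdown from chunk to chunk shows up as rising timings.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    results: List[pd.DataFrame] = []
    timings: List[float] = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start : start + chunk_size]  # positional row slice
        t0 = time.perf_counter()
        results.append(predict(chunk))
        timings.append(time.perf_counter() - t0)
    if not results:  # empty input: nothing to concatenate
        return df.iloc[0:0].copy(), timings
    return pd.concat(results), timings
```

If the timings keep rising even though every chunk has the same size, that points at state accumulating across calls rather than at the size of any single chunk.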
@ardunn I couldn't create a minimal example with dummy data that reproduces this issue, but maybe you can try running this script and see if you experience the same issue.