sparks-baird / CrabNet

Predict materials properties using only the composition information!
https://crabnet.readthedocs.io/
MIT License

Support scientific notation for chemical formulas #43

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago
Generating EDM: 100%|██████████| 10/10 [00:00<00:00, 15313.27formulae/s]
loading data with up to 3 elements in the formula
---------------------------------------------------------------------------
CompositionError                          Traceback (most recent call last)
<ipython-input-117-e71b71bef36c> in <module>()
     11 
     12 disc.fit(train_df)
---> 13 score = disc.predict(val_df, umap_random_state=42)
     14 
     15 # Interactive scatter plot colored by clusters

8 frames
/usr/local/lib/python3.7/dist-packages/mat_discover/mat_discover_.py in predict(self, val_df, plotting, umap_random_state, pred_weight, proxy_weight, dummy_run, count_repeats, return_peak)
    522             )
    523             self.val_pred, self.val_sigma, self.val_true = crabnet_model.predict(
--> 524                 self.val_df, return_uncertainty=True, return_true=True
    525             )
    526         else:

/usr/local/lib/python3.7/dist-packages/crabnet/crabnet_.py in predict(self, test_df, loader, return_uncertainty, return_true)
    484             else:
    485                 extra_features = None
--> 486             self.load_data(test_df, extra_features=extra_features)
    487             loader = self.data_loader
    488         elif test_df is not None and loader is not None:

/usr/local/lib/python3.7/dist-packages/crabnet/crabnet_.py in load_data(self, data, extra_features, batch_size, train)
    578             inference=inference,
    579             verbose=self.verbose,
--> 580             elem_prop=self.elem_prop,
    581         )
    582         if self.verbose:

/usr/local/lib/python3.7/dist-packages/crabnet/utils/utils.py in __init__(self, data, extra_features, batch_size, groupby, random_state, shuffle, pin_memory, n_elements, inference, verbose, elem_prop)
    580                 inference=inference,
    581                 verbose=verbose,
--> 582                 groupby=groupby,
    583             )
    584         )

/usr/local/lib/python3.7/dist-packages/crabnet/utils/utils.py in get_edm(data, n_elements, inference, verbose, groupby)
    485 
    486     df.loc[:, "count"] = [
--> 487         len(_element_composition(form)) for form in df.formula.values.tolist()
    488     ]
    489     # df = df[df["count"] != 1]  # drop pure elements

/usr/local/lib/python3.7/dist-packages/crabnet/utils/utils.py in <listcomp>(.0)
    485 
    486     df.loc[:, "count"] = [
--> 487         len(_element_composition(form)) for form in df.formula.values.tolist()
    488     ]
    489     # df = df[df["count"] != 1]  # drop pure elements

/usr/local/lib/python3.7/dist-packages/crabnet/utils/composition.py in _element_composition(formula)
     83 
     84 def _element_composition(formula):
---> 85     elmap = parse_formula(formula)
     86     elamt = {}
     87     natoms = 0

/usr/local/lib/python3.7/dist-packages/crabnet/utils/composition.py in parse_formula(formula)
     59         expanded_formula = formula.replace(m.group(), expanded_sym)
     60         return parse_formula(expanded_formula)
---> 61     sym_dict = get_sym_dict(formula, 1)
     62     return sym_dict
     63 

/usr/local/lib/python3.7/dist-packages/crabnet/utils/composition.py in get_sym_dict(f, factor)
     25         f = f.replace(m.group(), "", 1)
     26     if f.strip():
---> 27         raise CompositionError(f"{f} is an invalid formula!")
     28     return sym_dict
     29 

CompositionError: e-05 is an invalid formula!

kyledmiller commented 2 months ago

Hi Sterling, just wanted to check in on this issue. If you happen to have solved it already, let me know; otherwise, I'll write up a fix and submit a PR in a few hours or so.
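
For reference, here is a minimal sketch of what seems to go wrong and one possible direction for a fix. It assumes the amount pattern in `get_sym_dict` is a pymatgen-style regex that only accepts digits, `.`, `-` and `*`, which would explain the leftover `e-05` in the error above; the actual pattern in `crabnet/utils/composition.py` is not shown in the traceback. The example formula `Al1e-05O1`, the names `LEGACY_TOKEN`, `AMOUNT`, `TOKEN`, and the function `parse_flat_formula` are all hypothetical and not part of CrabNet. Parentheses are not handled here.

```python
import re
from collections import defaultdict

# Assumed legacy-style pattern (not confirmed from the CrabNet source):
# an element symbol followed by an amount made of digits, ".", "-" or "*".
LEGACY_TOKEN = re.compile(r"([A-Z][a-z]*)\s*([-*\.\d]*)")

# Removing every legacy match leaves the exponent behind.
leftover = LEGACY_TOKEN.sub("", "Al1e-05O1")
print(leftover)  # e-05

# Hypothetical replacement: a decimal amount with an optional exponent.
AMOUNT = r"(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"
TOKEN = re.compile(rf"\s*([A-Z][a-z]?)\s*({AMOUNT})?\s*")


def parse_flat_formula(formula):
    """Parse a parenthesis-free formula into {element symbol: amount}."""
    amounts = defaultdict(float)
    text = formula.strip()
    pos = 0
    while pos < len(text):
        # Match at the current position so no character is silently skipped.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"{text[pos:]!r} is an invalid formula fragment")
        symbol, amount = m.groups()
        # A missing amount means a count of one, e.g. "NaCl".
        amounts[symbol] += float(amount) if amount else 1.0
        pos = m.end()
    return dict(amounts)


print(parse_flat_formula("Al1e-05O1"))
print(parse_flat_formula("Fe2O3"))
```

Because the exponent part requires at least one digit after `e` or `E`, a formula such as `Na1Er2` should still split into `Na` and `Er` rather than reading `E` as an exponent marker. An alternative that avoids touching the parser would be to format amounts as plain decimals before building the formula string.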