Closed: janosh closed this issue 3 years ago.
Hey @janosh
Currently the full data is not available through matminer, though if @tschaume wants to make a matminer-loadable static .json.gz of it available, I'd be glad to add it to matminer.
There is an abbreviated version of it, boltztrap_mp, available in matminer (see https://hackingmaterials.lbl.gov/matminer/dataset_summary.html). The following columns are available:
@ardunn Thanks for the quick reply! Do you have any information on how the 8,924 entries were selected from the 44,333 listed in the full dataset at https://contribs.materialsproject.org/projects/carrier_transport?
I believe the ~9k entries were from a previous run.
@janosh @ardunn I do have a few different versions of potential .json.gz files we could use to link the full dataset up to matminer. I'll make them available at a persistent link in MPContribs and report back here by Monday (hopefully).
@janosh @ardunn There's a JSON file for download now at https://contribs.materialsproject.org/projects/carrier_transport.json.gz (12.5MB). It reflects the format of the contributions as they go into the MPContribs API and does not include the temperature- and doping-level dependent tables. Happy to iterate if it isn't a suitable format to link up to matminer. FYI @fraricci
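For anyone who wants to poke at that file directly, here is a minimal standard-library sketch for fetching and decoding a .json.gz. The URL is the one posted above and may no longer be live; the helper names download and read_json_gz are hypothetical, and since the thread does not document the file's top-level structure, the code makes no assumptions about it.

```python
import gzip
import json
import urllib.request
from pathlib import Path

# Download link posted in this thread; it may have moved since.
URL = "https://contribs.materialsproject.org/projects/carrier_transport.json.gz"


def download(url: str, dest: Path) -> Path:
    """Fetch url into dest, skipping the download if dest already exists."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def read_json_gz(path: Path):
    """Decompress a .json.gz file and parse its JSON content."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Usage would be something like data = read_json_gz(download(URL, Path("carrier_transport.json.gz"))), followed by a look at type(data) and a first record before building any table from it.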
Thanks a lot @tschaume! 👍
I'm guessing that for addition to matminer it should be in a format ready for data mining, so the target columns should probably have dtype float rather than object (i.e. strings).
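A minimal sketch of that string-to-float cast, using only the standard library on a list of dicts. The record layout and the "value followed by a unit" string format (e.g. "123.4 µV/K") are assumptions for illustration, not the actual MPContribs format; unparseable entries become None rather than raising.

```python
from typing import Any, Dict, Iterable, List, Optional


def to_float(value: Any) -> Optional[float]:
    """Convert a value to float, returning None when it can't be parsed.

    Numeric strings with a trailing unit (hypothetical format such as
    "123.4 µV/K") are handled by parsing only the first whitespace token.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    if not text:
        return None
    try:
        return float(text.split()[0])
    except ValueError:
        return None


def cast_targets(records: Iterable[Dict[str, Any]], targets: List[str]) -> List[Dict[str, Any]]:
    """Return copies of the records with the given target keys cast to float."""
    out = []
    for rec in records:
        new = dict(rec)  # copy so the input records are left untouched
        for key in targets:
            if key in new:
                new[key] = to_float(new[key])
        out.append(new)
    return out
```

With a pandas DataFrame, pd.to_numeric(df[col], errors="coerce") does the equivalent per column (turning unparseable entries into NaN), though it does not strip unit suffixes.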
Here's a version of the dataset as we would use it with models like CGCNN: https://github.com/janosh/matbench/commit/df3831319599b9aa3768dd5f97fdac5ab94bdc37.
What's the meaning of .v in these columns?
Sᵉ.p.v [µV/K]
Sᵉ.n.v [µV/K]
σᵉ.p.v [1/Ω/m/s]
σᵉ.n.v [1/Ω/m/s]
PFᵉ.p.v [µW/cm/K²/s]
PFᵉ.n.v [µW/cm/K²/s]
κₑᵉ.p.v [W/K/m/s]
κₑᵉ.n.v [W/K/m/s]
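If these headers need to be split programmatically, a regex sketch can break each one into quantity, carrier type (p or n), field letter and unit. The pattern is inferred only from the eight column names listed above and is an assumption about the naming scheme, not a documented format; parse_header and Column are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed header pattern, inferred from the column names above:
#   <quantity>.<carrier type p|n>.<field> [<unit>]
HEADER_RE = re.compile(
    r"^(?P<quantity>[^.\s]+)\.(?P<carrier>[pn])\.(?P<field>\w+)\s*\[(?P<unit>[^\]]+)\]$"
)


class Column(NamedTuple):
    quantity: str
    carrier: str
    field: str
    unit: str


def parse_header(name: str) -> Column:
    """Split a header such as 'Sᵉ.p.v [µV/K]' into its parts."""
    m = HEADER_RE.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognised column header: {name!r}")
    return Column(m["quantity"], m["carrier"], m["field"], m["unit"])
```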
Ah. From here:
Value (v), temperature (T), and doping level (c) at the maximum of the average eigenvalue of the Seebeck coefficient
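As a worked illustration of that definition: the average of a matrix's eigenvalues equals its trace divided by its dimension, so the "average eigenvalue" of a 3x3 Seebeck tensor is just (S_xx + S_yy + S_zz)/3, and v, T and c come from the grid point where it peaks. The data layout below (a dict keyed by (temperature, doping level)) is hypothetical, and so is the by_magnitude switch: for n-type data, where S is negative, ranking by |v| may be intended, but the thread does not say.

```python
from typing import Dict, Sequence, Tuple

Tensor = Sequence[Sequence[float]]  # square Seebeck tensor, e.g. in µV/K


def average_eigenvalue(tensor: Tensor) -> float:
    """Mean of the eigenvalues of a square matrix.

    The eigenvalues of a matrix sum to its trace, so no eigen-solver is needed.
    """
    n = len(tensor)
    return sum(tensor[i][i] for i in range(n)) / n


def value_at_max(
    tensors: Dict[Tuple[float, float], Tensor], by_magnitude: bool = False
) -> Tuple[float, float, float]:
    """Return (v, T, c) at the maximum of the average eigenvalue.

    tensors maps (temperature, doping level) to a Seebeck tensor (hypothetical layout).
    by_magnitude=True ranks by |v| instead of v (an assumption for n-type data).
    """

    def score(item):
        v = average_eigenvalue(item[1])
        return abs(v) if by_magnitude else v

    (T, c), tensor = max(tensors.items(), key=score)
    return average_eigenvalue(tensor), T, c
```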
Thanks @janosh and @tschaume. I will add these to the metadata at the same time that I add Ryan Kingsbury's updated expt_gaps and _formation_enthalpy datasets. The columns will be cast to the correct dtypes before uploading as well.
@janosh @tschaume I wound up using the carrier_transport_with_strucs.json.gz that @janosh referenced earlier. Unfortunately, the file currently hosted on MPContribs has a pesky data column which is not super easy to use, so the raw json.gz has been uploaded to figshare (https://figshare.com/articles/dataset/ricci_boltztrap_mp_tabular/14701110) in the meantime.
The *_strucs.json.gz needed some minor adjustments.
Notable additions to metadata beyond what was in MPContribs:
If there are any major problems with hosting this data temporarily on figshare, let me know and it will be removed immediately. Obviously the best scenario is for the matminer-compatible .json.gz to be hosted on MPContribs. If there is no major problem with keeping this file on figshare in the interim, it will remain there until MPContribs has a serviceable link to the matminer-compatible .json.gz. Let me know if/when that is done and I will update the matminer link.
All carrier concentrations at the optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 → 120.0).
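The thread does not say how that mis-parse happened, but one mechanism that reproduces exactly 1e20 → 120.0 is stripping every character except digits and dots, which silently drops the exponent marker. The sketch below contrasts that (hypothetical) naive parser with one that keeps scientific notation; the function names and the "1e20 cm⁻³" string format are illustrative assumptions.

```python
import re

# A float literal including scientific notation, e.g. "1e20", "-3.5E+19", ".5".
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def naive_parse(text: str) -> float:
    """Buggy on purpose: keeping only digits and dots turns '1e20' into '120'."""
    return float(re.sub(r"[^0-9.]", "", text))


def parse_number(text: str) -> float:
    """Parse the first number in text, keeping any exponent (e.g. '1e20 cm⁻³')."""
    m = NUMBER_RE.search(text)
    if m is None:
        raise ValueError(f"no number in {text!r}")
    return float(m.group())
```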
@ardunn Oops! I wasn't using those columns, but it's a very good thing you noticed. Thanks for making the data easily available through matminer! 😅
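For reference, a minimal sketch of loading the data through matminer. load_dataset and get_available_datasets are real functions in matminer.datasets; the default name ricci_boltztrap_mp_tabular is assumed to match the figshare upload above, and the wrapper names are hypothetical. matminer is imported lazily so the module loads without it, and the first call needs network access (matminer then caches the file locally).

```python
def load_matminer_dataset(name: str = "ricci_boltztrap_mp_tabular"):
    """Return a matminer dataset as a pandas DataFrame.

    Other names (e.g. "boltztrap_mp" for the abbreviated version) work the same way.
    """
    from matminer.datasets import load_dataset  # lazy import: matminer is optional here

    return load_dataset(name)


def list_matminer_datasets():
    """Return the names of all datasets matminer knows about.

    get_available_datasets may also print a summary of each dataset.
    """
    from matminer.datasets import get_available_datasets

    return get_available_datasets()
```

Checking whether a given dataset is exposed then comes down to something like "ricci_boltztrap_mp_tabular" in list_matminer_datasets().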
Is the MPContrib Electronic Transport dataset available via matminer?
This
prints
So I'm guessing not? If so, I'm curious to know why.
Also, I'd like to suggest adding a short code block to each MPContrib detail page showing how to download it. E.g.