openforcefield / openff-evaluator

A physical property evaluation toolkit from the Open Forcefield Consortium.
https://docs.openforcefield.org/projects/evaluator
MIT License

Identifiers not being parsed through evaluator #553

Closed barmoral closed 1 month ago

barmoral commented 6 months ago

Describe the bug

I'm trying to use evaluator to filter papers that report osmotic coefficient values from ThermoML. I've successfully created the property type, filtered for DOIs with osmotic coefficients, converted them to a pandas DataFrame, and written the DataFrame to a CSV file. However, evaluator does not recognize all of the substances involved in the papers. It only recognizes one component, even though the ThermoML .xml data does report other identifiers (StandardInChI, CommonName).

To Reproduce

Register Custom ThermoML Property:

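from openff.units import unit
from openff.evaluator import properties
from openff.evaluator.datasets import PhysicalProperty, PropertyPhase
from openff.evaluator.datasets.thermoml import thermoml_property

@thermoml_property("Osmotic coefficient", supported_phases=PropertyPhase.Liquid | PropertyPhase.Gas)
class OsmoticCoefficient(PhysicalProperty):
    """A class representation of an osmotic coefficient property"""
    @classmethod
    def default_unit(cls):
        return unit.dimensionless

# Expose the new property class on the evaluator properties module.
setattr(properties, OsmoticCoefficient.__name__, OsmoticCoefficient)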

Load ThermoML Data Set: ds = ThermoMLDataSet.from_doi('10.1016/j.fluid.2006.09.025')

Write to csv:

ds_osm=ds.to_pandas()
ds_osm.to_csv("filt_ds_osmcoeff.csv")

Check involved compounds: ds.substances

If the problem involves a specific molecule or file, please upload that as well. --> filt_ds_osmcoeff.csv

Output

The command "ds.substances" outputs "{<Substance O{solv}{x=1.000000}>}". Here is a link to the ThermoML report for this example paper, which shows that there are more components: https://trc.nist.gov/ThermoML/10.1016/j.fluid.2006.09.025.html

Computing environment (please complete the following information):

``` _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge anyio 4.2.0 pyhd8ed1ab_0 conda-forge argon2-cffi 23.1.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 py310h2372a71_4 conda-forge arrow 1.3.0 pyhd8ed1ab_0 conda-forge asttokens 2.4.1 pyhd8ed1ab_0 conda-forge astunparse 1.6.3 pyhd8ed1ab_0 conda-forge async-lru 2.0.4 pyhd8ed1ab_0 conda-forge attrs 23.2.0 pyh71513ae_0 conda-forge aws-c-auth 0.7.11 h0100c56_0 conda-forge aws-c-cal 0.6.9 h5d48c4d_2 conda-forge aws-c-common 0.9.10 hd590300_0 conda-forge aws-c-compression 0.2.17 h7f92143_7 conda-forge aws-c-event-stream 0.4.1 h0bcb0bb_1 conda-forge aws-c-http 0.8.0 hd268abd_1 conda-forge aws-c-io 0.13.36 hb3b01f7_3 conda-forge aws-c-mqtt 0.10.0 hf5d392a_2 conda-forge aws-c-s3 0.4.7 hf8c57b3_3 conda-forge aws-c-sdkutils 0.1.13 h7f92143_0 conda-forge aws-checksums 0.1.17 h7f92143_6 conda-forge aws-crt-cpp 0.26.0 h600aa22_5 conda-forge aws-sdk-cpp 1.11.210 h405b101_9 conda-forge babel 2.14.0 pyhd8ed1ab_0 conda-forge beautifulsoup4 4.12.3 pyha770c72_0 conda-forge bleach 6.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.5 h0f2a231_0 conda-forge bokeh 3.3.3 pyhd8ed1ab_0 conda-forge boltons 23.1.1 pyhd8ed1ab_0 conda-forge brotli 1.1.0 hd590300_1 conda-forge brotli-bin 1.1.0 hd590300_1 conda-forge brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge bson 0.5.9 py_0 conda-forge bzip2 1.0.8 hd590300_5 conda-forge c-ares 1.25.0 hd590300_0 conda-forge ca-certificates 2024.2.2 hbcca054_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.3.2 pyhd8ed1ab_0 conda-forge cairo 1.18.0 h3faef2a_0 conda-forge certifi 2024.2.2 pyhd8ed1ab_0 conda-forge cffi 1.16.0 py310h2fee648_0 conda-forge cftime 1.6.3 py310h1f7b6fc_0 conda-forge charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge click 8.1.7 unix_pyh707e725_0 conda-forge cloudpickle 3.0.0 pyhd8ed1ab_0 conda-forge colorama 0.4.6 pyhd8ed1ab_0 conda-forge comm 0.2.1 pyhd8ed1ab_0 conda-forge contourpy 1.2.0 
py310hd41b1e2_0 conda-forge cudatoolkit 11.8.0 h4ba93d1_12 conda-forge curl 8.5.0 hca28451_0 conda-forge cycler 0.12.1 pyhd8ed1ab_0 conda-forge cytoolz 0.12.2 py310h2372a71_1 conda-forge dask 2023.12.1 pyhd8ed1ab_0 conda-forge dask-core 2023.12.1 pyhd8ed1ab_0 conda-forge dask-jobqueue 0.8.2 pyhd8ed1ab_0 conda-forge debugpy 1.8.0 py310hc6cd4ac_1 conda-forge decorator 5.1.1 pyhd8ed1ab_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge distributed 2023.12.1 pyhd8ed1ab_0 conda-forge ele 0.2.0 pyhd8ed1ab_0 conda-forge entrypoints 0.4 pyhd8ed1ab_0 conda-forge exceptiongroup 1.2.0 pyhd8ed1ab_0 conda-forge executing 2.0.1 pyhd8ed1ab_0 conda-forge expat 2.5.0 hcb278e6_1 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 h77eed37_1 conda-forge fontconfig 2.14.2 h14ed4e7_0 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.47.0 py310h2372a71_0 conda-forge forcefield-utilities 0.2.2 pyhd8ed1ab_0 conda-forge foyer 0.12.0 pyhd8ed1ab_0 conda-forge fqdn 1.5.1 pyhd8ed1ab_0 conda-forge freetype 2.12.1 h267a509_2 conda-forge fsspec 2023.12.2 pyhca7485f_0 conda-forge future 0.18.3 pyhd8ed1ab_0 conda-forge gettext 0.21.1 h27087fc_0 conda-forge gf2x 1.3.0 ha476b99_2 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge glog 0.6.0 h6f12383_0 conda-forge gmp 6.3.0 h59595ed_0 conda-forge gmpy2 2.1.2 py310h3ec546c_1 conda-forge gmso 0.11.2 pyhd8ed1ab_0 conda-forge greenlet 3.0.3 py310hc6cd4ac_0 conda-forge hdf4 4.2.15 h9772cbc_5 conda-forge hdf5 1.12.1 nompi_h4df4325_104 conda-forge icu 73.2 h59595ed_0 conda-forge idna 3.6 pyhd8ed1ab_0 conda-forge importlib-metadata 7.0.1 pyha770c72_0 conda-forge importlib_metadata 7.0.1 hd8ed1ab_0 conda-forge importlib_resources 6.1.1 pyhd8ed1ab_0 conda-forge iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge ipykernel 6.28.0 pyhd33586a_0 conda-forge ipython 8.20.0 pyh707e725_0 
conda-forge ipywidgets 8.1.1 pyhd8ed1ab_0 conda-forge isoduration 20.11.0 pyhd8ed1ab_0 conda-forge jedi 0.19.1 pyhd8ed1ab_0 conda-forge jinja2 3.1.2 pyhd8ed1ab_1 conda-forge jpeg 9e h0b41bf4_3 conda-forge json5 0.9.14 pyhd8ed1ab_0 conda-forge jsonpointer 2.4 py310hff52083_3 conda-forge jsonschema 4.21.1 pyhd8ed1ab_0 conda-forge jsonschema-specifications 2023.12.1 pyhd8ed1ab_0 conda-forge jsonschema-with-format-nongpl 4.21.1 pyhd8ed1ab_0 conda-forge jupyter-lsp 2.2.2 pyhd8ed1ab_0 conda-forge jupyter_client 7.4.9 pyhd8ed1ab_0 conda-forge jupyter_core 5.7.1 py310hff52083_0 conda-forge jupyter_events 0.9.0 pyhd8ed1ab_0 conda-forge jupyter_server 2.12.5 pyhd8ed1ab_0 conda-forge jupyter_server_terminals 0.5.2 pyhd8ed1ab_0 conda-forge jupyterlab 4.0.11 pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.3.0 pyhd8ed1ab_0 conda-forge jupyterlab_server 2.25.2 pyhd8ed1ab_0 conda-forge jupyterlab_widgets 3.0.9 pyhd8ed1ab_0 conda-forge keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.5 py310hd41b1e2_1 conda-forge krb5 1.21.2 h659d440_0 conda-forge lark-parser 0.12.0 pyhd8ed1ab_0 conda-forge lcms2 2.12 hddcbb42_0 conda-forge ld_impl_linux-64 2.40 h41732ed_0 conda-forge lerc 3.0 h9c3ff4c_0 conda-forge libabseil 20230802.1 cxx17_h59595ed_0 conda-forge libarrow 14.0.2 h84dd17c_2_cpu conda-forge libarrow-acero 14.0.2 h59595ed_2_cpu conda-forge libarrow-dataset 14.0.2 h59595ed_2_cpu conda-forge libarrow-flight 14.0.2 h120cb0d_2_cpu conda-forge libarrow-flight-sql 14.0.2 h61ff412_2_cpu conda-forge libarrow-gandiva 14.0.2 hacb8726_2_cpu conda-forge libarrow-substrait 14.0.2 h61ff412_2_cpu conda-forge libblas 3.9.0 20_linux64_openblas conda-forge libboost 1.82.0 h6fcfa73_6 conda-forge libboost-python 1.82.0 py310hcb52e73_6 conda-forge libbrotlicommon 1.1.0 hd590300_1 conda-forge libbrotlidec 1.1.0 hd590300_1 conda-forge libbrotlienc 1.1.0 hd590300_1 conda-forge libcblas 3.9.0 20_linux64_openblas conda-forge libcrc32c 1.1.2 h9c3ff4c_0 conda-forge libcurl 8.5.0 hca28451_0 conda-forge 
libdeflate 1.10 h7f98852_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 hd590300_2 conda-forge libevent 2.1.12 hf998b51_1 conda-forge libexpat 2.5.0 hcb278e6_1 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libflint 2.9.0 h2f819a4_ntl_100 conda-forge libgcc-ng 13.2.0 h807b86a_3 conda-forge libgfortran-ng 13.2.0 h69a702a_3 conda-forge libgfortran5 13.2.0 ha4646dd_3 conda-forge libglib 2.78.3 h783c2da_0 conda-forge libgomp 13.2.0 h807b86a_3 conda-forge libgoogle-cloud 2.12.0 h5206363_4 conda-forge libgrpc 1.59.3 hd6c4280_0 conda-forge libiconv 1.17 hd590300_2 conda-forge liblapack 3.9.0 20_linux64_openblas conda-forge libllvm14 14.0.6 hcd5def8_4 conda-forge libllvm15 15.0.7 hb3ce162_4 conda-forge libnetcdf 4.8.1 nompi_h329d8a1_102 conda-forge libnghttp2 1.58.0 h47da74e_1 conda-forge libnl 3.9.0 hd590300_0 conda-forge libnsl 2.0.1 hd590300_0 conda-forge libnuma 2.0.16 h0b41bf4_1 conda-forge libopenblas 0.3.25 pthreads_h413a1c8_0 conda-forge libparquet 14.0.2 h352af49_2_cpu conda-forge libpng 1.6.39 h753d276_0 conda-forge libprotobuf 4.24.4 hf27288f_0 conda-forge libre2-11 2023.06.02 h7a70373_0 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libsqlite 3.44.2 h2797004_0 conda-forge libssh2 1.11.0 h0841786_0 conda-forge libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge libthrift 0.19.0 hb90f79a_1 conda-forge libtiff 4.3.0 h0fcbabc_4 conda-forge libutf8proc 2.8.0 h166bdaf_0 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libwebp-base 1.3.2 hd590300_0 conda-forge libxcb 1.15 h0b41bf4_0 conda-forge libxcrypt 4.4.36 hd590300_1 conda-forge libxml2 2.12.3 h232c23b_0 conda-forge libxslt 1.1.39 h76b75d6_0 conda-forge libzip 1.10.1 h2629f0a_3 conda-forge libzlib 1.2.13 hd590300_5 conda-forge llvmlite 0.41.1 py310h1b8f574_0 conda-forge locket 1.0.0 pyhd8ed1ab_0 conda-forge lxml 5.1.0 py310hcfd0673_0 conda-forge lz4 4.3.3 py310h350c4a5_0 conda-forge lz4-c 1.9.4 hcb278e6_0 conda-forge lzo 2.10 h516909a_1000 conda-forge markdown-it-py 3.0.0 pyhd8ed1ab_0 
conda-forge markupsafe 2.1.3 py310h2372a71_1 conda-forge matplotlib-base 3.8.2 py310h62c0568_0 conda-forge matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge mdtraj 1.9.9 py310h523e8d7_1 conda-forge mdurl 0.1.2 pyhd8ed1ab_0 conda-forge mistune 3.0.2 pyhd8ed1ab_0 conda-forge mpc 1.3.1 hfe3b2da_0 conda-forge mpfr 4.2.1 h9458935_0 conda-forge mpiplus v0.0.2 pyhd8ed1ab_0 conda-forge mpmath 1.3.0 pyhd8ed1ab_0 conda-forge msgpack-python 1.0.7 py310hd41b1e2_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge nbclient 0.8.0 pyhd8ed1ab_0 conda-forge nbconvert-core 7.14.2 pyhd8ed1ab_0 conda-forge nbformat 5.9.2 pyhd8ed1ab_0 conda-forge ncurses 6.4 h59595ed_2 conda-forge nest-asyncio 1.5.8 pyhd8ed1ab_0 conda-forge netcdf4 1.5.8 nompi_py310hd7ca5b8_101 conda-forge networkx 3.2.1 pyhd8ed1ab_0 conda-forge nomkl 1.0 h5ca1d4c_0 conda-forge nose 1.3.7 py_1006 conda-forge notebook 7.0.7 pyhd8ed1ab_0 conda-forge notebook-shim 0.2.3 pyhd8ed1ab_0 conda-forge ntl 11.4.3 hef3c4d3_1 conda-forge numba 0.58.1 py310h7dc5dd1_0 conda-forge numexpr 2.8.8 py310hc2d3c2e_100 conda-forge numpy 1.26.3 py310hb13e2d6_0 conda-forge ocl-icd 2.3.1 h7f98852_0 conda-forge ocl-icd-system 1.0.0 1 conda-forge olefile 0.47 pyhd8ed1ab_0 conda-forge openeye-toolkits 2023.2.3 py310_0 openeye openff-amber-ff-ports 0.0.4 pyhca7485f_0 conda-forge openff-evaluator 0.4.7 pyhd8ed1ab_0 conda-forge openff-evaluator-base 0.4.7 pyhd8ed1ab_0 conda-forge openff-forcefields 2023.11.0 pyhca7485f_0 conda-forge openff-interchange-base 0.3.18 pyhd8ed1ab_0 conda-forge openff-models 0.1.1 pyhca7485f_0 conda-forge openff-toolkit-base 0.14.3 pyhd8ed1ab_0 conda-forge openff-units 0.2.0 pyh1a96a4e_0 conda-forge openff-utilities 0.1.12 pyhd8ed1ab_0 conda-forge openjpeg 2.5.0 h7d73246_0 conda-forge openmm 8.1.0 py310h52c1345_1 conda-forge openmmtools 0.21.5 pyhd8ed1ab_1 conda-forge openssl 3.2.1 hd590300_1 conda-forge orc 1.9.2 h4b38347_0 conda-forge overrides 7.6.0 pyhd8ed1ab_0 conda-forge packaging 23.2 pyhd8ed1ab_0 conda-forge packmol 
20.010 h86c2bf4_0 conda-forge pandas 1.5.3 py310h9b08913_1 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge parmed 4.2.2 py310hc6cd4ac_1 conda-forge parso 0.8.3 pyhd8ed1ab_0 conda-forge partd 1.4.1 pyhd8ed1ab_0 conda-forge pcre2 10.42 hcad00b1_0 conda-forge pdbfixer 1.9 pyh1a96a4e_0 conda-forge pexpect 4.8.0 pyh1a96a4e_2 conda-forge pickleshare 0.7.5 py_1003 conda-forge pillow 8.4.0 py310h07f4688_0 conda-forge pint 0.20.1 pyhd8ed1ab_0 conda-forge pip 23.3.2 pyhd8ed1ab_0 conda-forge pixman 0.43.0 h59595ed_0 conda-forge pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge platformdirs 4.1.0 pyhd8ed1ab_0 conda-forge pluggy 1.4.0 pyhd8ed1ab_0 conda-forge prometheus_client 0.19.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.42 pyha770c72_0 conda-forge protobuf 4.24.4 py310h620c231_0 conda-forge psutil 5.9.7 py310h2372a71_0 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge pyarrow 14.0.2 py310hf9e7431_2_cpu conda-forge pyarrow-hotfix 0.6 pyhd8ed1ab_0 conda-forge pycairo 1.25.1 py310hda9f760_0 conda-forge pycparser 2.21 pyhd8ed1ab_0 conda-forge pydantic 1.10.13 py310h2372a71_1 conda-forge pygments 2.17.2 pyhd8ed1ab_0 conda-forge pymbar 3.1.1 py310hde88566_2 conda-forge pyparsing 3.1.1 pyhd8ed1ab_0 conda-forge pysocks 1.7.1 pyha2e5f31_6 conda-forge pytables 3.7.0 py310hf5df6ce_0 conda-forge pytest 8.1.1 pyhd8ed1ab_0 conda-forge python 3.10.13 hd12c33a_1_cpython conda-forge python-constraint 1.4.0 py_0 conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python-fastjsonschema 2.19.1 pyhd8ed1ab_0 conda-forge python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge python-symengine 0.11.0 py310h04af605_1 conda-forge python_abi 3.10 4_cp310 conda-forge pytz 2023.3.post1 pyhd8ed1ab_0 conda-forge pyyaml 6.0.1 py310h2372a71_1 conda-forge pyzmq 24.0.1 py310h330234f_1 conda-forge rdkit 2023.09.4 py310hb79e901_0 conda-forge rdma-core 49.0 hd3aeb46_2 conda-forge re2 2023.06.02 h2873b5e_0 
conda-forge readline 8.2 h8228510_1 conda-forge referencing 0.32.1 pyhd8ed1ab_0 conda-forge reportlab 3.5.68 py310h94fcab3_1 conda-forge requests 2.31.0 pyhd8ed1ab_0 conda-forge rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge rich 13.7.0 pyhd8ed1ab_0 conda-forge rpds-py 0.17.1 py310hcb5633a_0 conda-forge s2n 1.4.1 h06160fa_0 conda-forge scipy 1.11.4 py310hb13e2d6_0 conda-forge send2trash 1.8.2 pyh41d4057_0 conda-forge setuptools 69.0.3 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge smirnoff99frosst 1.1.0 pyh44b312d_0 conda-forge snappy 1.1.10 h9fff704_0 conda-forge sniffio 1.3.0 pyhd8ed1ab_0 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge soupsieve 2.5 pyhd8ed1ab_1 conda-forge sqlalchemy 2.0.25 py310h2372a71_0 conda-forge stack_data 0.6.2 pyhd8ed1ab_0 conda-forge symengine 0.11.2 hb29318e_0 conda-forge sympy 1.12 pypyh9d50eac_103 conda-forge tblib 3.0.0 pyhd8ed1ab_0 conda-forge terminado 0.18.0 pyh0d859eb_0 conda-forge tinycss2 1.2.1 pyhd8ed1ab_0 conda-forge tk 8.6.13 noxft_h4845f30_101 conda-forge tomli 2.0.1 pyhd8ed1ab_0 conda-forge toolz 0.12.0 pyhd8ed1ab_0 conda-forge tornado 6.3.3 py310h2372a71_1 conda-forge traitlets 5.14.1 pyhd8ed1ab_0 conda-forge types-python-dateutil 2.8.19.20240106 pyhd8ed1ab_0 conda-forge typing-extensions 4.9.0 hd8ed1ab_0 conda-forge typing_extensions 4.9.0 pyha770c72_0 conda-forge typing_utils 0.1.0 pyhd8ed1ab_0 conda-forge tzdata 2023d h0c530f3_0 conda-forge ucx 1.15.0 h75e419f_2 conda-forge uncertainties 3.1.7 pyhd8ed1ab_0 conda-forge unicodedata2 15.1.0 py310h2372a71_0 conda-forge unyt 2.9.2 pyhd8ed1ab_1 conda-forge uri-template 1.3.0 pyhd8ed1ab_0 conda-forge urllib3 2.1.0 pyhd8ed1ab_0 conda-forge wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge webcolors 1.13 pyhd8ed1ab_0 conda-forge webencodings 0.5.1 pyhd8ed1ab_2 conda-forge websocket-client 1.7.0 pyhd8ed1ab_0 conda-forge wheel 0.42.0 pyhd8ed1ab_0 conda-forge widgetsnbextension 4.0.9 pyhd8ed1ab_0 conda-forge 
xmltodict 0.13.0 pyhd8ed1ab_0 conda-forge xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.1.1 hd590300_0 conda-forge xorg-libsm 1.2.4 h7391055_0 conda-forge xorg-libx11 1.8.7 h8ee46fc_0 conda-forge xorg-libxau 1.0.11 hd590300_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h0b41bf4_2 conda-forge xorg-libxrender 0.9.11 hd590300_0 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xyzservices 2023.10.1 pyhd8ed1ab_0 conda-forge xz 5.2.6 h166bdaf_0 conda-forge yaml 0.2.5 h7f98852_2 conda-forge zeromq 4.3.5 h59595ed_0 conda-forge zict 3.0.0 pyhd8ed1ab_0 conda-forge zipp 3.17.0 pyhd8ed1ab_0 conda-forge zlib 1.2.13 hd590300_5 conda-forge zstd 1.5.5 hfc55251_0 conda-forge ```

Additional context

I believe the problem is that the classmethod "from_xml_node" in thermoml.py is not correctly identifying the XML identifiers, so it cannot convert, for example, StandardInChI to SMILES.
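One way to confirm that the identifiers mentioned above (StandardInChI, CommonName) are present in a ThermoML file, independently of evaluator, is to read them with the standard library. The following is a minimal sketch, not evaluator's own parser. The XML fragment is hand-written and heavily simplified, and the element names (`Compound`, `nOrgNum`, `sStandardInChI`, `sCommonName`), the namespace URI, and the helper `list_compound_identifiers` are assumptions for illustration that should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Assumed ThermoML namespace; verify against the xmlns of the real file.
THERMOML_NS = "http://www.iupac.org/namespaces/ThermoML"

# Hand-written, simplified fragment. Real ThermoML files carry many more fields.
SAMPLE = f"""
<DataReport xmlns="{THERMOML_NS}">
  <Compound>
    <RegNum><nOrgNum>1</nOrgNum></RegNum>
    <sStandardInChI>InChI=1S/H2O/h1H2</sStandardInChI>
    <sCommonName>water</sCommonName>
  </Compound>
  <Compound>
    <RegNum><nOrgNum>2</nOrgNum></RegNum>
    <sCommonName>example salt</sCommonName>
  </Compound>
</DataReport>
""".strip()


def list_compound_identifiers(xml_text):
    """Map each compound index to the identifiers found for it (None if absent)."""
    ns = {"t": THERMOML_NS}
    root = ET.fromstring(xml_text)
    found = {}
    for compound in root.findall("t:Compound", ns):
        index = int(compound.findtext("t:RegNum/t:nOrgNum", namespaces=ns))
        found[index] = {
            "inchi": compound.findtext("t:sStandardInChI", namespaces=ns),
            "name": compound.findtext("t:sCommonName", namespaces=ns),
        }
    return found


for index, identifiers in sorted(list_compound_identifiers(SAMPLE).items()):
    print(index, identifiers)
```

If every compound in the real file shows at least one identifier here, the identifiers themselves are being reported and the loss is happening at a later stage.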

mattwthompson commented 6 months ago

I can reproduce this; there must be something different about this dataset that causes the parsing to fail in ways that the other supported properties do not. Or maybe it's not being loaded correctly as a plugin.

mattwthompson commented 4 months ago

Whatever's going wrong is surfacing from here: https://github.com/openforcefield/openff-evaluator/blob/ca084dfa9f1d6531f1dac5d92124b15429429449/openff/evaluator/datasets/thermoml/thermoml.py#L2188

It can't be that all identifiers are missed, otherwise it wouldn't think everything was pure water

Here's the script I'm using to test, based on what you shared:


from openff.units import unit

from openff.evaluator.datasets import PhysicalProperty, PropertyPhase
from openff.evaluator.datasets.thermoml import thermoml_property
from openff.evaluator.datasets.thermoml.thermoml import ThermoMLDataSet
from openff.evaluator.plugins import register_default_plugins, register_external_plugins

@thermoml_property(
    "Osmotic coefficient",
    supported_phases=PropertyPhase.Liquid | PropertyPhase.Gas,
)
class OsmoticCoefficient(PhysicalProperty):
    @classmethod
    def default_unit(cls):
        return unit.dimensionless

register_default_plugins()
register_external_plugins()

ds = ThermoMLDataSet._from_url(
    "https://trc.nist.gov/ThermoML/10.1016/j.fluid.2006.09.025.xml"
)

lilyminium commented 1 month ago

I traced this ultimately back to an incorrect calculation of the molecular weight (MW), meaning that the non-water (non-O) compound gets dropped at the lines below due to an apparent mole fraction around 1e-27. Raising #569 to fix.

https://github.com/openforcefield/openff-evaluator/blob/c8c476ab6ec1e379049384ce23982e547d699f4b/openff/evaluator/datasets/thermoml/thermoml.py#L1398-L1408
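The actual bug is the one addressed in #569 and is not reproduced here. As a generic illustration of why a molecular weight error can make a component's mole fraction vanish, the sketch below converts a molality to a solute mole fraction for a binary mixture. It assumes the composition is reported as a molality, which this thread does not confirm. The function name, the 0.5 mol/kg value, and the size of the deliberate error are all hypothetical.

```python
def mole_fraction_from_molality(molality, solvent_mw_g_per_mol):
    """Solute mole fraction in a binary mixture.

    molality is in mol of solute per kg of solvent;
    solvent_mw_g_per_mol is the solvent molecular weight in g/mol.
    """
    # Moles of solvent contained in 1 kg of solvent.
    moles_solvent_per_kg = 1000.0 / solvent_mw_g_per_mol
    return molality / (molality + moles_solvent_per_kg)


WATER_MW = 18.015  # g/mol

# Correct molecular weight: a small but clearly non-zero mole fraction.
good = mole_fraction_from_molality(0.5, WATER_MW)

# Hypothetical error: a molecular weight that is far too small inflates the
# solvent mole count, so the solute mole fraction collapses towards zero.
bad = mole_fraction_from_molality(0.5, WATER_MW * 1e-24)

print(f"correct MW:   x_solute = {good:.6f}")
print(f"erroneous MW: x_solute = {bad:.3e}")
```

A mole fraction that small would fall below any reasonable cutoff for keeping a component, which is consistent with only the solvent surviving in the parsed substance.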

mattwthompson commented 1 month ago

With the current development head (which would land in 0.4.10, most likely) including @lilyminium's recent fix, I think this is doing what one would expect? I blindly copied my code snippet from earlier

In [26]: df = ds.to_pandas()

In [27]: df.describe()
Out[27]:
       Temperature (K)  N Components  Mole Fraction 1  Mole Fraction 2  OsmoticCoefficient Value ()  OsmoticCoefficient Uncertainty ()
count           241.00         241.0       241.000000       241.000000                   241.000000                         241.000000
mean            298.15           2.0         0.011742         0.988258                     0.651477                           0.008793
std               0.00           0.0         0.010405         0.010405                     0.211759                           0.005274
min             298.15           2.0         0.000855         0.948725                     0.219100                           0.000550
25%             298.15           2.0         0.003139         0.982043                     0.530000                           0.004300
50%             298.15           2.0         0.008380         0.991620                     0.662500                           0.008450
75%             298.15           2.0         0.017957         0.996861                     0.833900                           0.011900
max             298.15           2.0         0.051275         0.999145                     0.977700                           0.019500

In [28]: df.head()
Out[28]:
                                 Id  Temperature (K) Pressure (kPa)         Phase  N Components  ... Mole Fraction 2 Exact Amount 2  OsmoticCoefficient Value () OsmoticCoefficient Uncertainty ()                       Source
0  c2e7b442254f4541b41b0869241d66b1           298.15           None  Liquid + Gas             2  ...        0.999140           None                       0.7389                           0.00655  10.1016/j.fluid.2006.09.025
1  befcc793e1054dd38b5df717d6603b95           298.15           None  Liquid + Gas             2  ...        0.998963           None                       0.7142                           0.00715  10.1016/j.fluid.2006.09.025
2  8768e8a84b6d4267b4f884d95fbece95           298.15           None  Liquid + Gas             2  ...        0.998622           None                       0.6730                           0.00820  10.1016/j.fluid.2006.09.025
3  e2acf2ede41b444ea445e66b5ebb5f83           298.15           None  Liquid + Gas             2  ...        0.998378           None                       0.6485                           0.00880  10.1016/j.fluid.2006.09.025
4  d8c3e030b0ff49baad2ebcb2c62444a6           298.15           None  Liquid + Gas             2  ...        0.998211           None                       0.6324                           0.00925  10.1016/j.fluid.2006.09.025

[5 rows x 16 columns]

In [29]: df['Component 1']
Out[29]:
0            CC[N+](C)(CC)CC.[I-]
1            CC[N+](C)(CC)CC.[I-]
2            CC[N+](C)(CC)CC.[I-]
3            CC[N+](C)(CC)CC.[I-]
4            CC[N+](C)(CC)CC.[I-]
                  ...
236    CCCCCCC[N+](CC)(CC)CC.[I-]
237    CCCCCCC[N+](CC)(CC)CC.[I-]
238    CCCCCCC[N+](CC)(CC)CC.[I-]
239    CCCCCCC[N+](CC)(CC)CC.[I-]
240    CCCCCCC[N+](CC)(CC)CC.[I-]
Name: Component 1, Length: 241, dtype: object
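A quick consistency check on the output above is that the two mole fraction columns should be complementary in a binary mixture. The sketch below takes the summary statistics from the describe() output and checks that each complementary pair sums to one. The pairing of statistics (for example the minimum of one column with the maximum of the other) and the helper name `sums_to_one` are choices made for this illustration.

```python
import math

# (Mole Fraction 1, Mole Fraction 2) values copied from the describe() output.
# Order statistics are paired with their complements: the minimum of one column
# corresponds to the maximum of the other, and so on.
PAIRS = {
    "mean": (0.011742, 0.988258),
    "min/max": (0.000855, 0.999145),
    "25%/75%": (0.003139, 0.996861),
    "50%/50%": (0.008380, 0.991620),
    "75%/25%": (0.017957, 0.982043),
    "max/min": (0.051275, 0.948725),
}


def sums_to_one(pairs, tol=1e-6):
    """Return, per label, whether the two mole fractions add up to 1 within tol."""
    return {
        label: math.isclose(first + second, 1.0, abs_tol=tol)
        for label, (first, second) in pairs.items()
    }


for label, ok in sums_to_one(PAIRS).items():
    print(f"{label:>8}: {'ok' if ok else 'MISMATCH'}")
```

The same check can be run per row on the full DataFrame to confirm that both components are present in every entry.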

I haven't worked with this data, but I see