mapping-commons / sssom-py

Python toolkit for SSSOM mapping format
https://mapping-commons.github.io/sssom-py/index.html#
MIT License
49 stars 12 forks

Decouple most methods from LinkML dataclasses, split sssom into multiple packages #464

Closed matentzn closed 9 months ago

matentzn commented 9 months ago

Right now, we have the odd situation of needing pandas and linkml for parsing a dataframe.

There is no inherent need for that:

  1. No methods other than convert (which relies on LinkML for translation into JSON, YAML, etc.) and validate (which relies on LinkML during validation) really need LinkML data classes (pydantic or otherwise). We could consider getting rid of the "LinkML part" for these other methods.
  2. Technically speaking, LinkML convert and validate do not require pandas. It may be worth exploring more efficient means of parsing data frames into dataclasses (at least a proposal here in the issue that we can discuss).
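As a starting point for that discussion, here is a minimal sketch of reading SSSOM-style TSV rows straight into dataclasses using only the standard library, with no pandas and no LinkML. The `SimpleMapping` class and `iter_mappings` function are hypothetical names made up for illustration; they are not the LinkML-generated classes or any existing sssom-py API, and only a handful of SSSOM columns are modelled.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Iterator, Optional, TextIO


@dataclass
class SimpleMapping:
    """Hypothetical stand-in for a mapping record (not the LinkML-generated class)."""

    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str
    confidence: Optional[float] = None


def iter_mappings(handle: TextIO) -> Iterator[SimpleMapping]:
    """Yield one SimpleMapping per TSV row, skipping '#'-prefixed metadata lines."""
    known = {f.name for f in fields(SimpleMapping)}
    # The commented metadata header is dropped here; a real parser would keep it.
    rows = (line for line in handle if not line.startswith("#"))
    for record in csv.DictReader(rows, delimiter="\t"):
        # Keep only modelled columns and treat empty cells as "not set".
        kwargs = {k: v for k, v in record.items() if k in known and v not in ("", None)}
        if "confidence" in kwargs:
            kwargs["confidence"] = float(kwargs["confidence"])
        yield SimpleMapping(**kwargs)


EXAMPLE_TSV = (
    "# mapping_set_id: https://example.org/demo\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "EX:1\tskos:exactMatch\tEX:2\tsemapv:ManualMappingCuration\t0.9\n"
    "EX:3\tskos:broadMatch\tEX:4\tsemapv:LexicalMatching\t\n"
)

mappings = list(iter_mappings(io.StringIO(EXAMPLE_TSV)))
```

Streaming rows through a generator avoids building a full data frame first, at the cost of losing pandas conveniences such as vectorised filtering. Whether that trade-off is worthwhile for sssom-py is exactly the open question.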

This separation could also be grounds for splitting the project into sssom-transform (validate and convert, dependent on LinkML), sssom-ext (anything not in the other categories, including query stuff) and sssom-developer (everything needed for sssom file maintenance).

We should still release the whole sssom with everything in it though (as is), wrapping the above.

It's not great to separate packages by heavyweight dependencies, but I have been getting complaints about the huge number of dependencies of the sssom toolkit; this could help reduce it for some users.

cthoyt commented 9 months ago

No, please, please don't split the project up. All of these other modern projects with 1 million subpackages are totally impossible to navigate. We just have to better organize what we have, and hide imports of stuff we don't want to always be around.

matentzn commented 9 months ago

We just have to better organize what we have

How do we avoid massive dependencies when they are not needed? Making them optional and telling users they need to install them if they want to use this and that functionality?
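To make the question concrete, a common way to do this in Python is to import the heavy package only inside the function that needs it and raise an actionable error when it is missing. The sketch below is only an illustration: `require_optional`, `to_json`, and the extras name in the message (for example `sssom[convert]`) are hypothetical and not part of sssom-py, and the standard-library `json` module stands in for a heavy dependency such as LinkML.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message.

    `extra` is a hypothetical extras name; the project may not define it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this functionality. "
            f"Install it with: pip install 'sssom[{extra}]'"
        ) from exc


def to_json(records: list) -> str:
    """Example caller: the import happens only when this function runs."""
    json_module = require_optional("json", extra="convert")  # stdlib stand-in
    return json_module.dumps(records)
```

With this pattern, users who never call the conversion code never need the heavy package installed, and those who do get told which install command fixes the error.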

cthoyt commented 9 months ago

See https://github.com/mapping-commons/sssom-py/pull/467

The rest of the dependencies seem pretty reasonable. Most people already have pandas/numpy, so requiring them isn't a big ask. I think having RDFlib is also pretty standard. Then there are some low-level things like validators, pyyaml, click, and deprecation, which are reasonable. curies is a core component for anyone in the semantic world.

One question would be: is it possible to make LinkML an optional dependency, since it's the cause of most of the heaviness?

matentzn commented 9 months ago

Alright, I will go with your recommendation for now and see what to do about LinkML separately. For now, I want to try updating sssom-py to pydantic data classes and see how much that breaks.

cthoyt commented 9 months ago

@matentzn can linkml generate Pydantic classes that don't have all of the baggage of the yaml system?

matentzn commented 9 months ago

I think so, but I am far out of my depth here.