laminlabs / lamindb

A data framework for biology.
https://docs.lamin.ai
Apache License 2.0

🚸 Shall we deprecate `run.reference` and `run.reference_type` in favor of `run.params`? #2026

Closed by falexwolf 5 days ago

falexwolf commented 5 days ago

UX-wise, querying via params

ln.Run.params.get(nextflow_id="cheesy_engelbart")

seems better than querying via reference and reference_type

ln.Run.get(reference="cheesy_engelbart", reference_type="nextflow_id")

Currently, the information is duplicated: it is stored both in the params dictionary and in the two simple fields (see https://docs.lamin.ai/nextflow). That's confusing.

On the hub, the nextflow_id appears to be another param, but in fact it isn't:

[image: screenshot from the hub]

The Run interface isn't so terrible, but if we can ditch reference and reference_type, it'd be even simpler, and users could spend all their energy on learning params rather than learning reference/reference_type plus params.

[image: the Run interface]

https://docs.lamin.ai/lamindb.track never supported reference and reference_type in the first place.

The crux with reference and reference_type is that you often have multiple references you'd like to sync with, e.g. nextflow_run_id, benchling_run_id, etc., and a single reference/reference_type pair can only hold one of them, so one walks away confused. With params, that'd be very easy to deal with because we have key-value pairs in the first place.
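To make the multiple-references point concrete, here is a minimal plain-Python sketch. It does not use lamindb: `RunRecord`, `find_by_param`, and the example id values are hypothetical stand-ins for illustration, not the actual `Run` model or its query API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for a run record; NOT lamindb's Run model."""

    reference: Optional[str] = None
    reference_type: Optional[str] = None
    params: dict = field(default_factory=dict)


# A single reference/reference_type pair holds exactly one external id.
run = RunRecord(reference="cheesy_engelbart", reference_type="nextflow_id")

# Key-value params can hold several external ids side by side.
# The id values below are made up for illustration.
run.params["nextflow_run_id"] = "cheesy_engelbart"
run.params["benchling_run_id"] = "run_0001"


def find_by_param(runs, **criteria):
    """Return the runs whose params match every key-value pair in criteria."""
    return [
        r for r in runs
        if all(r.params.get(key) == value for key, value in criteria.items())
    ]


matches = find_by_param([run], benchling_run_id="run_0001")
```

With the two simple fields, a second external id would either overwrite the first or need another pair of columns. With params it is just one more key.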

What are the arguments against this?

WDYT @Zethson @sunnyosun?

We have reference and reference_type also on ULabel and Collection. Both are problematic, too, but that's a separate discussion and doesn't need to be affected by this one.

falexwolf commented 5 days ago

@sunnyosun convinced me that we should have a dedicated spot for this syncing information, and the hub already expects this.

In most cases, one could simplify the above query to:

ln.Run.get(reference="cheesy_engelbart")
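The shorter query presumably works whenever the reference value is unique across reference types. A plain-Python sketch of that assumption follows. The `get_by_reference` helper and the dict records are hypothetical and only mimic the idea of a get that expects exactly one match. They are not lamindb's API.

```python
def get_by_reference(records, reference, reference_type=None):
    """Return the single record matching reference; reference_type is optional."""
    hits = [
        r for r in records
        if r["reference"] == reference
        and (reference_type is None or r["reference_type"] == reference_type)
    ]
    # Mimic a "get" that expects exactly one match.
    if len(hits) != 1:
        raise LookupError(f"expected exactly one match, found {len(hits)}")
    return hits[0]


# Made-up records for illustration.
records = [
    {"reference": "cheesy_engelbart", "reference_type": "nextflow_id"},
    {"reference": "run_0001", "reference_type": "benchling_run_id"},
]

# reference alone is enough while the value is unique across types.
hit = get_by_reference(records, "cheesy_engelbart")
```

Only when two systems happen to issue the same id would reference_type be needed to disambiguate.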

So, I guess it's all good.