🚸 Shall we deprecate `run.reference` and `run.reference_type` in favor of `run.params`?

UX-wise, querying via params

    ln.Run.params.get(nextflow_id="cheesy_engelbart")

seems better than querying via reference and reference_type

    ln.Run.get(reference="cheesy_engelbart", reference_type="nextflow_id")

Currently, the information is duplicated: it is stored both in the params dictionary and in the two simple fields (see https://docs.lamin.ai/nextflow). Of course, that's bad and confusing.

On the hub, the nextflow_id appears to be another param, but in fact it isn't:

The Run interface isn't so terrible, but if we can ditch reference and reference_type, it'd be even simpler, and users could spend all their energy on learning params rather than reference/reference_type plus params.

https://docs.lamin.ai/lamindb.track never supported reference and reference_type in the first place.

The crux with reference and reference_type is that you often have multiple references that you'd like to sync a run with, e.g. nextflow_run_id, benchling_run_id, etc. A single reference/reference_type pair can hold only one of them, and then one walks away confused. With params, that'd be very easy to deal with because we have key-value pairs in the first place.
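To make the multiple-references point concrete, here is a minimal sketch in plain Python. It does not use the lamindb API: `RunWithReference`, `RunWithParams`, `find_by_params`, and the id `"bench-123"` are hypothetical stand-ins made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RunWithReference:
    """Hypothetical run with a single reference slot."""
    reference: Optional[str] = None
    reference_type: Optional[str] = None


@dataclass
class RunWithParams:
    """Hypothetical run that keeps external ids as key-value params."""
    params: dict[str, Any] = field(default_factory=dict)


def find_by_params(runs, **criteria):
    """Return the runs whose params match every given key-value pair."""
    return [
        run for run in runs
        if all(run.params.get(key) == value for key, value in criteria.items())
    ]


# With one slot, recording a second external id overwrites the first.
legacy = RunWithReference(reference="cheesy_engelbart", reference_type="nextflow_id")
legacy.reference, legacy.reference_type = "bench-123", "benchling_run_id"

# With params, both ids live side by side and either one can be queried.
runs = [
    RunWithParams(params={"nextflow_id": "cheesy_engelbart",
                          "benchling_run_id": "bench-123"}),
    RunWithParams(params={"nextflow_id": "some_other_run"}),
]
matches = find_by_params(runs, nextflow_id="cheesy_engelbart")
```

After the reassignment, `legacy` no longer knows its Nextflow id, whereas the params-based run can be found by either key.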

What are the arguments against this?

- Semantics: a reference might be distinct enough from a "run parameter" to not be considered a parameter.
- Performance: querying the indexed reference field is more performant than querying the JSON field.
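The performance argument can be illustrated with a small SQLite sketch. This is only an assumption-laden stand-in: the table `runs`, the index `idx_runs_reference`, and the helper `query_plan` are made up here, and the actual lamindb schema and database backend may behave differently. It also assumes a SQLite build with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two plain fields plus a JSON-encoded params column.
conn.execute(
    "CREATE TABLE runs ("
    "id INTEGER PRIMARY KEY, reference TEXT, reference_type TEXT, params TEXT)"
)
conn.execute("CREATE INDEX idx_runs_reference ON runs (reference)")
conn.executemany(
    "INSERT INTO runs (reference, reference_type, params) VALUES (?, ?, ?)",
    [
        (f"run_{i}", "nextflow_id", json.dumps({"nextflow_id": f"run_{i}"}))
        for i in range(1000)
    ],
)


def query_plan(sql, args):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Lookup on the indexed plain field.
indexed_plan = query_plan(
    "SELECT id FROM runs WHERE reference = ?", ("run_42",)
)
# Lookup on a key inside the JSON column, which has no index here.
json_plan = query_plan(
    "SELECT id FROM runs WHERE json_extract(params, '$.nextflow_id') = ?",
    ("run_42",),
)

print(indexed_plan)
print(json_plan)
```

In this setup, the first plan reports an index search while the second reports a full table scan, which is the cost the performance argument refers to.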

WDYT @Zethson @sunnyosun?

We also have reference and reference_type on ULabel and Collection. Both are problematic, too, but that's a separate discussion and doesn't need to be affected by the one here.

laminlabs/lamindb#2026