laminlabs / laminr

R client for LaminDB.
http://laminr.lamin.ai/
Apache License 2.0

Keep data lineage when getting data from other instances #98

Open Zethson opened 18 hours ago

Zethson commented 18 hours ago

We observed that when data from any non-default instance, such as cellxgene, is used as input during a Run, it does not show up in the lineage graph.

@lazappi suggested that this is probably because we get the data through the API rather than through the Python code, which probably does more magic.

falexwolf commented 17 hours ago

The problem is that reticulate isn't used for .load() and .cache().

If reticulate were used, this would all be resolved.

Hence, the fix should be using reticulate for these two methods and it's going to work.
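
To make the failure mode concrete, here is a toy Python sketch of the idea, not lamindb's actual implementation. The `Run` and `Artifact` classes and the `fetch_via_api` helper are hypothetical stand-ins invented for illustration. The only assumption taken from this thread is that the client-side `.load()` records the artifact as a run input, while a direct API fetch returns the data without doing so.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for a tracked run (not lamindb's Run class)."""
    input_uids: list = field(default_factory=list)


@dataclass
class Artifact:
    """Hypothetical artifact record: a uid, its source instance, and its data."""
    uid: str
    instance: str
    payload: str

    def load(self, run=None):
        # Client-side path: register the artifact as a run input, then return data.
        if run is not None and self.uid not in run.input_uids:
            run.input_uids.append(self.uid)
        return self.payload


def fetch_via_api(artifact):
    """Hypothetical direct fetch: returns the data but never touches the run."""
    return artifact.payload


run = Run()
external = Artifact(uid="abc123", instance="laminlabs/cellxgene", payload="counts")

# Fetching through the bare API leaves the run's inputs empty,
# so the artifact cannot appear in the lineage graph.
fetch_via_api(external)
inputs_after_api = list(run.input_uids)
print(inputs_after_api)  # []

# Loading through the client records the input as a side effect.
external.load(run=run)
inputs_after_load = list(run.input_uids)
print(inputs_after_load)  # ['abc123']
```

In this toy model, routing `.load()` and `.cache()` through the client restores the lineage entry, which is the behaviour the proposed fix relies on.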

lazappi commented 6 hours ago

I think we should have a discussion about whether it is worth using the API at all. We have about reached the limit of what the API can do currently and if we have to use {reticulate} for some things anyway, maybe it's better to use it for everything? It would mean a fairly big refactor but after that development might be quicker.

falexwolf commented 4 hours ago

Yes, we should have that discussion next week or so, I agree.

But what's indeed much better with the API is that you're not relying on 20 Python packages that Django needs to map all the schema modules.

So it's not a clear-cut decision in favor of reticulate for querying, say, bionty. That's likely more elegant through the REST API.