openpipelines-bio / openpipeline

https://openpipelines.bio
MIT License
25 stars 11 forks source link

Refactor the TotalVI component to also allow running without reference #546

Open ddemaeyer opened 8 months ago

ddemaeyer commented 8 months ago

Currently the TotalVI component/pipeline requires existing reference to be loaded. We should extend the behavior to only also allow running without supplied reference.

VladimirShitov commented 6 months ago

You mean integration of the given data and creating the reference? Sure, I can do that. Maybe other integration components lack this regime as well

ddemaeyer commented 6 months ago

@VladimirShitov indeed it is for creating the reference, identical as we do for scvi. In addition to this, we talked last week about moving the different components for reference building and/or integration into the integrate namespace. While we would move all the reference mapping to the reference_mapping workspace.

VladimirShitov commented 5 months ago

@ddemaeyer , you mentioned that the totalVI component is already used by someone? Should I keep the code backward compatible then and not change API?

I see 2 options now:

  1. Move the existing component to the reference_mapping workspace, and implement a new component in the integrate for the integration without a query. Pros: This would provide a more convenient interface for users. Cons: This can lead to partial code repetition.
  2. Have one component for everything. If the query is provided, it will map it to a reference (building the model for it if needed). If there is no query, only reference integration will be performed. Pros: The code will be in one place, so there will be less repetition and bugs potential. Cons: The current API will then be very unintuitive, and we will need to change it. Currently, --input means query, and to run the integration, a user would need to provide only --reference and empty --input argument.

I am in favor of option 1.

ddemaeyer commented 5 months ago

Hey @VladimirShitov, the component is not used at the moment. This since there is a reference required to use it, which is quite difficult today since you can't make one using OP. So the API can be changed, but it would be good to retain the current compoentn in reference_mapping namespace and move the new one into the integration namespace.

Hence, I completely agree with option 1.

VladimirShitov commented 5 months ago

Thanks!