openpipelines-bio / openpipeline

https://openpipelines.bio
MIT License

Implement reference mapping components #138

VladimirShitov closed 1 year ago

VladimirShitov commented 1 year ago
LuckyMD commented 1 year ago

PR concept feedback:

User workflow 1: User has new lung data and wants to map to known reference -> HLCA
Requirements:

User workflow 2: Build custom reference model, annotate it, map new data on top
Requirements:

User workflow 3: User loads annotated data with same annotation schema, builds a new reference, maps new data on top
Requirements:

LuckyMD commented 1 year ago

Workflows 1 and 2 are more important. Workflow 3 has more challenging requirements, which are currently met only by the CELLxGENE data corpus, where cell types are mapped to CL (the Cell Ontology).

Note: CL is not a great representation of cell types

Other issue:

LuckyMD commented 1 year ago

Workflow 2 is missing:

LuckyMD commented 1 year ago

To do:

rcannood commented 1 year ago

@VladimirShitov Please let us know when this PR is ready for another review! :)

VladimirShitov commented 1 year ago

The repository is ready for review. I removed unnecessary details and fixed bugs. Local tests run fine, but the test pipeline will fail because the test files are missing. @rcannood, can you please run the script workflows/resources_test_scripts/hlca_reference_model.sh and put the resulting files on AWS so that the test script can use them?

DriesSchaumont commented 1 year ago

LGTM! Very nice job @VladimirShitov, thanks!