Closed VladimirShitov closed 8 months ago
Hey @VladimirShitov!
In order to fully wrap around what functionality is being added, could you:
src/label_transfer/README.md
?- type: "file"
name: "--input_train"
description: "The training data"
info:
short_description: "Training data"
slots:
layers:
- type: "integer"
name: "counts"
description: "Raw counts"
- type: "double"
name: "normalized"
description: "Normalized counts"
obs:
- type: "string"
name: "label"
description: "Ground truth cell type labels"
- type: "string"
name: "batch"
description: "Batch information"
var:
- type: "boolean"
name: "hvg"
description: "Whether or not the feature is considered to be a 'highly variable\
\ gene'"
required: true
- type: "integer"
name: "hvg_score"
description: "A ranking of the features by hvg."
required: true
obsm:
- type: "double"
name: "X_pca"
description: "The resulting PCA embedding."
required: true
uns:
- type: "string"
name: "dataset_id"
description: "A unique identifier for the dataset"
required: true
- type: "string"
name: "normalization_id"
description: "Which normalization was used"
required: true
Is such schema a requirement or a recommendation? E.g. will it be used for validation?
This component is pretty flexible: you can run it with different .obs
columns and layers for training if you correctly specify arguments. So I can provide an example schema but wouldn't recommend following it strictly (e.g. always name column for labels transfer "label")
Is such schema a requirement or a recommendation? E.g. will it be used for validation?
At the moment it's will be used for documentation.
However, in OpenProblems it's also used for automating some of the unit tests, in that I implemented a test which will run whatever executable it gets and checks whether the output files contain the right slots. It might be something that we could also look into for OpenPipelines.
Hi Vladimir,
I would like to see this PR merged. What do you think needs to happen before we can merge it? :)
Robrecht
@rcannood I added schemas and answered your comments. Thank you very much for the feedback! Could you check PR again?
Especially I am concerned if I described the .h5mu
files format correctly in schemas.
I'm taking a look at prepping this branch to be merged into main.
So far, I updated the file format yamls and moved them to a separate API folder. I added 'merge' statements in the components to make them use the file format specs defined earlier.
@rcannood What is the state of this pull request ATM?
Thank you @rcannood ! Looks great to me. Regarding the first 4 points, I completely agree. For the last one, I support your reasoning, but in some cases, people only use .X
to store the embedding. This is the case for HLCA, which we use as an example and test.
Alternatively, we can have 2 arguments, e.g., use_X
and reference_obsm_features
. If the first one is present or the second one is absent, .X
is used
Ok! I find it quite strange to store an embedding in X, but let's roll with it.
I refactored the tests a bit.
Merging when the CI tests succeed!
:partying_face: