Closed PoGibas closed 1 year ago
Hi @PoGibas, thanks for your interest in OpenPipelines and for reaching out! Unfortunately, the scripts that you are trying to use are not really meant to be examples of how to use OpenPipelines per se. In fact, these are scripts that were used to generate test data for the automatic testing of our source code. Some of these scripts were created a long time ago (with older versions of components/workflows), executed once, and the resulting data was stored on our S3 bucket. This should be documented somewhere, and we are currently working on improving the quality (and quantity) of our documentation. We should also take care in the future to document which versions of the components were used in these scripts (and/or dockerize them) so that we can make these scripts reproducible.
That being said, I believe that the issue you encountered is a bug (or at least a logic error) in the pipelines. The filter_with_hvg component expects to retrieve a column in the .obs table of the MuData input object that signifies the sample id. This column is added for the full_pipeline (by the component add_id), but not for the subpipelines like rna_multisample. So in your run the column is missing, although a previous component should have added it. I will tag this as a bug and open a PR.
A workaround in the script would be:
# Add sample_id
python <<HEREDOC
import mudata as mu
h5mu_data = mu.read_h5mu("${OUT}_uss.h5mu")
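# sample_id labels observations, so it belongs in .obs rather than .var
# (if the component reads the modality-level table, set h5mu_data.mod['rna'].obs instead; 'rna' is an assumed modality name)
h5mu_data.obs['sample_id'] = 'pbmc_1k_protein_v3'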
h5mu_data.write("${OUT}_uss_with_sample.h5mu")
HEREDOC
# run multisample
NXF_VER=21.10.6 nextflow \
run . \
-main-script workflows/multiomics/rna_multisample/main.nf \
-profile docker \
--id pbmc_1k_protein_v3_ums \
--input "${OUT}_uss_with_sample.h5mu" \
--output "`basename $OUT`_ums.h5mu" \
--publishDir `dirname $OUT` \
-resume
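The Python step of this workaround can also be written a bit more defensively. The sketch below is only an illustration, not part of OpenPipelines: the helper names ensure_sample_id and add_sample_id_to_file are hypothetical, and the modality name "rna" is an assumption. Because it is not stated here whether the component reads the global .obs table or the modality-level one, the helper fills in the column in both places, and only where it is missing.

```python
def ensure_sample_id(mdata, sample_id, obs_key="sample_id", modalities=("rna",)):
    """Add obs_key to the global and per-modality .obs tables where it is missing.

    Returns the names of the tables that were modified.
    The modality name "rna" is an assumption, not taken from the pipeline.
    """
    modified = []
    # Global (MuData-level) observation table.
    if obs_key not in mdata.obs:
        mdata.obs[obs_key] = sample_id
        modified.append("global")
    # Modality-level observation tables; unknown modalities are skipped.
    for name in modalities:
        if name not in mdata.mod:
            continue
        obs = mdata.mod[name].obs
        if obs_key not in obs:
            obs[obs_key] = sample_id
            modified.append(name)
    return modified


def add_sample_id_to_file(in_path, out_path, sample_id):
    """Read an .h5mu file, make sure the sample id column exists, and write it back."""
    import mudata as mu  # imported lazily so the helper above has no hard dependency

    mdata = mu.read_h5mu(in_path)
    modified = ensure_sample_id(mdata, sample_id)
    mdata.write(out_path)
    return modified
```

In the script above, the heredoc body could then call add_sample_id_to_file("${OUT}_uss.h5mu", "${OUT}_uss_with_sample.h5mu", "pbmc_1k_protein_v3"). Existing sample_id values are left untouched, so re-running the step is harmless.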
Please do not hesitate to reach out again if you need help setting up OpenPipelines for your use case. In the meantime, have a look at the website to find the workflow that suits your needs.
Test script pbmc_1k_protein_v3.sh fails in run multisample part with KeyError: 'sample_id' error.
I was trying to run the example test code, bash workflows/resources_test_scripts/pbmc_1k_protein_v3.sh, which executed the download / Convert h5mu to h5ad / run single sample parts but fails at the filter_with_hvg_process part.
Below is the output before the failure:
It seems that the error comes from the obs_batch_key optional parameter that gets passed to scanpy:
What could be a solution for this issue? Could it be related to scanpy issues (https://github.com/scverse/scanpy/issues/2396)?
Full output is: