Open LuckyMD opened 3 years ago
I've ran into this before (sc.read_10x_mtx()
has the same default of course). One possible issue with defaulting to gex_only=False
is that someone might accidentally run a 'regular pipeline' with multi-modal data, e.g. log-normalizing RNA+protein+cell hashing counts together without first subsetting the adata based on .var["feature_types"]
. By contrast, anyone who know they have multi-modal data would hopefully notice the missing the data with gex_only=True
. Either way, logging warnings sounds good.
For context, the option was added in #334, and I think the scope for other feature types was much more limited at the time.
Would it already be worth either making gex_only a required input?
I'm not sure the gex_only
argument even entirely makes sense anymore. I think a feature_type
argument would make more sense. Erroring if nothing is passed and there are multiple kinds sounds reasonable to me, as multimodality should be handled explicitly.
For backwards compatibility I think deprecation warnings for a release cycle when either gex_only
is used or nothing is passed and there are multiple feature types present could work.
Moving the 10x reading functions had been discussed in: #1387
Thanks, @ivirshup... I 100% agree with your suggestions. Maybe it's already worth adding a warning to the read functions now in cases where feature_type
has multiple values.
@chris-rands agreed that we shouldn't set the default to False
. This will have multiple issues both for backward compatability and the cases you describe.
@DaneseAnna: this might be important for the new episcanpy function i proposed yesterday ;).
This might be a bit of a rant, and I'm aware there are some good arguments for the way things are... but I just wasted 4 hours of my life because I wasn't aware of the default
gex_only=True
insc.read_10x_h5()
.Just wanted to flag that if scanpy support for multimodality becomes a thing, then this default may need to change to prevent frustration. I think moving read/write functionality to AnnData was already discussed at some point. For multimodal support, this might become more important again. If this is moved, then
read_10x_h5
should probably not default togex_only=True
anymore.Would it already be worth either making
gex_only
a required input? For backward compatability it could also just trigger a logging warning for now. I do think that this is quite an important thing to let people know with more and more 10X Multiome data being generated now (and CITE-seq for that matter).