Open dawe opened 6 months ago
From a quick look I think the reason is what you described. @lillux could you have a look at this please?
@dawe @LucaMarconato This has been solved in #139, that also describe how the parsing behavior has been changed. Now the reader gives priority the file of which the path has been specified, instead of prioritizing the pattern matching.
I have just tested the above scenario with spatialdata-io
version 0.1.3.dev117+g3c03009
, at commit 3c03009, and an Exception
is raised when multiple matching files are found.
Please @dawe let me know if you still see the problem with the latest version of spatialdata-io
.
This is in part to describe implementation details, in part to get opinions in improving it.
Just to clarify why we take index [0]
of _check_path()
output in the 2 calls below:
In the actual implementation, _check_path()
takes some arguments and return a tuple
:
https://github.com/scverse/spatialdata-io/blob/3c03009408da2eebfb6ddf12fafd8b93e87f0fda/src/spatialdata_io/readers/dbit.py#L26-L32
Index [0]
of the tuple
can be a path
or None
.\
Index [1]
is always a bool
that indicates that the path
has been resolved. This is needed for downstream tasks with optional arguments (optional_arg = True
).
Index [0]
is always a path
for mandatory arguments (optional_arg = False
), like the .h5ad
and barcode_list
, and an Error
is raised by _check_path()
if the path
is not resolved, or if multiple match are found when optional_arg = False
.
We do not care about index [1]
when optional_arg = False
, so we do not save its value when checking for .h5ad
and barcode_list
.
Instead we care about index [1]
when optional_arg = True
, for example in the case of an optional image
:
https://github.com/scverse/spatialdata-io/blob/3c03009408da2eebfb6ddf12fafd8b93e87f0fda/src/spatialdata_io/readers/dbit.py#L288-L294
Here we unpack the tuple
, and take in account both the values in the downstream tasks. When optional_arg = True
and multiple match are found, a warning
is printed and None
is returned instead of a path
, neglecting the optional argument while warning the user.
I am working with some data with many
.h5ad
files stored in the same directory. Whenever I create a newSpatialData
object I specify the path to the correctAnnData
counts file. I noticed that the file that is loaded is typically the wrong one and the first.h5ad
in the directory is loaded instead. I believe this is because the first element of a list is considered herehttps://github.com/scverse/spatialdata-io/blob/e3c53b5d6789bbaec49181f33dfe368170747ce4/src/spatialdata_io/readers/dbit.py#L282
I guess the same happens for the barcodes that are specified in the line below
https://github.com/scverse/spatialdata-io/blob/e3c53b5d6789bbaec49181f33dfe368170747ce4/src/spatialdata_io/readers/dbit.py#L285
The only workaround right now is to have multiple paths for different files.