Open ctcisar opened 2 years ago
Would it be possible to not write the .json file and instead write all* the SpotFindingResults
data (including log info) into 1 netcdf file?
@ctcisar thanks for this! Making it easier to view intermediate results is super useful. I think @njmei may have also recently mentioned something like this being missing.
It would also be nice to have some test coverage here- something simple like https://github.com/spacetx/starfish/blob/master/starfish/core/types/test/test_decoded_spots.py that generates some data, saves it and reloads it using your new methods would be great.
@njmei I was thinking about doing something like that initially, but faced problems saving the `SpotAttributes` together. Unless a `DataArray` can contain items of type `DataArray`, and it's possible to save the whole thing at once?
@berl I wrote something like this using our own data as I was testing this, I'll write up and push a unit test proper shortly.
@ctcisar Because the `SpotAttributes` are pandas DataFrames (I think) under the hood, it should be possible to merge them into one DataFrame (with enough data to recreate the individual `SpotAttributes` later), then convert to xarray and save together with coordinate data as a `netcdf`.

If I have some spare time this week, I'll see if I can come up with an implementation that only produces/loads one file (probably `netcdf`). Starfish is already pretty heavy in terms of all the metadata files it produces; it would be nice to have a solution that doesn't produce even more metadata-like files.
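The merge-and-split idea above can be sketched roughly as follows. This is a toy illustration, assuming each `SpotAttributes` wraps a plain pandas DataFrame; the `round`/`ch` column names are illustrative, not starfish's actual schema:

```python
import pandas as pd

# Hypothetical per-(round, ch) spot tables, standing in for SpotAttributes data.
tables = {
    (0, 0): pd.DataFrame({"x": [1, 2], "y": [3, 4]}),
    (0, 1): pd.DataFrame({"x": [5], "y": [6]}),
}

# Merge: tag each table with its identity so it survives the concat.
merged = pd.concat(
    [t.assign(round=r, ch=c) for (r, c), t in tables.items()],
    ignore_index=True,
)
# merged.to_xarray().to_netcdf(...) could then write a single file.

# Recreate the individual tables from the merged frame on load.
recovered = {
    key: g.drop(columns=["round", "ch"]).reset_index(drop=True)
    for key, g in merged.groupby(["round", "ch"])
}
assert recovered[(0, 0)].equals(tables[(0, 0)])
```

The identity columns are what make the round trip possible: without them, the concatenated frame loses which rows belonged to which `SpotAttributes`.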
@berl I added a testing file in the same format as the provided test, hopefully it's up to spec.

In the process of writing this, I realised that alternating `Log.encode()` and `Log.decode()` will result in each result getting nested in an additional dict, structured as `{"log": [data]}`. I do not know if this is intended behavior, but I have fixed the loader to accommodate it.
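The nesting described above can be reproduced with a toy sketch (this is not starfish's actual `Log` code, just an illustration of the failure mode): if `decode()` hands back the `{"log": [data]}` wrapper instead of the inner data, each encode/decode round trip adds one layer.

```python
def encode(data):
    # Wrap the data the way the issue describes: {"log": [data]}
    return {"log": [data]}

def decode(encoded):
    return encoded  # hypothetical decode() that keeps the wrapper

entry = "spot finding ran"
nested = encode(decode(encode(entry)))
assert nested == {"log": [{"log": ["spot finding ran"]}]}

# A loader can accommodate arbitrary nesting by unwrapping until it
# reaches the real data:
def load(obj):
    while isinstance(obj, dict) and "log" in obj:
        obj = obj["log"][0]
    return obj

assert load(nested) == entry
```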
@ctcisar thanks for adding the tests. Sorry about the delays in the CI tests; I started them up and the linter fails now. Can you run it locally, make fixes, and push here again? The CI should run automatically now.
@berl I was having difficulty getting `flake8` to run on my system; it doesn't like the way my python environment is set up. I addressed all the things mentioned in CI on my last commit.

The error I get when I run `make all` locally (with the `flake8` lines commented out of the Makefile) is the following:

```
AttributeError: 'DataArray' object has no attribute 'to_numpy'
```

But I'm not sure where this is coming from, because `DataArray` does have that attribute. (Notably, when I tried to run the `xr` version of assert equal, it ran into errors with non-str keys, so I used this to work around it.)
I am at a bit of a loss with the last `flake8` error. It says:

```
starfish/core/types/test/test_saving_spots.py:9:1: I100 Import statements are in the wrong order. 'from starfish.core.types import PerImageSliceSpotResults, SpotAttributes, SpotFindingResults' should be before 'from starfish.types import Axes, Coordinates, Features'
```

Even though the order it is describing is the order they are imported in. I have also tried the reverse order, and that didn't make a difference.
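One detail that may be relevant here (an observation, not a confirmed fix): import-order checkers like the one behind I100 compare dotted module paths lexicographically, and `.` sorts before any letter, which is why `starfish.core.types` is expected ahead of `starfish.types`:

```python
# '.' (0x2E) sorts before 't' (0x74), so the nested module path wins.
mods = ["starfish.types", "starfish.core.types"]
print(sorted(mods))  # -> ['starfish.core.types', 'starfish.types']
```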
I have added two simple methods to save `SpotFindingResults` to disk and reload them. This is handled by calling the methods in `physical_coord_ranges`, `log`, and `_results` to save each object, and then writing a `json` file that identifies the indices and paths to each of these files. The loader opens this `json` and uses the loading method from each of said objects to recreate the original `SpotFindingResults` object.

This can be useful for storing, comparing, and troubleshooting intermediate results. Storing the contents of each `xarray`-like object as a `netcdf` allows for compatibility across most package version differences, unlike saving the entire `SpotFindingResults` to a Python pickle.
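The manifest pattern described above can be sketched with stdlib-only code. All names here are illustrative, and plain JSON stands in for the per-object netcdf savers so the sketch is self-contained:

```python
import json
import os
import tempfile

def save_results(results, out_dir):
    # Write one file per (round, ch) result, plus a manifest that records
    # the indices and the path to each file.
    manifest = {"indices": [], "files": []}
    for (r, c), payload in results.items():
        fname = f"spots_r{r}_c{c}.json"
        with open(os.path.join(out_dir, fname), "w") as f:
            json.dump(payload, f)
        manifest["indices"].append([r, c])
        manifest["files"].append(fname)
    with open(os.path.join(out_dir, "manifest.json"), "w") as f:
        json.dump(manifest, f)

def load_results(out_dir):
    # Open the manifest, then reload each per-index file it points to.
    with open(os.path.join(out_dir, "manifest.json")) as f:
        manifest = json.load(f)
    results = {}
    for (r, c), fname in zip(manifest["indices"], manifest["files"]):
        with open(os.path.join(out_dir, fname)) as f:
            results[(r, c)] = json.load(f)
    return results

with tempfile.TemporaryDirectory() as d:
    original = {(0, 0): {"x": [1, 2], "y": [3, 4]}, (0, 1): {"x": [5], "y": [6]}}
    save_results(original, d)
    assert load_results(d) == original
```

Keeping the manifest separate from the payload files is what lets each object keep its own loader while the top-level `json` only has to know indices and paths.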