Open aranega opened 4 days ago
Thank you for the info and the edits! I'll take a look at this soon. The real answer is to move away from floating point (internally) for naming files... there are inevitably rounding issues when searching for files.
Thanks @william-silversmith !
Unfortunately, I ran into another issue while still using floating-point values for the resolution, so the modifications I made are not enough. This time, some spatial index files are not found. I have an image with this size: [1024, 1440, 500] and this resolution: [1.0448, 1.0448, 1.0448]. The error I get is the following:
cloudvolume.exceptions.SpatialIndexGapError: mesh/1069.056-1403.136_1503.360-1870.848_522.000-935.424.spatial was not found.
I'm not sure it comes from a rounding issue this time, as none of the generated files comes close to this name. The "closest" generated file is 936.1408-1069.8752_1404.2112-1504.5120_468.0704-522.4000.spatial.gz. I tried activating fill_missing=True, but got the same result.
Unfortunately, it's the same issue, just in another module. :-/
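The two filenames quoted above can be checked numerically. The sketch below assumes the `<x0>-<x1>_<y0>-<y1>_<z0>-<z1>.spatial` naming visible in the error message and divides each bound by a candidate voxel size to see whether it lands on a whole voxel. The helper names (`parse_bounds`, `to_voxels`, `is_integral`) are made up for this illustration and are not part of cloud-volume or igneous.

```python
import re

def parse_bounds(filename):
    """Parse '<x0>-<x1>_<y0>-<y1>_<z0>-<z1>.spatial[.gz]' into three (lo, hi) pairs."""
    stem = filename.split("/")[-1]
    stem = re.sub(r"\.spatial(\.gz)?$", "", stem)
    # Splitting on '-' is only valid for non-negative coordinates (assumption).
    return [tuple(float(v) for v in axis.split("-")) for axis in stem.split("_")]

def to_voxels(filename, voxel_size):
    """Convert the physical bounds in a filename to voxel units."""
    return [tuple(v / voxel_size for v in axis) for axis in parse_bounds(filename)]

def is_integral(bounds, tol=1e-6):
    """True if every bound is (numerically) a whole number of voxels."""
    return all(abs(v - round(v)) < tol for axis in bounds for v in axis)

missing = "mesh/1069.056-1403.136_1503.360-1870.848_522.000-935.424.spatial"
closest = "936.1408-1069.8752_1404.2112-1504.5120_468.0704-522.4000.spatial.gz"

for name in (missing, closest):
    for voxel_size in (1.0448, 1.044):
        print(name, voxel_size, is_integral(to_voxels(name, voxel_size)))
```

With these inputs, the generated file lines up with whole voxels only for 1.0448, while the missing file lines up only for 1.044. That is consistent with the resolution being shortened to three decimals somewhere along the path that builds the requested name, although this check alone does not show where. Under the 1.044 reading, the missing box would cover voxels 1024-1344, 1440-1792 and 500-896, which lies entirely outside the [1024, 1440, 500] image.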
That's really bad luck :(. I'll try to take a little time to debug this, but this time I get the impression that the generation of the file name list for the spatial index files produces too many files. Is it related to the same rounding issue in another place, as you suggested? I saw that the generated file names take an expansion of the bounding box into account at some point. I didn't dig much further, but I was wondering if it could come from there?
I was able to work around this for now by allowing missing spatial index files, but to do so I had to modify locations_for_labels this way (adding the allow_missing=True parameter to file_locations_per_label):
def locations_for_labels(cv: CloudVolume, labels: List[int]) -> Dict[int, List[str]]:
    SPATIAL_EXT = re.compile(r"\.spatial$")
    # allow_missing=True is the only change: tolerate absent spatial index files
    index_filenames = cv.mesh.spatial_index.file_locations_per_label(labels, allow_missing=True)
    resolution = cv.meta.resolution(cv.mesh.meta.mip)
    for label, locations in index_filenames.items():
        for i, location in enumerate(locations):
            bbx = Bbox.from_filename(re.sub(SPATIAL_EXT, "", location), dtype=resolution.dtype)
            bbx /= resolution
            index_filenames[label][i] = bbx.to_filename(1) + ".frags"
    return index_filenames
and also modifying the file_locations_per_label_json method from cloud-volume this way (passing the previously unused allow_missing parameter to fetch_all_index_files and skipping the files that have no content):
def file_locations_per_label_json(self, labels, allow_missing=False):
    locations = defaultdict(list)
    parser = simdjson.Parser()
    if labels is not None:
        labels = set(toiter(labels))
    # Change 1: forward allow_missing instead of ignoring it
    for index_files in self.fetch_all_index_files(allow_missing=allow_missing):
        for filename, content in index_files.items():
            # Change 2: skip index files that came back empty (missing)
            if not content:
                continue
            index_labels = set(parser.parse(content).keys())
            filename = os.path.basename(filename)
            if labels is None:
                for label in index_labels:
                    locations[int(label)].append(filename)
            elif len(labels) > len(index_labels):
                for label in index_labels:
                    if int(label) in labels:
                        locations[int(label)].append(filename)
            else:
                for label in labels:
                    if str(label) in index_labels:
                        locations[int(label)].append(filename)
    return locations
It's going to take a while for me to be able to fix both floating point issues, unfortunately. I may have to deal with some backwards-compatibility concerns. My recommendation would be to round the resolution to an integer, as painful as that may be, or at least round it to a value that is exactly representable in floating point.
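As a rough illustration of the second suggestion, the sketch below snaps a resolution to the nearest multiple of 1/2^bits, which a binary float stores exactly. The function name `snap_to_binary_fraction` and the choice of 10 bits are assumptions made for this example and do not come from cloud-volume or igneous.

```python
from fractions import Fraction

def snap_to_binary_fraction(value, bits=10):
    """Round value to the nearest multiple of 1 / 2**bits (exactly representable)."""
    scale = 1 << bits
    return round(value * scale) / scale

resolution = 1.0448
snapped = snap_to_binary_fraction(resolution)

# The float nearest to 1.0448 is not the decimal 1.0448 itself.
print(Fraction(resolution) == Fraction("1.0448"))  # False
print(snapped, Fraction(snapped))                  # 1.044921875 535/512

# With the snapped value, voxel -> physical -> voxel round trips are exact.
print(all((i * snapped) / snapped == i for i in range(5000)))  # True
```

The trade-off is that the stored voxel size changes slightly (here by roughly 0.01%), so whether this is acceptable depends on how precisely the physical scale has to be preserved.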
Thanks for your feedback and your help on this!
I'll look into the different images we need to deal with and their resolutions. Don't worry about the time it takes to fix it: with the current modifications in cloud-volume and igneous, it seems to work (at least for the tests I conducted). As we don't use igneous for any other purpose at the moment, we don't need to be backward compatible, so we can live with those patches for now. I will test extensively with other images to see if everything works without issue.
When trying to create a multi-resolution sharded mesh file from a segmentation, if the resolution is a float, the collection of the different .frags files fails with a message looking like this (where the numbers depend on the input segmentation parameters):

The file that is actually generated by create_meshing_tasks is 448-896_0-448_0-448.frags.

After debugging a little, I found that when a float is used for the resolution, some rounding is applied at some point that makes the final file name search wrong.
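The general effect can be reproduced outside of igneous. The following is a minimal sketch, not igneous's actual code: it converts a physical coordinate back to a voxel index by dividing by the voxel size, once with truncation and once with rounding. Both helper names are hypothetical.

```python
def to_voxel_truncating(physical, voxel_size):
    """Recover a voxel index by truncating the quotient (fragile with floats)."""
    return int(physical / voxel_size)

def to_voxel_rounding(physical, voxel_size):
    """Recover a voxel index by rounding the quotient to the nearest integer."""
    return int(round(physical / voxel_size))

# Classic case: the quotient falls just below the expected integer.
print(0.3 / 0.1)                      # 2.9999999999999996
print(to_voxel_truncating(0.3, 0.1))  # 2
print(to_voxel_rounding(0.3, 0.1))    # 3

# Round trip through a 4-decimal filename component with the resolution from this thread.
voxel_size = 1.0448
recovered = [
    to_voxel_rounding(float(f"{i * voxel_size:.4f}"), voxel_size)
    for i in range(2000)
]
print(recovered == list(range(2000)))  # True
```

An off-by-one index of this kind is enough to produce a filename that does not match any generated file, which is why rounding to the nearest integer (or avoiding floats in names altogether) tends to be the safer choice.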
I modified the code this way: https://github.com/seung-lab/igneous/compare/master...aranega:igneous:master and it seems to work with both floating-point and integer resolutions, but I'm unsure whether those modifications are enough, or whether they will have some unexpected impact that I'm not seeing (the tests run fine).
The code I'm using to generate the 3D multi-resolution mesh is classical and follows what's written in the README: