seung-lab / igneous

Scalable Neuroglancer compatible Downsampling, Meshing, Skeletonizing, Contrast Normalization, Transfers and more.
GNU General Public License v3.0

Multi-resolution mesh creation fails with resolution as float value #177

Open aranega opened 4 days ago

aranega commented 4 days ago

When creating a sharded multi-resolution mesh from a segmentation, if the resolution is a float, collecting the .frags files fails with a message like this (the numbers depend on the input segmentation parameters):

No such file or directory:  'mesh/447-895_0-447_0-447.frags'

The file actually generated by create_meshing_tasks is 448-896_0-448_0-448.frags.

After some debugging, I found that when the resolution is a float, a rounding step somewhere causes the lookup to search for the wrong file name.
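
As an illustration of the kind of rounding described here, the sketch below converts a physical coordinate back to a voxel index in two ways: by truncating the quotient and by rounding it to the nearest integer. The helper names `voxel_trunc` and `voxel_round` are hypothetical and are not taken from igneous or cloud-volume; the assumption is only that a file name is derived from `physical / resolution` at some point. An off-by-one such as 447 instead of 448 is consistent with a truncated quotient that landed just below the integer.

```python
def voxel_trunc(physical, resolution):
    """Convert a physical coordinate to a voxel index by truncation."""
    return int(physical / resolution)


def voxel_round(physical, resolution):
    """Convert a physical coordinate to a voxel index by rounding to nearest."""
    return int(round(physical / resolution))


# Classic case: 0.3 / 0.1 evaluates to slightly less than 3 in binary floating point.
print(voxel_trunc(0.3, 0.1), voxel_round(0.3, 0.1))

# Scan voxel indices for a float resolution and list those where the
# truncating round trip (voxel -> physical -> voxel) loses one voxel.
res = 1.0448
lost = [v for v in range(2048) if voxel_trunc(v * res, res) != v]
print(f"{len(lost)} of 2048 indices do not survive truncation, e.g. {lost[:5]}")
```

Rounding to the nearest integer is robust here because the floating point error is many orders of magnitude smaller than half a voxel, whereas truncation turns any error below the integer into a full off-by-one.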

I modified the code this way: https://github.com/seung-lab/igneous/compare/master...aranega:igneous:master and it seems to work with both float and int resolutions, but I'm unsure whether these modifications are enough, or whether they have side effects I'm not seeing (the tests pass).

The code I'm using to generate the 3D multi-resolution mesh is standard and follows the README:

from taskqueue import LocalTaskQueue
import igneous.task_creation as tc

mesh_dir = "mesh"

cloudpath = 'precomputed://file://./data/101'

tq = LocalTaskQueue()

tasks = tc.create_meshing_tasks(cloudpath, mip=0, mesh_dir=mesh_dir, sharded=True, spatial_index=True, compress=None)
tq.insert(tasks)
tq.execute()

tasks = tc.create_mesh_manifest_tasks(cloudpath, mesh_dir=mesh_dir, magnitude=3)
tq.insert(tasks)
tq.execute()

tasks = tc.create_sharded_multires_mesh_tasks(cloudpath, mesh_dir=mesh_dir, num_lod=1)
tq.insert(tasks)
tq.execute()

william-silversmith commented 3 days ago

Thank you for the info and the edits! I'll take a look at this soon. The real answer is to move away from floating point (internally) for naming files... there are inevitably rounding issues when searching for files.
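
One possible direction, shown only as a sketch and not as the maintainer's plan, is to do the name arithmetic in exact decimal arithmetic so that converting voxel bounds to a physical name and back cannot drift. The functions `physical_name` and `voxel_bounds` are hypothetical, handle a single axis, and assume the resolution is supplied as a decimal string.

```python
from decimal import Decimal


def physical_name(start_vx, stop_vx, resolution):
    """Build a 'start-stop' name in physical units using exact decimal math.

    `resolution` is a decimal string such as "1.0448" (assumption of this sketch).
    """
    res = Decimal(resolution)
    return f"{start_vx * res}-{stop_vx * res}"


def voxel_bounds(name, resolution):
    """Recover integer voxel bounds from a 'start-stop' physical name."""
    res = Decimal(resolution)
    return tuple(int(Decimal(v) / res) for v in name.split("-"))


name = physical_name(448, 896, "1.0448")
print(name)
print(voxel_bounds(name, "1.0448"))
```

Because every value stays a decimal with a finite expansion, the division recovers the original integers exactly, which a binary float round trip does not guarantee.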

aranega commented 2 days ago

Thanks @william-silversmith! Unfortunately, I hit another issue while still using float values for the resolution, so the modifications I made are not enough. This time, some spatial index files are not found. I have an image of size [1024, 1440, 500] with resolution [1.0448, 1.0448, 1.0448]. The error I get is the following:

cloudvolume.exceptions.SpatialIndexGapError: mesh/1069.056-1403.136_1503.360-1870.848_522.000-935.424.spatial was not found.

I'm not sure it comes from a rounding issue this time, as none of the generated files comes close to this name. The "closest" generated file is 936.1408-1069.8752_1404.2112-1504.5120_468.0704-522.4000.spatial.gz. I tried setting fill_missing=True, with the same result.
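
A quick arithmetic check on the two file names quoted above may help narrow this down. The sketch below parses each name and divides its bounds by a candidate resolution; the helpers `parse_bounds` and `to_voxels` are hypothetical and exist only for this check. The bounds of the missing file come out as whole voxels when divided by 1.044, while the bounds of the generated file come out as whole voxels when divided by 1.0448. That is consistent with the two names having been built from different resolution values, though it does not show where in the code this would happen.

```python
import re


def parse_bounds(name):
    """Return [(start, stop), ...] for x, y, z from 'a-b_c-d_e-f[.ext]'."""
    stem = re.sub(r"\.(spatial|frags)(\.gz)?$", "", name)
    return [tuple(float(v) for v in axis.split("-")) for axis in stem.split("_")]


def to_voxels(name, resolution):
    """Divide every bound by `resolution`, rounded to 3 decimals for readability."""
    return [
        tuple(round(v / resolution, 3) for v in axis)
        for axis in parse_bounds(name)
    ]


missing = "1069.056-1403.136_1503.360-1870.848_522.000-935.424.spatial"
closest = "936.1408-1069.8752_1404.2112-1504.5120_468.0704-522.4000.spatial.gz"

print(to_voxels(missing, 1.044))   # whole voxel values
print(to_voxels(closest, 1.0448))  # whole voxel values
```

In voxel units the missing file starts at (1024, 1440, 500), which is exactly the image size, so the requested box appears to lie just past the end of the volume, while the generated file ends at that same corner.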

william-silversmith commented 2 days ago

Unfortunately, it's the same issue, just in another module. :-/

aranega commented 2 days ago

That's really bad luck :(. I'll try to find some time to debug this, but this time I get the impression that the list of file names built for the spatial index contains too many files. Is it the same rounding issue in another place, as you suggested? I saw that the generated file names account for an expansion of the bounding box at some point. I didn't dig further, but could it come from there?

For now I worked around this by allowing missing spatial index files, but to do so I had to modify locations_for_labels this way (adding the allow_missing=True parameter to file_locations_per_label):

def locations_for_labels(cv: CloudVolume, labels: List[int]) -> Dict[int, List[str]]:
    # Map each label to the .frags file names derived from its spatial index files.
    SPATIAL_EXT = re.compile(r"\.spatial$")
    index_filenames = cv.mesh.spatial_index.file_locations_per_label(labels, allow_missing=True)
    resolution = cv.meta.resolution(cv.mesh.meta.mip)
    for label, locations in index_filenames.items():
        for i, location in enumerate(locations):
            bbx = Bbox.from_filename(re.sub(SPATIAL_EXT, "", location), dtype=resolution.dtype)
            bbx /= resolution

            index_filenames[label][i] = bbx.to_filename(1) + ".frags"
    return index_filenames

and also modify the file_locations_per_label_json method from cloud-volume this way (passing the previously unused allow_missing parameter to fetch_all_index_files and skipping files that have no content):

def file_locations_per_label_json(self, labels, allow_missing=False):
    locations = defaultdict(list)
    parser = simdjson.Parser()

    if labels is not None:
        labels = set(toiter(labels))

    for index_files in self.fetch_all_index_files(allow_missing=allow_missing):
        for filename, content in index_files.items():
            if not content:
                continue
            index_labels = set(parser.parse(content).keys())
            filename = os.path.basename(filename)

            if labels is None:
                for label in index_labels:
                    locations[int(label)].append(filename)
            elif len(labels) > len(index_labels):
                for label in index_labels:
                    if int(label) in labels:
                        locations[int(label)].append(filename)
            else:
                for label in labels:
                    if str(label) in index_labels:
                        locations[int(label)].append(filename)

    return locations

william-silversmith commented 2 days ago

Unfortunately, it's going to take a while for me to fix both floating point issues. I may have to deal with some backwards compatibility concerns. My recommendation would be to round the resolution to an integer, as painful as that may be, or at least round it to a value that is exactly representable in floating point.
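
To make "exactly representable" concrete, here is a small sketch using only the Python standard library. The helpers `is_exact` and `snap_to_dyadic` are hypothetical names. A decimal value is exactly representable as a binary float only if it equals a fraction whose denominator is a power of two; 1.0448 reduces to 653/625, so it is not, while 1.125 (9/8) is.

```python
from fractions import Fraction


def is_exact(decimal_str):
    """True if the decimal string is stored without error as a Python float."""
    return Fraction(decimal_str) == Fraction(float(decimal_str))


def snap_to_dyadic(value, bits=10):
    """Round `value` to the nearest multiple of 1 / 2**bits, which is exact in binary."""
    scale = 2 ** bits
    return round(value * scale) / scale


print(is_exact("1.0448"), is_exact("1.125"))
snapped = snap_to_dyadic(1.0448)
print(snapped, is_exact(repr(snapped)))
```

With a snapped resolution, multiplying by integer voxel coordinates of realistic size stays exact, so the round trip through a file name no longer depends on rounding. The cost is that the stored resolution differs slightly from the measured one (here by roughly 0.01%), which may or may not be acceptable for a given dataset.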

aranega commented 2 days ago

Thanks for your feedback and your help on this!

I'll look into the different images we need to handle and their resolutions. Don't worry about the time it takes to fix this: with the current modifications in cloud-volume and igneous, it seems to work (at least for the tests I ran). As we don't use igneous for any other purpose at the moment, we don't need backward compatibility, so we can live with these patches for now. I will test extensively with other images to check that everything works.