mobie / mobie-utils-python

Python tools for MoBIE

remove_source does not remove records from dataset.json #120

Closed Buglakova closed 8 months ago

Buglakova commented 9 months ago

Hello,

I add an image source and then want to delete it before adding another image with the same source name. I remove the source as follows:

import mobie

# dataset_folder and source_name are set elsewhere in the script
dataset_metadata = mobie.metadata.read_dataset_metadata(dataset_folder)
sources = dataset_metadata["sources"]

if source_name in sources:
    print("Source already exists, remove before adding again")
    mobie.remove_source(dataset_folder, source_name, remove_data=True)

Then the script that should re-add the image fails with the following error, raised inside remove_source:

Traceback (most recent call last):
  File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 88, in <module>
    main()
  File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 66, in main
    mobie.remove_source(dataset_folder, args.source_name, remove_data=True)
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/source_utils.py", line 97, in remove_source
    _remove_image_data(storage_type, os.path.join(dataset_folder, rel_path))
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/source_utils.py", line 13, in _remove_image_data
    rmtree(data_path) if storage_type.endswith("n5") else os.remove(data_path)
    ^^^^^^^^^^^^^^^^^
  File "/home/buglakov/miniconda3/envs/mobie/lib/python3.11/shutil.py", line 722, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/home/buglakov/miniconda3/envs/mobie/lib/python3.11/shutil.py", line 720, in rmtree
    orig_st = os.lstat(path, dir_fd=dir_fd)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/g/kreshuk/buglakova/projects/platy_registration/data/platybrowser-smfish-project/data/1.0.1/images/bdv-n5/prox-brn3-sfg_pl3_dapi.n5'

I figured that the dataset.json still has entries about this deleted source and if I go and remove them manually, I can add it again. What would be the correct way to completely delete a source?

Another problem is that after adding the source, temporary directories with names like "tmp_dataset_source" remain in the directory where I ran the script. If I delete the source and its entries in dataset.json but this temporary directory is still there, I can't add an image with the same name; it fails with the following error:

Traceback (most recent call last):
  File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 88, in <module>
    main()
  File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 69, in main
    mobie.add_image(
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/image_data.py", line 246, in add_image
    metadata.add_source_to_dataset(dataset_folder, "image", image_name, image_metadata_path,
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 274, in add_source_to_dataset
    source_metadata = get_image_metadata(dataset_folder, image_metadata_path,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 192, in get_image_metadata
    return _get_image_metadata(dataset_folder, metadata_path, "image",
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 164, in _get_image_metadata
    file_format = _get_file_format(path) if file_format is None else file_format
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 153, in _get_file_format
    raise ValueError(f"{path} does not exist.")
ValueError: data/platybrowser-smfish-project/data/1.0.1/images/bdv-n5/prox-brn3-sfg_pl3_dapi.xml does not exist.

This is not such an issue, because I can just delete that directory after adding the source, but it's quite unexpected.

constantinpape commented 9 months ago

> I figured that the dataset.json still has entries about this deleted source and if I go and remove them manually, I can add it again. What would be the correct way to completely delete a source?

remove_source removes the entry from dataset.json if it runs to completion. I just checked and the unit test covers this, so I am pretty sure it works. However, in your case remove_source fails with an error, so the function never reaches the point where the updated metadata is written out. You can see in the source code that writing the metadata is the last thing the function does.

The function fails because the folder '/g/kreshuk/buglakova/projects/platy_registration/data/platybrowser-smfish-project/data/1.0.1/images/bdv-n5/prox-brn3-sfg_pl3_dapi.n5' that it tries to remove (because of remove_data=True) is not there. You probably have an inconsistent state left over from a previous failed attempt to remove the source. Can you try again with a source where you are sure that the image data is there? Then it should work.
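
To illustrate the failure mode described above, here is a minimal sketch of a removal step that tolerates data that is already gone. The helper name remove_data_if_present is hypothetical and not part of mobie; it only uses the standard library and assumes that n5 containers are directories while other formats are single files.

```python
import os
import shutil


def remove_data_if_present(data_path):
    """Delete data_path if it exists; return True if something was removed.

    Hypothetical helper for illustration, not part of mobie.
    """
    if os.path.isdir(data_path):
        # n5 containers are directories, so they need a recursive delete.
        shutil.rmtree(data_path)
        return True
    if os.path.exists(data_path):
        # Single-file formats can be removed directly.
        os.remove(data_path)
        return True
    # Nothing on disk, e.g. because another tool already deleted it.
    return False
```

One possible use is to check the data path first and, if the data is already gone, call mobie.remove_source with remove_data=False so that only the metadata is updated. Whether that is appropriate depends on why the data is missing.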

> This is not such an issue, because I can just delete that directory after adding the source, but it's quite unexpected.

Yes, that is true. You have to remove this tmp folder in order to re-add the source. I agree that it's a bit unexpected but this folder contains important debug information, so I don't want to remove it automatically.
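
Since the tmp folder holds debug information, one option is to move it aside instead of deleting it before re-adding a source. The following is a rough sketch under that assumption; archive_tmp_folder and the archive location are hypothetical names, not part of mobie, and the folder name "tmp_dataset_source" in the usage note is only the example pattern mentioned in this thread.

```python
import os
import shutil
import time


def archive_tmp_folder(tmp_folder, archive_root):
    """Move tmp_folder into archive_root under a timestamped name.

    Hypothetical helper for illustration, not part of mobie.
    Returns the new path, or None if tmp_folder does not exist.
    """
    if not os.path.isdir(tmp_folder):
        return None
    os.makedirs(archive_root, exist_ok=True)
    base = os.path.basename(os.path.normpath(tmp_folder))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(archive_root, f"{base}_{stamp}")
    # Avoid clobbering an archive created within the same second.
    counter = 1
    while os.path.exists(target):
        target = os.path.join(archive_root, f"{base}_{stamp}_{counter}")
        counter += 1
    shutil.move(tmp_folder, target)
    return target
```

For example, calling archive_tmp_folder("tmp_dataset_source", "tmp_archive") before add_image would keep the old logs around while freeing the folder name.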

Buglakova commented 8 months ago

I figured out that it happened because I was running the script with Snakemake and the n5 file was the target of the rule. When a rule is rerun, Snakemake deletes the target file before running any commands. Thanks for the help!