mobie / mobie-utils-python

Python tools for MoBIE
MIT License

remote validation #109

Closed: martinschorb closed this issue 1 year ago

martinschorb commented 1 year ago

Hi,

I have a strange validation issue.

This project: https://github.com/mobie/environmental-dinoflagellate-vCLEM fails to validate with problems in the remote sources.

ValueError: Could not find valid data path in XML file  data/photosynthetic_dinoflagellate/images/bdv-n5-s3/Chloroplast.xml.

however, https://s3.embl.de/environmental-dinoflagellate-vclem/photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5/setup0/attributes.json and all other files exist and are publicly accessible on the S3.

Also, the project opens fine from remote in MoBIE.

Could this be an issue with the underscore in the s3 key?
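To check the remote file independently of the validator, you can request it directly. The sketch below uses hypothetical helpers, not part of mobie-utils-python. It assumes path-style addressing (`<endpoint>/<bucket>/<key>`), as in the S3 URL above, and uses only the Python standard library.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def s3_object_url(endpoint: str, bucket: str, key: str) -> str:
    """Build a path-style object URL (hypothetical helper)."""
    # quote() never escapes '_', '-', '.', '~' or '/', so underscores in the key are kept as-is.
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key.lstrip('/'))}"


def is_publicly_readable(url: str, timeout: float = 10.0) -> bool:
    """Return True if an anonymous HEAD request for the URL succeeds (hypothetical helper)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # HTTPError, URLError and socket timeouts are all subclasses of OSError.
        return False
```

For example, `is_publicly_readable(s3_object_url("https://s3.embl.de", "environmental-dinoflagellate-vclem", "photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5/setup0/attributes.json"))` returns `True` only if the anonymous request succeeds.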

constantinpape commented 1 year ago

This looks to me like it only validates the local data (and can't find it). You probably need to run mobie.validate_project -r 0 -d 1 ... to only check the remote data.

See

$ mobie.validate_project -h
usage: Validate MoBIE project metadata [-h] --input INPUT [--require_local_data REQUIRE_LOCAL_DATA]
                                       [--require_remote_data REQUIRE_REMOTE_DATA]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT, -i INPUT
                        the project location
  --require_local_data REQUIRE_LOCAL_DATA, -r REQUIRE_LOCAL_DATA
                        check that local data exists
  --require_remote_data REQUIRE_REMOTE_DATA, -d REQUIRE_REMOTE_DATA
                        check that remote data exists
martinschorb commented 1 year ago

mobie.validate_project -r 0 -d 1 works; -r 1 causes it to fail. Also, all local data files are present in that directory. I don't understand why it looks for something local from the bdv-n5-s3 XMLs. They link to S3 no matter whether on local disk or on GitHub.

constantinpape commented 1 year ago

> I don't understand why it looks for something local from the bdv-n5-s3 XMLs. They link to S3 no matter whether on local disk or on GitHub.

With -r 1 it will also check the data in bdv-n5 and will fail if the corresponding data is not there. So this is the expected behavior.
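The following is a hypothetical sketch of what a local-data check involves, not the validator's actual code. It confirms that a bdv-n5 container is present on disk and that its first setup has readable attributes. It assumes the `<name>.n5/setup<N>/attributes.json` layout of BDV-style N5 data.

```python
import json
from pathlib import Path


def read_setup_attributes(n5_root: str, setup_id: int = 0) -> dict:
    """Load setup<N>/attributes.json of a local bdv.n5 container (hypothetical helper)."""
    attrs_path = Path(n5_root) / f"setup{setup_id}" / "attributes.json"
    if not attrs_path.is_file():
        # This is the situation in which a local-data check is expected to fail.
        raise FileNotFoundError(f"missing N5 attributes: {attrs_path}")
    with attrs_path.open(encoding="utf-8") as f:
        attrs = json.load(f)
    # BDV-style N5 setups list one downsampling factor per axis for each scale level.
    if "downsamplingFactors" not in attrs:
        raise ValueError(f"no 'downsamplingFactors' in {attrs_path}")
    return attrs
```

Usage: `read_setup_attributes("data/photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5")` raises `FileNotFoundError` if the local copy is missing.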

martinschorb commented 1 year ago

That makes total sense.

However, in this particular case both local and remote data exist.

$ cat data/photosynthetic_dinoflagellate/images/bdv-n5/Chloroplast.n5/setup0/attributes.json
{"dataType":"uint8","downsamplingFactors":[[1,1,1],[2,2,2],[4,4,4],[8,8,8],[16,16,16],[32,32,32],[64,64,64]]}

Plus, it specifically complains about the S3 data when checking locally (that's why I wanted pybdv to show me the affected file).

ValueError: Could not find valid data path in XML file  data/photosynthetic_dinoflagellate/images/bdv-n5-s3/Chloroplast.xml.

Does that mean it cannot find the file it is pointing to? Or is it because the XML does not contain a path that pybdv understands (pointing to S3 instead of a local path)?
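The second possibility works as follows: a check that only looks for a local `n5` path finds nothing in an XML whose image loader points to S3. The sketch below is hypothetical (not pybdv's API). It assumes the usual BDV layout: an `ImageLoader` element with a `format` attribute, an `n5` child holding the path for `bdv.n5`, and `ServiceEndpoint`, `BucketName` and `Key` children for `bdv.n5.s3`.

```python
import xml.etree.ElementTree as ET


def describe_image_loader(xml_text: str) -> dict:
    """Classify a BDV XML by its ImageLoader (hypothetical helper, assumed layout)."""
    root = ET.fromstring(xml_text)
    loader = root.find(".//ImageLoader")
    if loader is None:
        raise ValueError("no ImageLoader element found")
    fmt = loader.get("format", "")
    if fmt == "bdv.n5":
        # Local data: the path to the .n5 container is stored in the <n5> element.
        n5 = loader.find("n5")
        if n5 is None or not (n5.text or "").strip():
            raise ValueError("could not find a local data path in the XML")
        return {"format": fmt, "path": n5.text.strip(), "type": n5.get("type")}
    if fmt == "bdv.n5.s3":
        # Remote data: described by endpoint, bucket and key, not by a file path.
        return {
            "format": fmt,
            "endpoint": loader.findtext("ServiceEndpoint", "").strip(),
            "bucket": loader.findtext("BucketName", "").strip(),
            "key": loader.findtext("Key", "").strip(),
        }
    raise ValueError(f"unsupported image loader format: {fmt!r}")
```

A validator that handed an S3 XML to the `bdv.n5` branch would hit the "no local data path" error, even though the file the XML points to exists.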

constantinpape commented 1 year ago

Ok, I see. Maybe something with the metadata is duplicated. I will check it out later.

constantinpape commented 1 year ago

I had a look at the metadata in the project and couldn't see any obvious issue. I also checked https://github.com/mobie/covid-em-project, which has a similar set-up (local and remote data in bdv.n5 format), but couldn't reproduce the error there; the validation works as expected.

Is a copy of https://github.com/mobie/environmental-dinoflagellate-vCLEM with the local image data somewhere on the EMBL share where I could access it?

martinschorb commented 1 year ago

Check /g/schwab/Karel/mobie_cell1.

constantinpape commented 1 year ago

Yep, there was an issue in one of the conditions in the validation, which caused the remote XMLs to be validated by the function for local data.
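The actual fix is not shown in this thread. The sketch below only illustrates the kind of mistake described, using hypothetical names rather than the mobie-utils-python code. It compares a condition that sends every XML to the local check with one that dispatches on the data format.

```python
LOCAL_FORMATS = {"bdv.n5"}
REMOTE_FORMATS = {"bdv.n5.s3"}


def checks_buggy(data_format: str, require_local: bool, require_remote: bool) -> list[str]:
    """Illustrative faulty condition: ignores the format when local data is required."""
    checks = []
    if require_local:
        checks.append("local")  # remote (bdv.n5.s3) XMLs end up here too
    if require_remote and data_format in REMOTE_FORMATS:
        checks.append("remote")
    return checks


def checks_fixed(data_format: str, require_local: bool, require_remote: bool) -> list[str]:
    """Illustrative corrected condition: each XML goes to the check for its own format."""
    checks = []
    if require_local and data_format in LOCAL_FORMATS:
        checks.append("local")
    if require_remote and data_format in REMOTE_FORMATS:
        checks.append("remote")
    return checks
```

With `-r 0 -d 1`, both versions route a `bdv.n5.s3` XML only to the remote check, which matches that invocation working in the thread. With `-r 1`, only the buggy version also sends it to the local check.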

constantinpape commented 1 year ago

Should work once you pull master.