microbiomedata / issues

public repo for issues related to NMDC work

Invalid files in data_object_set #815

Open shreddd opened 1 month ago

shreddd commented 1 month ago

Current Behavior

While re-linking the Globus files to capture the current data in NMDC, we found a number of files referenced in Mongo that don't seem to exist on the filesystem.

The log on CFS at /global/homes/n/nmdcda/infra-admin/globus/check_files.log lists a number of missing files and directories (look for ERROR or WARN).

We should try and figure out if

The log is the output of inspect_live_directories.py (https://github.com/microbiomedata/infra-admin/blob/main/globus/inspect_live_directories.py), which queries all records in data_object_set and extracts the URLs from them.

Expected Behavior

No response

Steps To Reproduce

Run inspect_live_directories.py (https://github.com/microbiomedata/infra-admin/blob/main/globus/inspect_live_directories.py), which queries all records in data_object_set and extracts the URLs from them.

Output in: /global/homes/n/nmdcda/infra-admin/globus/check_files.log
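
For anyone following along, the kind of check described above can be sketched roughly as follows. This is not the code in inspect_live_directories.py; it is a minimal illustration that assumes each record has "id" and "url" fields, and that the URL path after a hypothetical /data/ prefix mirrors the layout under the base data directory. The names URL_PREFIX, url_to_path, and find_missing are made up for this sketch.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumption: data object URLs look like <host>/data/<relative path>.
# The real prefix used by NMDC may differ.
URL_PREFIX = "/data/"


def url_to_path(url, base_dir):
    """Map a data object URL to the path it should have on disk, or None."""
    url_path = urlparse(url).path
    if not url_path.startswith(URL_PREFIX):
        return None  # URL does not point into the data directory
    # Strip leading slashes so the join stays under base_dir.
    relative = url_path[len(URL_PREFIX):].lstrip("/")
    return Path(base_dir) / relative


def find_missing(records, base_dir):
    """Yield (record id, expected path) for records whose file is absent."""
    for record in records:
        url = record.get("url")
        if not url:
            continue  # records without a URL cannot be checked
        expected = url_to_path(url, base_dir)
        if expected is not None and not expected.is_file():
            yield record.get("id"), expected
```

Calling list(find_missing(records, "/global/cfs/cdirs/m3408/results")) on a list of record dicts would then give the entries to investigate.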

Environment

No response

Anything else?

No response

aclum commented 1 month ago

@shreddd How long does it take to run inspect_live_directories.py?

aclum commented 1 day ago

How exactly was this run? When I run it with python inspect_live_directories.py check_files_and_directories --base-api-url https://data.microbiomedata.org/ --base-data-directory /global/cfs/cdirs/m3408/results I get a requests.exceptions.JSONDecodeError.
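
For context, requests raises JSONDecodeError from Response.json() when the response body is not valid JSON, for example an HTML page or an empty body. Whether that is what happens here is an assumption. A small helper like the one below (hypothetical, not part of the script) would show what the server actually returned before parsing:

```python
import json


def parse_json_body(status, content_type, body):
    """Return the parsed JSON body, or raise ValueError describing what came back."""
    preview = body[:200]  # enough to tell HTML or an error page from JSON
    if status != 200 or "json" not in content_type.lower():
        raise ValueError(
            f"expected JSON, got HTTP {status} ({content_type!r}): {preview!r}"
        )
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"body is not valid JSON: {preview!r}") from exc
```

With a requests response object resp, this could be called as parse_json_body(resp.status_code, resp.headers.get("Content-Type", ""), resp.text).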