hubmapconsortium / ingest-validation-tools

HuBMAP data submission guidelines, and tools which check that submissions adhere to those guidelines.
MIT License

Trying to read "directory-schemas/None.yaml"... How is this different from bad-no-such-type fixture? #570

Closed icaoberg closed 3 years ago

icaoberg commented 3 years ago

In the file level-1.yaml the assay type is defined as scRNAseq-10xGenomics. However, in the test dataset from the data provider, it is defined as snRNAseq-10Xgenomics in the metadata.tsv file.

@pdblood and @jswelling which one is correct?

icaoberg commented 3 years ago

@mccalluc the error is the following:

$ src/validate_submission.py --dataset_ignore_globs=\*.tsv --local_directory "$D"

/hive/users/hive/icaoberg/ingest-validation/lib64/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.3) or chardet (4.0.0) doesn't match a supported version!
  RequestsDependencyWarning)
Traceback (most recent call last):
  File "src/validate_submission.py", line 155, in <module>
    exit_status = main()
  File "src/validate_submission.py", line 147, in main
    errors = submission.get_errors()
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/submission.py", line 101, in get_errors
    tsv_errors = self._get_tsv_errors()
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/submission.py", line 154, in _get_tsv_errors
    self._get_single_tsv_external_errors(assay_type, path)
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/submission.py", line 186, in _get_single_tsv_external_errors
    assay_type, data_path)
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/submission.py", line 216, in _get_data_dir_errors
    assay_type, data_path, dataset_ignore_globs=self.dataset_ignore_globs)
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/validation_utils.py", line 30, in get_data_dir_errors
    schema = get_directory_schema(type)
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/schema_loader.py", line 26, in get_directory_schema
    schema = load_yaml(_directory_schemas_path / f'{directory_type}.yaml')
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/yaml_include_loader.py", line 19, in load_yaml
    expanded_text = _load_includes(path)
  File "/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/yaml_include_loader.py", line 24, in _load_includes
    text = path.read_text()
  File "/usr/lib64/python3.6/pathlib.py", line 1196, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/lib64/python3.6/pathlib.py", line 1183, in open
    opener=self._opener)
  File "/usr/lib64/python3.6/pathlib.py", line 1037, in _opener
    return self._accessor.open(self, flags, mode)
  File "/usr/lib64/python3.6/pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)
FileNotFoundError: [Errno 2] No such file or directory: '/hive/users/hive/icaoberg/ingest-validation/ingest-validation-tools/src/ingest_validation_tools/directory-schemas/None.yaml'

Maybe improve the error message? I mean, this is an edge case.
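
A minimal sketch of what a clearer error could look like. The traceback shows the schema path being built as `f'{directory_type}.yaml'`, so a `None` directory type turns into `None.yaml`. Everything below (`directory_schema_path`, `UnknownDirectoryTypeError`, `SCHEMAS_DIR`) is a hypothetical illustration, not the tool's actual code:

```python
from pathlib import Path
from typing import Optional

# Hypothetical location; the real tool resolves its schema directory itself.
SCHEMAS_DIR = Path("directory-schemas")


class UnknownDirectoryTypeError(Exception):
    """Raised when no directory schema can be selected (hypothetical name)."""


def directory_schema_path(
    directory_type: Optional[str], schemas_dir: Path = SCHEMAS_DIR
) -> Path:
    """Return the schema path, failing early with a readable message."""
    if directory_type is None:
        # Stop here instead of formatting None into a filename.
        raise UnknownDirectoryTypeError(
            "No directory schema could be determined for this assay type; "
            "check the assay_type value in the metadata TSV."
        )
    path = schemas_dir / f"{directory_type}.yaml"
    if not path.is_file():
        raise UnknownDirectoryTypeError(
            f"No directory schema named '{directory_type}' in {schemas_dir}"
        )
    return path
```

With a guard like this, the failure would name the unrecognized assay type rather than surfacing as a `FileNotFoundError` from `pathlib`.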

mccalluc commented 3 years ago

Two separate issues:

Retitling.

mccalluc commented 3 years ago

TSV: https://docs.google.com/spreadsheets/d/1tV_e-Oqhx_cTZno2zDnx2TP5HW_atuxd6lT5PSLzVtU/edit#gid=1828148049

jswelling commented 3 years ago

@mccalluc do you have everything you need to address the stack trace? This is my current blocker; I can debug if that would be helpful.

jswelling commented 3 years ago

Confirmed that it still exists at e16196b, which is master HEAD now.

mccalluc commented 3 years ago

Looking at it now...

mccalluc commented 3 years ago

After saving locally and retitling, I get the expected errors:

$ mkdir /tmp/fake-submission
$ mv ~/Downloads/UFLA_10x_SP-LY_Metadata_120420.tsv\ -\ UFTMC_10x_120420.tsv.tsv /tmp/fake-submission/ufla-10x-metadata.tsv
$ src/validate_submission.py --local_directory /tmp/fake-submission
Metadata TSV Errors:
  /tmp/fake-submission/ufla-10x-metadata.tsv (as scrnaseq):
    External:
      ? row 2, referencing /tmp/fake-submission/https:/app.globus.org/file-manager?origin_id=24c2ee95-146d-4513-a1b3-ac0bfdb7856f&origin_path=%2Fprotected%2FUniversity%20of%20Florida%20TMC%2F638799c2725a0c88ec7ee389cb98884f%2F
      : No such file or directory: /tmp/fake-submission/https:/app.globus.org/file-manager?origin_id=24c2ee95-146d-4513-a1b3-ac0bfdb7856f&origin_path=%2Fprotected%2FUniversity%20of%20Florida%20TMC%2F638799c2725a0c88ec7ee389cb98884f%2F
      ....

Rerunning with --dataset_ignore_globs=\*.tsv changes nothing.

If I rename it back to the original name, I get a different, shorter error:

$ mv /tmp/fake-submission/ufla-10x-metadata.tsv /tmp/fake-submission/UFLA_10x_SP-LY_Metadata_120420.tsv\ -\ UFTMC_10x_120420.tsv
$ src/validate_submission.py --local_directory /tmp/fake-submission
Metadata TSV Errors:
  Missing: There are no effective TSVs.
Reference Errors:
  No References:
  - UFLA_10x_SP-LY_Metadata_120420.tsv - UFTMC_10x_120420.tsv
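
The two runs differ only in the filename, which suggests the tool selects metadata TSVs by name. The sketch below assumes a pattern like `*-metadata.tsv`. That pattern is a guess based on the name that worked above, and `is_effective_tsv` is a hypothetical helper, not part of the tool:

```python
from fnmatch import fnmatchcase

# Assumed pattern, inferred from the filename that was picked up above.
ASSUMED_PATTERN = "*-metadata.tsv"


def is_effective_tsv(filename: str, pattern: str = ASSUMED_PATTERN) -> bool:
    """Return True if the filename matches the assumed metadata TSV pattern."""
    # fnmatchcase compares case-sensitively on every platform.
    return fnmatchcase(filename, pattern)


for name in (
    "ufla-10x-metadata.tsv",
    "UFLA_10x_SP-LY_Metadata_120420.tsv - UFTMC_10x_120420.tsv",
):
    print(name, "->", is_effective_tsv(name))
```

Under that assumption, the renamed file matches and the original download name does not, which is consistent with the "no effective TSVs" message.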

Library versions have recently been upgraded, so do a fresh pip install.

If you still have problems, can you send me the output of python --version, pip freeze, and find $D (or whatever the submission directory is), and tell me which operating system you're on?
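
If it is easier to gather all of that in one go, here is a rough sketch that collects roughly the same information from within Python. `environment_report` is a hypothetical helper, and it assumes Python 3.8 or later for `importlib.metadata`, which is newer than the 3.6 interpreter in the traceback:

```python
import platform
import sys
from importlib import metadata
from pathlib import Path


def environment_report(submission_dir):
    """Collect interpreter, OS, package, and file-listing details."""
    root = Path(submission_dir)
    # Roughly what `pip freeze` reports for the current interpreter.
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    )
    # Roughly what `find $D` lists, relative to the submission directory.
    files = sorted(str(p.relative_to(root)) for p in root.rglob("*"))
    return {
        "python": sys.version,
        "os": platform.platform(),
        "packages": packages,
        "files": files,
    }
```

Printing the result of `environment_report("/tmp/fake-submission")` and pasting it into the issue would cover the same ground as the four separate commands.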