pepkit / pipestat

Pipeline results reporting package
https://pep.databio.org/pipestat/
BSD 2-Clause "Simplified" License

Dependency check causes pytest failures when running individual tests. #181

Closed: donaldcampbelljr closed this issue 3 months ago

donaldcampbelljr commented 3 months ago

Using a complex output schema:

title: An example Pipestat output schema
description: A pipeline that uses pipestat to report sample and project level results.
type: object
properties:
  pipeline_name: "default_pipeline_name"
  samples:
    type: object
    properties:
      collection_of_images:
        description: "This stores a collection of values or objects"
        type: array
        items:
          properties:
              prop1:
                description: "This is an example file"
                $ref: "#/$defs/file"
$defs:
  file:
    type: object
    object_type: file
    properties:
      path:
        type: string
      title:
        type: string
    required:
      - path
      - title

and attempting to report a value:

{
    "collection_of_images": [
        {
            "items": {
                "properties": {
                    "prop1": {
                        "properties": {
                            "path": "pathstring",
                            "title": "titlestring",
                        }
                    }
                }
            }
        }
    ]
}

I see an error during testing:

        except e._NO_TRACEBACK as ex:
>           raise ex.with_traceback(None)
E           psycopg.errors.UndefinedColumn: column default_pipeline_name__sample.collection_of_images does not exist
E           LINE 1: ...itch_value, default_pipeline_name__sample.md5sum, default_pi...
E                                                                        ^

venv/lib/python3.10/site-packages/psycopg/cursor.py:732: UndefinedColumn

Double-checking the database confirms that the column is indeed not being created during table creation.

donaldcampbelljr commented 3 months ago

We currently have the field defined as a list[dict]: [image]

It does show up during model generation: [image]

donaldcampbelljr commented 3 months ago

OK, I think something else might be going on here. If I remove the collection of images from my test and the recursive schema, I get the same error, but for the next complex object, e.g., output_file.

donaldcampbelljr commented 3 months ago

Oh, it's our check that determines whether the dependencies are satisfied. It creates an initial table using a different schema than the test: pipestat report --c 'tests/data/config.yaml' -i 'name_of_something' -v 'test_value' -r 'dependency_value'

So, this causes issues with the very first test if you are running the tests individually (and the schema in the test is not the same as the one used by the dependency check). I actually ran into this issue last week and did not realize/remember that I had reverted the checks back to their original state.
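
To illustrate the failure mode, here is a minimal sketch. It is not pipestat's code: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL, and it assumes that table creation is skipped when a table with the same name already exists. The helper names and the column lists are hypothetical; only the table name and collection_of_images / output_file come from the report above.

```python
import sqlite3


def create_table_if_missing(conn, table, columns):
    """Create `table` with `columns` only if it does not already exist (assumed behavior)."""
    cols = ", ".join(f"{name} TEXT" for name in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")


def column_names(conn, table):
    """Return the column names the database actually holds for `table`."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
table = "default_pipeline_name__sample"

# 1. The dependency check runs first and creates the table from its own schema.
create_table_if_missing(conn, table, ["record_identifier", "name_of_something"])

# 2. The test then initializes with a richer schema. The table already exists,
#    so this call is a no-op and the new columns are never added.
create_table_if_missing(
    conn, table, ["record_identifier", "collection_of_images", "output_file"]
)

existing = column_names(conn, table)

# 3. Selecting a column that only the second schema defines now fails,
#    analogous to psycopg's UndefinedColumn error.
try:
    conn.execute(f"SELECT collection_of_images FROM {table}")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(existing)
print(error)
```

Under that assumption, the first missing column triggers the error, and removing it from the test just moves the failure to the next column the stale table lacks, which matches the output_file observation.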

donaldcampbelljr commented 3 months ago

I think changing the default schema in the testing config will alleviate this problem for now: https://github.com/pepkit/pipestat/commit/b13299f593df4caaab02b5cfc20fdb36d413964e