pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

looper is not validating inputs correctly #523

Open nsheff opened 2 months ago

nsheff commented 2 months ago

If I say that an input attribute is required in the input schema, then if I don't specify a value for that attribute in the CSV file, I expect looper to not submit the job.

But what happens is, looper still submits the job.

I am putting an example in the hello_looper repo.

donaldcampbelljr commented 1 month ago

Confirming: the usa sample fails when running the hello_looper example input_schema_example because there is no data at the file_path. However, if I add the data, the usa sample should still fail because there is no area_type provided in the sample table, but it passes and looper submits the job without issue.

During sample validation, sample_schema_dict looks appropriate: [image]

But the usa sample.to_dict() has an empty string for area_type: [image]

And this appears to allow the sample to pass without issue during eido's _validate_object call: validator.is_valid(obj)

Deleting the area_type key from the obj and re-running is_valid does cause validation failure as expected.

Should peppy's sample.to_dict() remove keys whose values are empty? Or should eido preprocess the sample dict to remove empty items before validation?
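
To make the second option concrete, here is a minimal sketch of that kind of preprocessing. It is not code from peppy or eido: `drop_empty` is a hypothetical helper, and the sample dict (including the `file_path` value) is made up to resemble the usa sample described above.

```python
def drop_empty(sample: dict) -> dict:
    """Return a copy of the sample without keys whose value is None or a blank string."""
    return {
        key: value
        for key, value in sample.items()
        if value is not None and not (isinstance(value, str) and value.strip() == "")
    }


# Made-up stand-in for the usa sample: area_type is present but empty.
usa = {"sample_name": "usa", "file_path": "data/usa.txt", "area_type": ""}
cleaned = drop_empty(usa)

# A "required" check only asks whether the key exists, so the empty
# string counts as present until the key is dropped.
print("area_type" in usa)
print("area_type" in cleaned)
```

Values such as `0` or `False` are kept on purpose, since only missing or blank entries should be treated as absent.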

nsheff commented 1 month ago

Just add minLength: 1 to the schema.

Having an attribute present is not the same thing as having a value in that attribute.
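
A small sketch of the difference, using the `jsonschema` package directly rather than going through eido or looper. Both schemas are simplified, hypothetical stand-ins for the hello_looper input schema; only the `area_type` attribute name comes from this thread.

```python
from jsonschema import Draft7Validator

# Hypothetical schema: area_type is required, but any string (even "") is accepted.
required_only = Draft7Validator({
    "type": "object",
    "properties": {"area_type": {"type": "string"}},
    "required": ["area_type"],
})

# Same schema with minLength: 1, so an empty string is rejected as well.
with_min_length = Draft7Validator({
    "type": "object",
    "properties": {"area_type": {"type": "string", "minLength": 1}},
    "required": ["area_type"],
})

empty_value = {"sample_name": "usa", "area_type": ""}
missing_key = {"sample_name": "usa"}
filled = {"sample_name": "usa", "area_type": "country"}

for label, sample in [("empty", empty_value), ("missing", missing_key), ("filled", filled)]:
    print(label, required_only.is_valid(sample), with_min_length.is_valid(sample))
```

With `required` alone, only the sample with the missing key fails. With `minLength: 1` added, the sample with the empty string fails too, which is the behavior originally expected.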

donaldcampbelljr commented 1 month ago