pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License
20 stars 7 forks source link

Bugs in `is_registry_path` #456

Closed nsheff closed 3 months ago

nsheff commented 7 months ago

this is poorly done:

https://github.com/pepkit/looper/blob/1468956dde66abf5b853c80eaeaee2d411bfad64/looper/utils.py#L578-L595

I'm trying to pass it a path to a csv file which is a valid PEP, config/demo_fasta.csv, and it's interpreting it as a registry path!

This should prioritize if it's a file, and only if the file doesn't exist, then it should check to see if it could be a registry path.

This bug makes looper basically unusable unless you're using PEPhub.

nsheff commented 7 months ago

I've added a minimal reproducible example in hello_looper -- here: https://github.com/pepkit/hello_looper/

the csv example shows the behavior. This should actually be an example in hello looper to show a legitmate use case.

Maybe having that example would have prevented this bug. It should be integrated into some smoketests.

nsheff commented 7 months ago

the docs make it seem like you're supposed to say pephub:: to indicate a registry path, but this does not appear to be what is happening.

image

nsheff commented 7 months ago

Related to #426

donaldcampbelljr commented 6 months ago

I attempted a simple fix in the above draft PR, where the function is_registry_path would also return False if the path ended with .csv. I confirmed that this will go on to load a Project via: https://github.com/pepkit/looper/blob/19b81bb744b3ec01b5d85c7b67a9d2e8be8dd3d5/looper/cli_pydantic.py#L198-L213

However, Looper goes on to later call prj.name and it fails with the exception: Project name inference isn't supported on a project that lacks a config file.

nsheff commented 6 months ago

I attempted a simple fix in the above draft PR, where the function is_registry_path would also return False if the path ended with .csv. I confirmed that this will go on to load a Project via:

No! don't hack it more...

Can we not rely on the registry path functions we already wrote and have been using for years in ubiquerg?

nsheff commented 6 months ago

However, Looper goes on to later call prj.name and it fails with the exception: Project name inference isn't supported on a project that lacks a config file.

This would be a separate bug, then, because those are valid projects.

donaldcampbelljr commented 6 months ago

Ok, I investigated parse_registry_path mentioned in #426 from Ubiquerg. However, that function parses an input string ending with .yaml just fine (which is not a registry path, right?). In Looper, this parsed input string is handed back to RegistryPath from PEPHubClient which validates this parsed string without issue. image

Should we be checking the parsed subitem for common file extensions during validation? Should we check during parse_registry_path and raise an exception?

I also investigated is_registry_path from PEPhubclient but then realized its very similar to Looper's function and just checks for the file extension before calling: registry_path = RegistryPath(**parse_registry_path(input_string))

donaldcampbelljr commented 6 months ago

I meant to add this question: Is it desired that we should just check the provided config_file (basically move the check for .yaml or .csv out of that function) and if its not a file, then proceed to check if its a registry path. Essentially, reverse the check during project initialization: https://github.com/pepkit/looper/blob/19b81bb744b3ec01b5d85c7b67a9d2e8be8dd3d5/looper/cli_pydantic.py#L178-L213

donaldcampbelljr commented 6 months ago

After discussion:

donaldcampbelljr commented 6 months ago

Ah, interestingly, if I give the following string to parse_registry_path, it returns None. Example: '/tmp/tmp3j_05ulh/basic/project/project_config.yaml'

But during initial testing setup I found that this works fine: something/example.yaml

donaldcampbelljr commented 6 months ago

I've merged changes and marked this as completed. However, related issues https://github.com/pepkit/ubiquerg/issues/43 and https://github.com/pepkit/looper/issues/484 are still WIP.