Closed dcl10 closed 1 year ago
Dear @dcl10 , previously WfExS was able to pull from most of the custom scenarios (for instance https://github.com/inab/WfExS-backend/blob/0878f9eef933f848ebfc78ce73118b40d39c7346/workflow_examples/wetlab2variations_execution_nxf_giturl.wfex.stage#L1 ). It is also true that it supports the pseudo "github" scheme (as you can see at https://github.com/inab/WfExS-backend/blob/0878f9eef933f848ebfc78ce73118b40d39c7346/workflow_examples/hello/hello_cwl.wfex.stage#L1) , from the suggestions of @stain in past BioHackEU2002.
I like some of the changes you have proposed through the pull request, like the test battery. But I cannot accept them because they can break other scenarios, and there are cases where the newly supported URLs are ambiguous.
For instance, ssh://git@github.com:inab/WfExS-backend
is not a standard ssh URI, because the colon is used to separate the host name from the SSH port declaration, as you can see at https://www.ietf.org/archive/id/draft-salowey-secsh-uri-00.html#anchor8 (and I would expect it is a "raw" repository, which no version control). So, the code should accept at most git+ssh://git@github.com/inab/WfExS-backend
, following the rules from https://pip.pypa.io/en/stable/topics/vcs-support/
The same happens to file:///inab/WfExS-backend/.git#subdirectory=workflow_examples/ipc/cosifer_test1_cwl.wfex.stage
, this should not be considered a git repository as the local directory should only be considered just that, a directory. The code should accept as a git repository git+file:///inab/WfExS-backend@0.1.2
, as git
command line accepts as cloning source other already pulled git repositories in the form of directories.
With your permission, I'm going to take the bits and ideas of implemented code from your pull request, and I'm going to integrate them in a separate branch, so you can revise them.
The only URLs I'm still allowing as repository links are https ones from repos, due historical reasons: bio.tools and WorkflowHub workflow contributors are used to tell their workflows through web URLs instead of more reliable permanent identifiers, as they are not used to them (and they are not accepted by command line tools like git)
@jmfernandez I take your point about some of the ambiguous URLs, such as ssh
URLs with a colon where that should be used to denote port, or pointing at the .git
subdirectory.
Surely wfexs should support anything that the git client would, with something to denote that the intent is the URL points at a git repo. Then the pip convention of git+<URL>
should suffice, and complying with the same behaviours as pip is desirable.
From our perspective, we just need a way to run a workflow from an airgapped environment. We do not have the luxury of pointing at github.com
which is obviously by far the most common usage, because we cannot access the public internet.
We had hoped that git+
URLs would allow us to checkout any git repository, whether locally on disk, or in a locally run service, e.g. GitLab running at http://localhost:8080
.
However we found that currently wfexs reports an error when attempting to checkout from those URLs, as an "unsupported http(s) repository source" or similar.
The intent (though perhaps not the result) of this PR is to resolve that issue, allowing working support for git+file://
, git+http://
, git+https://
and git+ssh://
URLs regardless of hostname (which currently appears to only allow github.com
and gitlab.com
, with comments suggesting the addition of bitbucket.org
in future...)
Can we work together to make that functionality work? Or otherwise make execution of a workflow from a local path e.g. file:///absolute/path/to/my/workflow.cwl
work?
We are blocked until we have working workflow execution from a workflow source that is not a TRS endpoint, or a supported public git service (github, gitlab...), but the changes included in this PR unblocks us.
Obviously appreciate your input as you know the codebase much better than us, and the ramifications of any changes beyond what we see as fixes!
example workflow URLs we tried which failed with errors prior to the changes in this branch looked as follows:
git+file:///absolute/path/to/repo#subdirectory=relative/path/to/workflow.cwl
git+http://localhost:8080/user/repo#subdirectory=relative/path/to/workflow.cwl
localhost:8080
was running an instance of GitLab.Hi, I'm going to merge your contributions in a new branch, so I can remediate some possible internal side effects
Description
Previously, WfExS could only pull from specific remote sources; namely github, bitbucket and gitlab. However, self-hosted git repos may required in some instances. This PR removes the specificity and treats git as git.
This PR removes the capability to use
raw.githubusercontent.com
URLs to guess the repo. This is github specific and inherently fragile, because branches with slashes (/
) in their names break this functionality. Use workflows as described by PyPI inworkflow_id
.It is still possible to pull remote files individually using these urls, though.