Closed FabianHofmann closed 1 month ago
It appears that several different causes can all result in the same MissingInputException. It could be an authentication issue (that happened to me today), but I suspect that in this case it is the ?__blob=publicationFile at the end that causes the issue. This URL, for example, seems to work just fine: http://wettelijkerente.net/wettelijkerente2.csv
Ah no, I found the issue for your URL. snakemake uses requests.head to get some initial data about the file without downloading it in its entirety, but that returns an HTTP 303 status, which tells the client to redirect elsewhere, and following that redirect returns an HTTP 400 'Bad Request'. So snakemake assumes that every HTTP server supports both the HEAD and GET HTTP verbs, but that is not the case on this server.
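To check whether a given server treats the two verbs differently, something along these lines can help. It is only a diagnostic sketch, not snakemake code: the helper name compare_head_and_get and its session parameter are made up for illustration, and it assumes the requests package is installed.

```python
def compare_head_and_get(url, session=None, timeout=10):
    """Return the final HTTP status codes for HEAD and GET on the same URL.

    Hypothetical helper for debugging; not part of snakemake or its plugins.
    """
    if session is None:
        # Imported lazily so the helper can also be exercised with a stub session.
        import requests

        session = requests.Session()

    # requests does not follow redirects for HEAD by default, so enable it explicitly.
    head = session.head(url, allow_redirects=True, timeout=timeout)

    # stream=True defers the body download; only status line and headers are read.
    get = session.get(url, stream=True, allow_redirects=True, timeout=timeout)
    get.close()

    return {"HEAD": head.status_code, "GET": get.status_code}
```

Calling compare_head_and_get with the destatis URL from this issue should show whether HEAD and GET still end in different status codes; the server's behaviour may of course have changed since this was reported.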
I think the best way to fix this would be to add a configuration flag supports_http_head on the storage provider, which defaults to True but can be set to False to use GET for querying the metadata as well.
Alternatively, an allow_http_get_fallback flag could be created instead, which defaults to False, but when set to True would fall back to GET on certain HTTP status codes. However, it might be quite tricky to get the correct set of status codes, because I think error 400 would actually be a code on which you would not normally retry with GET. Therefore the supports_http_head flag seems to me to be the better approach. I will create a pull request to implement this shortly.
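Roughly, the supports_http_head idea could look like the sketch below. This is not the actual plugin code and not the eventual pull request: the function fetch_metadata, its session parameter and the returned dictionary are assumptions made for illustration; only the flag name and its default come from the proposal above.

```python
def fetch_metadata(url, supports_http_head=True, session=None, timeout=10):
    """Query size and modification time of a remote file without downloading it.

    Hypothetical sketch of the proposed flag; names other than
    supports_http_head are illustrative only.
    """
    if session is None:
        # Imported lazily so the sketch can be tested with a stub session.
        import requests

        session = requests.Session()

    if supports_http_head:
        # Default behaviour: a HEAD request returns headers only.
        response = session.head(url, allow_redirects=True, timeout=timeout)
    else:
        # Server rejects HEAD: use GET, but stream so the body is not fetched.
        response = session.get(
            url, stream=True, allow_redirects=True, timeout=timeout
        )

    try:
        # Raise on 4xx/5xx instead of silently treating the file as missing.
        response.raise_for_status()
        size = response.headers.get("Content-Length")
        return {
            "size": int(size) if size is not None else None,
            "last_modified": response.headers.get("Last-Modified"),
        }
    finally:
        # Close the connection without reading the (possibly large) body.
        response.close()
```

With supports_http_head=False the metadata comes from the headers of a streamed GET response, so no status-code heuristics are needed, which is the main reason to prefer this over a fallback flag.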
@johanneskoester, is there a way to make the MissingInputException give more feedback for remote files? Once a network is involved, there are a lot of reasons for a file to (temporarily) not be found, even for a valid resource.
I am encountering an unexpected error when using the storage plugin. I have the following link, which downloads an xlsx file from the destatis database (https://www.destatis.de/DE/Home/_inhalt.html):
"https://www.destatis.de/EN/Themes/Economy/Prices/Publications/Downloads-Energy-Price-Trends/energy-price-trends-xlsx-5619002.xlsx?__blob=publicationFile"
The link has no redirects and works properly when opened in the browser or fetched with requests.get. However, when I use it within the storage function, the workflow throws the following error:
I tried to understand what is going on, but could not resolve it. It seems to me like a bug, but perhaps I am missing a required setting.