Closed steshaw closed 1 year ago
I tried p7zip instead of unzip. Perhaps these errors will be illuminating:
$ nix-shell -p p7zip --command '7z e results.zip'
7-Zip [64] 17.05 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.05 (locale=en_AU.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs x64)
Scanning the drive for archives:
1 file, 107224 bytes (105 KiB)
Extracting archive: results.zip
ERRORS:
Headers Error
Unconfirmed start of archive
WARNINGS:
There are data after the end of archive
--
Path = results.zip
Type = zip
ERRORS:
Headers Error
Unconfirmed start of archive
WARNINGS:
There are data after the end of archive
Physical Size = 304
Tail Size = 106920
Characteristics = Local
ERROR: Data Error : /optimization_run0/config.yml
Sub items Errors: 1
Archives with Errors: 1
Warnings: 1
Open Errors: 1
Sub items Errors: 1
I can confirm that something is broken with the zip file. When I run a simulation and download the zip file according to the instructions and then use unzip
(under NixOS) to unzip it, I get
Archive: results.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of results.zip or
results.zip.zip, and cannot find results.zip.ZIP, period.
Note that this is a different error than @steshaw gets, with the same unzip
version: in his case, "start of central directory not found", and in my case, "end-of-central-directory signature not found". This is all very mysterious :thinking: I'll check with a git blame
if something in the zipping code changed (I don't believe it did).
The zipping of the results happens in https://github.com/tweag/chainsail/blob/main/app/scheduler/chainsail/scheduler/tasks.py#L166 and hasn't been changed since #372. And that definitely worked at the time, after all, the beta release happened and I recorded the walkthrough video. And even that change didn't touch the actual zipping code, only a bit where the blob path gets set. So I guess this behavior is due to some dependency update.
This (or more specifically, the comment) looks interesting and possibly relevant: https://stackoverflow.com/a/2463819/1656472
Note that we are currently using the "w"
mode for writing into the BytesIO object via repeated zipfile.Zipfile.writestr
calls, and not the "a"
mode. Not sure it makes any difference - it did work before with "w"
.
It's strange that you get a different error to me, so I've attached an example results.zip
file in the main description above. When I unzip it, I get:
$ /usr/bin/unzip results.zip
Archive: results.zip
warning [results.zip]: 319 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [results.zip]: start of central directory not found;
zipfile corrupt.
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
I created a little example Python script which creates an in-memory zip file and then writes it out to a file. The resultant zip file unzips without problem 🤔.
https://gist.github.com/steshaw/f090e76dab4c60bd7e91272ac20e0a68
Using mode
of 'a'
or of 'w'
doesn't seem to make a difference. Both work.
It's strange that you get a different error to me, so I've attached an example
results.zip
file in the main description above.
Thanks for uploading that file! When I try to unzip it, I now get exactly the same error as you do. At least that's reassuring... Given that your example script works without issues, I'm wondering whether this might have something to do with the minio storage backend / Kubernetes setup. The minio version never changed, but what did likely change was the Terraform version via the nixpkgs bump) and, with https://github.com/tweag/chainsail/commit/2d26ae0ee4f9565917a2f3fd795eca3f63c0226a, the version of the Kubernetes provider. Maybe investigating along these lines might prove interesting :thinking: but really, I'm just guessing.
The Terraform version shouldn't affect the scheduler operating within k8s. Here my current plan:
I feel sure that we're missing something more obvious 🤷♂️.
The long and short of it is that I've found a fix and a PR is to follow! However, I've sketched the investigation below.
I checked that the test script works in the container. I check directly on the k8s scheduler pod by connecting with kubectl exec
.
I used the MinIO client, mc
, to connect to MinIO and cp the zip file. I did this inside the cluster on the scheduler pod. This retrieves an uncorrupted zip file. To repeat you need something like this:
curl -O https://gist.githubusercontent.com/steshaw/c8368cf04e19185e9a5f0825cf49b7e8/raw/ebcb41c1358d869ca4441dade1f360980b2a7abc/download-mc
bash download-mc
~/.local/bin/mc alias set myminio http://minio.default.svc.cluster.local:9000 chainsail chainsail
~/.local/bin/mc cp myminio/chainsail-samples/storage/10/results.zip .
unzip -d out results.zip # to check it
So, it seemed that there was nothing wrong with the zip file inside MinIO. I also checked access via port-fowarding.
On local machine, set up the port forward:
kubectl port-forward service/minio 8090:9000 # in one terminal
In a separate terminal:
curl --output - 'http://localhost:8090/chainsail-samples/storage/10/results.zip?AWSAccessKeyId=chainsail&Signature=ibYSALGPSprPiqNgRkQEpDgRfDc%3D&Expires=1681961986' >|r3zip
So curling works. Therefore there had to be a problem with our command from the docs. The corrupt file is longer, so additional bytes are getting inserted. There had to be something different about our command from the docs:
kubectl exec -it scheduler-worker-<tab complete here> -- \
curl --output - "<URL from download button>" > results.zip
I tried adding --silent
which didn't change the outcome. Finally, I wondered why are we using -it
. We don't need an interactive tty. Removing these options solves the problem. In fact, just removing -i
is sufficient.
Note, I imagine that your walkthrough video was done with Chainsail on GCP. This is a slightly different scenario where MinIO isn't used (the storage goes on GCS).
Wow :exploding_head: what a somewhat anticlimactic resolution to this issue, but excellent work tracking this down! Thanks a lot! :heart: I'm wondering how the -it
slipped into the docs. I mean, I obviously wrote it, but why :thinking: all I can think of is some thing that I was subconsciously remembering the docker exec -it some_image /bin/bash
incantation I have been typing a lot these last months.
And excellent point for the walkthrough video, which I shot indeed with a GCP deployment!
TIL also about the mc
MiniO client, thanks! :slightly_smiling_face:
I had a feeling that it was something simple that we'd overlooked. Don't sweat it. The little -it
looks pretty typical and innocent. I'll be wary of this in the future.
In local development with minikube, I'm finding that the
results.zip
that I get using the following command does not unzip:This command can be found in the docs here.
Here is an example URL that I used.
Example file: results.zip.
Here's what happens when I tried to unzip:
It seems to be a zip file:
My version of unzip is 6.00:
I found an article that suggested using
zip -FF
to fix broken zip files:Then I can unzip the
fixed.zip
file with some warnings: