Open jkugler opened 10 years ago
If you need me to send the file, I can. Seems can't attach files to the issue.
Hmm, the only change that I can think of that might have some effect on this would be the removal of the fork call to fix https support.
Does this happen often? I haven't seen it and it didn't repro for me.
I'm working on a fix that will put the download back in a separate process so if that is the problem there is a fix coming.
If the problem is intermittent you should be able to fix the download by removing the .complete file.
----- Original Message -----
| From: "Joshua Kugler" notifications@github.com | To: "open-sw/pkg-cacher" pkg-cacher@noreply.github.com | Sent: Thursday, April 24, 2014 5:19:55 PM | Subject: Re: [pkg-cacher] file is getting corrupted during download (#28)
| If you need me to send the file, I can. Seems can't attach files to the | issue.
| — | Reply to this email directly or view it on GitHub .
It happened every time I tried it, and it was not using https, but straight http. Was your repro test via the stand-alone setup or via CGI? Via CGI would clear everything, no? Via stand alone, there might be a variable not being re-initialized (just throwing things out).
I'll try to repro it reliably.
OK, so I'm getting very weird results, and none of them terribly reliable. I ran this command on the server:
/etc/init.d/pkg-cacher stop ; rm -rf /var/www/cobbler/pkg-cacher/* ; /etc/init.d/pkg-cacher start
Then I ran these commands on a client:
wget http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml;
wget http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml.asc;
wget http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml.key;
wget http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml;
wget http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/primary.xml.gz
First time through, everything seems OK. But run them again, then again. Then again. Start looking at the contents of files, and comparing files that should be identical (in both size and content). I found two issues:
1) Files that were previously 404 now were showing up, but had the wrong content (the contents of other files). 2) Files that were fine before now had additional contents and headers. I'll paste a "subsequent" repomd.xml file below:
I am going to guess there is a variable (maybe more than one) not getting properly re-initialized when running in stand-alone mode.
<?xml version="1.0" encoding="UTF-8"?><repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
<revision>1397232742</revision>
<data type="filelists">
<checksum type="sha">08658af55a7a458d085e72d2d25ff4e160e475b3</checksum>
<open-checksum type="sha">f74d0145693ad5052b7442fc55d3006e5e4ced71</open-checksum>
<location href="repodata/filelists.xml.gz"/>
<timestamp>1397232742</timestamp>
<size>1186</size>
<open-size>6945</open-size>
</data>
<data type="other">
<checksum type="sha">8353f58833d1c0a4e81bc9f7d1789fb786b8d8a2</checksum>
<open-checksum type="sha">31240a4741c627989a48ddb9fda690c4755a6bba</open-checksum>
<location href="repodata/other.xml.gz"/>
<timestamp>1397232742</timestamp>
<size>11170</size>
<open-size>165605</open-size>
</data>
<data type="primary">
<checksum type="sha">bd52e8bf31f12a148dbd163e338580640f7c9998</checksum>
<open-checksum type="sha">a6df27fc6853bb121ded915c5e406a33f9344433</open-checksum>
<location href="repodata/primary.xml.gz"/>
<timestamp>1397232742</timestamp>
<size>1896</size>
<open-size>10910</open-size>
</data>
</repomd>
HTTP/1.1 200 OK
Server: Winstone Servlet Engine v0.9.10
Last-Modified: Fri, 11 Apr 2014 16:12:22 GMT
Expires: Fri, 11 Apr 2014 16:12:22 GMT
Accept-Ranges: bytes
Content-Type: application/xml
Content-Length: 1149
Connection: Keep-Alive
Date: Fri, 25 Apr 2014 07:50:28 GMT
X-Powered-By: Servlet/2.5 (Winstone/0.9.10)
<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
<revision>1397232742</revision>
<data type="filelists">
<checksum type="sha">08658af55a7a458d085e72d2d25ff4e160e475b3</checksum>
<open-checksum type="sha">f74d0145693ad5052b7442fc55d3006e5e4ced71</open-checksum>
<location href="repodata/filelists.xml.gz"/>
<timestamp>1397232742</timestamp>
<size>1186</size>
<open-size>6945</open-size>
</data>
<data type="other">
<checksum type="sha">8353f58833d1c0a4e81bc9f7d1789fb786b8d8a2</checksum>
<open-checksum type="sha">31240a4741c627989a48ddb9fda690c4755a6bba</open-checksum>
<location href="repodata/other.xml.gz"/>
<timestamp>1397232742</timestamp>
<size>11170</size>
<open-size>165605</open-size>
</data>
<data type="primary">
<checksum type="sha">bd52e8bf31f12a148dbd163e338580640f7c9998</checksum>
<open-checksum type="sha">a6df27fc6853bb121ded915c5e406a33f9344433</open-checksum>
<location href="repodata/primary.xml.gz"/>
<timestamp>1397232742</timestamp>
<size>1896</size>
<open-size>10910</open-size>
</data>
</repomd>
Ok I'll do some more testing with stand-alone.
----- Original Message -----
| From: "Joshua Kugler" notifications@github.com | To: "open-sw/pkg-cacher" pkg-cacher@noreply.github.com | Cc: "Robert Nelson" robertn@the-nelsons.org | Sent: Friday, April 25, 2014 12:56:46 AM | Subject: Re: [pkg-cacher] file is getting corrupted during download (#28)
| OK, so I'm getting very weird results, and none of them terribly reliable. I | ran this command on the server: | /etc/init.d/pkg-cacher stop ; rm -rf /var/www/cobbler/pkg-cacher/* ; | /etc/init.d/pkg-cacher start
| Then I ran these commands on a client: | wget | http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml; | wget | http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml.asc; | wget | http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml.key; | wget | http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/repomd.xml; | wget | http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/primary.xml.gz
| First time through, everything seems OK. But run them again, then again. Then | again. Start looking at the contents of files, and comparing files that | should be identical (in both size and content). I found two issues:
| 1) Files that were previously 404 now were showing up, but had the wrong | content (the contents of other files). | 2) Files that were fine before now had additional contents and headers. I'll | paste a "subsequent" repomd.xml file below:
| I am going to guess there is a variable (maybe more than one) not getting
| properly re-initialized when running in stand-alone mode.
| <?xml version="1.0" encoding="UTF-8"?><repomd
| xmlns="http://linux.duke.edu/metadata/repo"
| xmlns:rpm="http://linux.duke.edu/metadata/rpm">
|
| <?xml version="1.0" encoding="UTF-8"?>
| <repomd xmlns="http://linux.duke.edu/metadata/repo"
| xmlns:rpm="http://linux.duke.edu/metadata/rpm">
|
| — | Reply to this email directly or view it on GitHub .
Also (maybe related) wget is sometimes getting a 200 when it should get a 404, and then the file has content like this:
<html><head><title>Error 404</title></head><body bgcolor="#ffffff"><h1>Status Code: 404</h1>Exception: <br>Stacktrace: <pre>(none)</pre><br><hr size="1" width="90%"><i>Generated by Winstone Servlet Engine v0.9.10 at Fri May 02 14:27:25 PDT 2014</i></body></html>HTTP/1.1 404 Not Found
Server: Winstone Servlet Engine v0.9.10
Content-Length: 263
Connection: Keep-Alive
Content-Type: text/html;charset=ISO-8859-1
Date: Fri, 02 May 2014 21:27:25 GMT
X-Powered-By: Servlet/2.5 (Winstone/0.9.10)
<html><head><title>Error 404</title></head><body bgcolor="#ffffff"><h1>Status Code: 404</h1>Exception: <br>Stacktrace: <pre>(none)
</pre><br><hr size="1" width="90%"><i>Generated by Winstone Servlet Engine v0.9.10 at Fri May 02 14:27:25 PDT 2014</i></body></html>
Any progress on this?
I've come back to troubleshooting this, indirectly. It seems I am hitting this same problem (corrupted downloads) when using the zypper package manager in SLES. It uses, oddly enough libcurl. I'm wondering if there is actually a bug in libcurl that is getting triggered. Or maybe some buffer that is not getting properly emptied. I tired going through your code to extract a stripped-down test case, but don't know your code well enough to do so.
I'll look at alternatives.
----- Original Message -----
| From: "Joshua Kugler" notifications@github.com | To: "open-sw/pkg-cacher" pkg-cacher@noreply.github.com | Cc: "Robert Nelson" robertn@the-nelsons.org | Sent: Tuesday, June 17, 2014 2:47:56 PM | Subject: Re: [pkg-cacher] file is getting corrupted during download (#28)
| I've come back to troubleshooting this, indirectly. It seems I am hitting | this same problem (corrupted downloads) when using the zypper package | manager in SLES. It uses, oddly enough libcurl. I'm wondering if there is | actually a bug in libcurl that is getting triggered. Or maybe some buffer | that is not getting properly emptied. I tired going through your code to | extract a stripped-down test case, but don't know your code well enough to | do so.
| — | Reply to this email directly or view it on GitHub .
Well...don't take my word for it. :) I was just commenting that I had seen similar behavior on two apps that used libcurl.
So, this is shaping up to be VERY weird. I'm continuing to trouble shoot this with the SuSE folks. It turns out the the server is returning the corrupted file (XML followed by headers), but for some reason, we can't get it to exhibit the bug when using standard curl or wget, etc.
So, it does seem that curl somehow exercises the bug, but that the bug is in the server. I will keep you posted on this.
I have a weird one here. Config options of note:
daemon_port=9000 path_map = job http://build.whamcloud.com/job/
I try to download
http://10.1.0.10:9000/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/primary.xml.gz
and get this in the logs (I modified the request logic to pass the entire path to the access log writer):
Thu Apr 24 22:37:05 2014|23798|10.1.7.9|MISS|3374|/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/primary.xml.gz
Notice it says 3374 bytes. The file should be 1896 bytes, see here: http://build.whamcloud.com/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata/
When I got to look at the file, it is quite corrupted. On disk:
[root@mgmt repodata]# ls -l total 8
-rw-r--r-- 2 pkg-cacher pkg-cacher 3374 Apr 24 22:37 primary.xml.gz
-rw-r--r-- 2 pkg-cacher pkg-cacher 1149 Apr 24 22:36 repomd.xml
[root@mgmt repodata]# pwd /var/www/cobbler/pkg-cacher/packages/job/e2fsprogs-master/arch=x86_64,distro=sles11sp2/lastSuccessfulBuild/artifact/_topdir/RPMS/repodata
it is a combination of the repomd.xml file that was just downloaded previously, and then some headers:
HTTP/1.1 200 OK Server: Winstone Servlet Engine v0.9.10 Last-Modified: Fri, 11 Apr 2014 16:12:22 GMT Expires: Fri, 11 Apr 2014 16:12:22 GMT Accept-Ranges: bytes Content-Type: application/x-gzip Content-Length: 1896 Connection: Keep-Alive Date: Thu, 24 Apr 2014 22:36:55 GMT X-Powered-By: Servlet/2.5 (Winstone/0.9.10)
Then the GZIP contents of primary.xml.gz that I was expecting.
I think some data structure is not being flushed before the next request. Also, are the headers supposed to be in the cached file? They are not in the repomd.xml file that was just downloaded. Could you look into this, or direct me as to where I might look into it?