jargij / googletransitdatafeed

Automatically exported from code.google.com/p/googletransitdatafeed

Unzipping cannot handle language encoding bit properly #379

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the validator against the attached gtfs-nonworking.zip:
   feedvalidator.exe gtfs-nonworking.zip

What is the expected output?
I expect to see a properly generated summary of the GTFS files.

What do you see instead?
The feed validator crashes.

What version of the product are you using? On what operating system?
transitfeed-windows-binary-1.2.12.zip
Windows 7 Professional 64bit

Please provide any additional information below.
We have been using the GTFS Feed Validator to check the feeds produced by our 
system. A couple of weeks ago we noticed that the validator had started to 
crash on our data, and last week I investigated it for a while.

To me it looks like your tool (or Python?) might have some issues with 
unzipping the zip files.

You can first check the output generated by your tool; it is in the attached 
text file. You can reproduce it by running your tool against 
gtfs-nonworking.zip.

By contrast, the validator runs successfully against gtfs-working.zip. The 
data in both archives is exactly the same! If you unzip the nonworking zip 
file and run the validator against the extracted folder, the validator works 
without problems. If you then zip that folder again with the Windows zipper 
and run the validator against the new zip, it also works without issues.

So what is the problem with our original zip file? I found that if the 
language encoding flag (bit 11, UTF-8 names) is set in the zip file's general 
purpose bit flag, the zip file does not work with the feed validator.

nonworking:
    general purpose bit flag (0x0808) (bit 15..0):  0000.1000 0000.1000
      file security status  (bit 0):                not encrypted
      extended local header (bit 3):                yes
      UTF-8 names          (bit 11):                yes

working:
    general purpose bit flag (0x0008) (bit 15..0):  0000.0000 0000.1000
      file security status  (bit 0):                not encrypted
      extended local header (bit 3):                yes
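
To make the difference between the two flag words above concrete, here is a minimal Java sketch that decodes the same three bits shown in the dumps. The class name `ZipFlagBits` and its method names are hypothetical and not part of our generator or of the validator. The bit positions are the ones listed in the dumps, and the offset used in `firstEntryFlag` assumes the archive starts with a standard local file header (4-byte signature, 2-byte version, then the 2-byte flag).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class ZipFlagBits {
    // Masks for the general purpose bit flag, matching the dumps above.
    static final int ENCRYPTED = 1 << 0;        // bit 0: file security status
    static final int DATA_DESCRIPTOR = 1 << 3;  // bit 3: extended local header
    static final int UTF8_NAMES = 1 << 11;      // bit 11: language encoding flag

    // Little-endian value of the bytes 'P' 'K' 0x03 0x04.
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    static boolean isEncrypted(int flag) {
        return (flag & ENCRYPTED) != 0;
    }

    static boolean hasDataDescriptor(int flag) {
        return (flag & DATA_DESCRIPTOR) != 0;
    }

    static boolean usesUtf8Names(int flag) {
        return (flag & UTF8_NAMES) != 0;
    }

    // Reads the flag word from the first local file header of an in-memory
    // archive. Assumes the archive begins with that header.
    static int firstEntryFlag(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (zip.length < 8 || buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        // Offset 6 follows the 4-byte signature and the 2-byte version field.
        return buf.getShort(6) & 0xFFFF;
    }

    public static void main(String[] args) {
        // The two flag words from the nonworking and working archives.
        for (int flag : new int[] {0x0808, 0x0008}) {
            System.out.printf("0x%04x: encrypted=%b descriptor=%b utf8Names=%b%n",
                    flag, isEncrypted(flag), hasDataDescriptor(flag), usesUtf8Names(flag));
        }
    }
}
```

Both flag words have bit 3 set and bit 0 clear, so in this sketch the only decoded difference between 0x0808 and 0x0008 is `usesUtf8Names`.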

Here is a snippet of our Java code that generates the GTFS zip files. The only 
change between the two attached zip files was commenting or uncommenting the 
line that sets the language encoding flag to false.

    public void marshal(OutputStream output, Feed feed) throws IOException {
        ZipArchiveOutputStream zos = new ZipArchiveOutputStream(output);
        // Toggling this line is the only difference between the two zip files.
        // zos.setUseLanguageEncodingFlag(false);

        try {
            zos.putArchiveEntry(new ZipArchiveEntry("stops.txt"));
            writers.getWriter("stops.txt").write(feed.getStops(), zos);
            zos.closeArchiveEntry();
            // ...rest of files...
        } finally {
            zos.flush();
            zos.close();
        }
    }

Would it be possible to fix the handling of the language encoding bit on the 
GTFS Feed Validator side?

Original issue reported on code.google.com by iiro.ka...@gmail.com on 8 Sep 2014 at 7:54

GoogleCodeExporter commented 9 years ago

Original comment by bdfer...@google.com on 9 Sep 2014 at 1:03

GoogleCodeExporter commented 9 years ago
Fixed in r1876.

Original comment by bdfer...@google.com on 9 Sep 2014 at 1:08