Verify checksum after download

havardgulldahl commented 8 years ago

Via private email, the suggestion was raised that our tools could automatically verify the content after download, by comparing the MD5 checksum reported by Jottacloud with that of the newly downloaded file.

Sounds like a nice command line option for download() in cli.py.

This would be a nice way for beginners to get to know the codebase.

antonhagg commented 8 years ago

Without previous knowledge of Python and without a working installation this is what I came up with.

def download(argv=None):
    if argv is None:
        argv = sys.argv[1:]
    parser = argparse.ArgumentParser(description='Download a file from Jottacloud.')
    parser.add_argument('remotefile', help='The path to the file that you want to download')
    parser.add_argument('-l', '--loglevel', help='Logging level. Default: %(default)s.',
        choices=('debug', 'info', 'warning', 'error'), default='warning')
    parser.add_argument('-c', '--checksum', help='Verify checksum of file after download')
    args = parse_args_and_apply_logging_level(parser, argv)
    jfs = JFS.JFS()
    root_folder = get_root_dir(jfs)
    path_to_object = posixpath.join(root_folder.path, args.remotefile)
    remote_file = jfs.getObject(path_to_object)
    total_size = remote_file.size
    with open(remote_file.name, 'wb') as fh:
        bytes_read = 0
        with ProgressBar(expected_size=total_size) as bar:
            for chunk_num, chunk in enumerate(remote_file.stream()):
                fh.write(chunk)
                bytes_read += len(chunk)
                bar.show(bytes_read)
    if args.checksum:
        md5 = JFS.calculate_md5(data)
        if md5 != JFSFile.md5:
            print("MD5 hashes don't match!")
            answer = input('Continue: [y/n]')
            if not answer or answer[0].lower() != 'y':
                print('%s was not downloaded successfully' % args.remotefile)
                exit(1)
    print('%s downloaded successfully' % args.remotefile)

havardgulldahl commented 8 years ago

That's not that bad for something you wrote without knowing the language.

But you'll see some issues once you get your installation running (get it straight from GitHub). So get that going, and then keep on coding :)

Here are some things I see immediately:

You need to calculate the md5 of the downloaded file, not of data.
You don't want to end up with an input() prompt. A red error message (look at clint) and exit(1) is enough.
Take a look at the argparse.ArgumentParser docs and see how you can use store_true to actually get True or False for free from argparse.
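
To make the store_true hint concrete, here is a minimal sketch using only the standard argparse module. The build_parser helper and the sample path are made up for illustration and are not part of cli.py.

```python
import argparse


def build_parser():
    # Hypothetical helper for this example only; not a function in cli.py.
    parser = argparse.ArgumentParser(description='Download a file from Jottacloud.')
    parser.add_argument('remotefile', help='The path to the file that you want to download')
    # action='store_true' makes args.checksum False by default and True
    # when -c/--checksum is passed; the flag takes no value of its own.
    parser.add_argument('-c', '--checksum', action='store_true',
                        help='Verify checksum of file after download')
    return parser


args_without = build_parser().parse_args(['some/file.txt'])
args_with = build_parser().parse_args(['-c', 'some/file.txt'])
print(args_without.checksum, args_with.checksum)
```

Without action='store_true', argparse would expect a value after -c and store it as a string, which is why the original add_argument call does not behave like an on/off switch.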

antonhagg commented 8 years ago

So there is some progress, but I got stuck at an error which I couldn't figure out how to fix (commenting out the for loop in calculate_md5 removes the error).

WARNING:py.warnings:c:\python27\lib\site-packages\jottalib\JFS.py:92: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  for data in iter(lambda: fileobject.read(size), u''):

Traceback (most recent call last):
  File "C:\Python27\Scripts\jotta-download-script.py", line 9, in <module>
    load_entry_point('jottalib==0.4.post1', 'console_scripts', 'jotta-download')()
  File "c:\python27\lib\site-packages\jottalib\cli.py", line 240, in download
    md5_lf = JFS.calculate_md5(lf)
  File "c:\python27\lib\site-packages\jottalib\JFS.py", line 93, in calculate_md5
    md5.update(data.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)

I'm not sure that the way I am accessing the local file is the best. I'm also struggling to get the md5 property from the remote file. A hint in the right direction would be nice. =)

if args.checksum:
   with open(remote_file.name) as lf:
       md5_lf = JFS.calculate_md5(lf)
       md5_jf = JFS.JFSFile.md5
   if md5_lf != md5_jf:

antonhagg commented 8 years ago

I think the first error is related to issue #79. I will see if the result there fixes the issue. Can anyone give me a hand with getting the md5 from the remote file?

havardgulldahl commented 8 years ago

After you call jfs.getObject(/path/to/file) you get a JFSFile object; look at its md5 attribute. In this case remote_file is already there, so:

md5_lf = JFS.calculate_md5(open(remote_file.name)) # because we've downloaded the file to remote_file.name
md5_jf = remote_file.md5

And take it from there. :+1:
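
As a rough sketch of the comparison step, independent of jottalib: the snippet below hashes a local file in binary mode, chunk by chunk, and compares the result with an expected hex digest. The names md5_of_file and checksums_match are hypothetical, and a temporary file stands in for the download; in the real code the expected value would come from remote_file.md5.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=64 * 1024):
    """Hypothetical helper: return the hex MD5 digest of the file at path."""
    digest = hashlib.md5()
    # Binary mode avoids any text decoding or newline translation.
    with open(path, 'rb') as fh:
        # read() returns b'' at end of file, so b'' is the sentinel.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(path, expected_md5):
    """Compare case-insensitively, since some tools print upper-case hex."""
    return md5_of_file(path) == expected_md5.lower()


# Stand-in for a downloaded file: arbitrary bytes, including non-ASCII ones.
payload = b'\xe2\x80\x9cbinary\x00data' * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_path = tmp.name

expected = hashlib.md5(payload).hexdigest()
ok = checksums_match(demo_path, expected.upper())
bad = checksums_match(demo_path, '0' * 32)
os.remove(demo_path)
print(ok, bad)
```

Feeding raw bytes to the hash, without an encode() step, sidesteps the kind of UnicodeDecodeError shown in the traceback above.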

antonhagg commented 8 years ago

I've been trying to get it to work, but the checksum doesn't seem to be correct. I'm not sure whether this is an issue related to running it under Windows. Anyway, below is the code in cli.py.

    with open(remote_file.name, 'wb') as fh:
        bytes_read = 0
        with ProgressBar(expected_size=total_size) as bar:
            for chunk_num, chunk in enumerate(remote_file.stream()):
                fh.write(chunk)
                bytes_read += len(chunk)
                bar.show(bytes_read)
        #if args.checksum:
        md5_lf = JFS.calculate_md5(open(remote_file.name, 'rb')) #opening in binary mode
        md5_jf = remote_file.md5
        print md5_lf
        print md5_jf
    print('%s downloaded successfully' % args.remotefile)

The checksums I get are:

C:\Users\XX>jotta-download jottacloud.pdf
[################################] 219340/219340 - 00:00:00
f8ceede2a2ac0c52f3e3bbeb25d3fa68
9fff650be9fd5a05d531730e4350af51
jottacloud.pdf downloaded successfully

Checking the file in an external md5 checker (http://onlinemd5.com/) gives the value of: 9FFF650BE9FD5A05D531730E4350AF51

Also, doing a print data in JFS.py shows that it is missing the last rows of the file. I have tried to figure out why this is but haven't found anything.

File content when opened in notepad:

obj
<</Length 3911/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <xmp:ModifyDate>2013-03-28T12:13:18+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2013-03-28T12:13:17+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2013-03-28T12:13:18+01:00</xmp:MetadataDate>
         <xmp:CreatorTool>Acrobat PDFMaker 11 for Word</xmp:CreatorTool>
         <xmpMM:DocumentID>uuid:b8e0d258-8375-49f3-8e23-f7de68210a4d</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:9625165b-c271-4ea6-9002-fea7e8500cf4</xmpMM:InstanceID>
         <xmpMM:subject>
            <rdf:Seq>
               <rdf:li>50</rdf:li>
            </rdf:Seq>
         </xmpMM:subject>
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:title>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:description>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>roland</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <pdf:Producer>Adobe PDF Library 11.0</pdf:Producer>
         <pdf:Keywords/>
         <pdfx:SourceModified>D:20130328111211</pdfx:SourceModified>
         <pdfx:Company/>
         <pdfx:Comments/>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

<?xpacket end="w"?>
endstream
endobj
20 0 obj
<</Filter/FlateDecode/First 6/Length 58/N 1/Type/ObjStm>>stream
hÞ240V0P°±ÑwÎ/Í+Q0Ö÷ÎL)Ž640Š)‚I«RYª˜žZlg` ~ím
endstream
endobj
21 0 obj
<</Filter/FlateDecode/First 6/Length 184/N 1/Type/ObjStm>>stream
hÞlÍA‚@†á¿²7• w4("Iº”t^Ý‰¶Ô‰i%ü÷Ñ¡Û{øx>Ð3¥Õjç½¿‡Lél¯©m±óðwÓ
c1ï¨+ŒÇ°X&R&H …ùDC uðY   •×L•ñj_lJsCV êL¬NÄr°Åá)1”dÿ‰‹¯¸g²}BZªpÕÎUlxsª£ø@=×(Ž;;´¿2è«+Ö^ÎŽÎ7FYö` ¯dIÍ
endstream
endobj
22 0 obj
<</DecodeParms<</Columns 5/Predictor 12>>/Filter/FlateDecode/ID[<D73E8F7CBFE3364DAC1DA07F06F81058><465B159AE6F857409B69D6D4AB883CAB>]/Info 104 0 R/Length 119/Root 106 0 R/Size 105/Type/XRef/W[1 3 1]>>stream
hÞbb &FÆ†CL@†?ˆd©‘<f ’QH2þš–µ ‘ÌÁâÙ ’ÓÌþ &çH_°,“%Xå:^?›¡,n"Ùþ€Hþ©`]ÓÁ¤Ð
Wî«d“ŒØIÆ?ødGÉÁL2m‡Ä/@€ gõê
endstream
endobj
startxref
116
%%EOF

File content when doing print data in calculate_md5 in the JFS.py file:

obj
<</Length 3911/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.4-c005 78.147326, 2012/08/23-13:03:03        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/">
         <xmp:ModifyDate>2013-03-28T12:13:18+01:00</xmp:ModifyDate>
         <xmp:CreateDate>2013-03-28T12:13:17+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2013-03-28T12:13:18+01:00</xmp:MetadataDate>
         <xmp:CreatorTool>Acrobat PDFMaker 11 for Word</xmp:CreatorTool>
         <xmpMM:DocumentID>uuid:b8e0d258-8375-49f3-8e23-f7de68210a4d</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:9625165b-c271-4ea6-9002-fea7e8500cf4</xmpMM:InstanceID>
         <xmpMM:subject>
            <rdf:Seq>
               <rdf:li>50</rdf:li>
            </rdf:Seq>
         </xmpMM:subject>
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:title>
         <dc:description>
            <rdf:Alt>
               <rdf:li xml:lang="x-default"/>
            </rdf:Alt>
         </dc:description>
         <dc:creator>
            <rdf:Seq>
               <rdf:li>roland</rdf:li>
            </rdf:Seq>
         </dc:creator>
         <pdf:Producer>Adobe PDF Library 11.0</pdf:Producer>
         <pdf:Keywords/>
         <pdfx:SourceModified>D:20130328111211</pdfx:SourceModified>
         <pdfx:Company/>
         <pdfx:Comments/>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

antonhagg commented 8 years ago

Sorry, got it to work! I had one indent too many, so it was missing out on the last chunk. =)

How do I go forward and suggest the new code (this is the first time I've used GitHub)?
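
For anyone hitting the same thing, here is a minimal sketch of what that extra indent likely did. It assumes the cause was write buffering: the checksum ran while the output file was still open, so the last buffered data had not reached disk yet. The file name, payload, and md5_hex helper are made up for the demonstration, and the buffer size is set explicitly to make the effect reproducible.

```python
import hashlib
import os
import tempfile


def md5_hex(path):
    # Hypothetical helper: hash the bytes currently on disk at path.
    with open(path, 'rb') as fh:
        return hashlib.md5(fh.read()).hexdigest()


payload = b'x' * 1000
fd, demo_path = tempfile.mkstemp()
os.close(fd)

# A large explicit buffer keeps the small payload in memory until close.
with open(demo_path, 'wb', buffering=1024 * 1024) as fh:
    fh.write(payload)
    # One indent too deep: fh is still open, so the data has not been
    # flushed and the file on disk is incomplete at this point.
    md5_inside = md5_hex(demo_path)

# Correct placement: the with block has closed (and flushed) the file.
md5_after = md5_hex(demo_path)
os.remove(demo_path)

print(md5_inside == md5_after)
print(md5_after == hashlib.md5(payload).hexdigest())
```

Moving the checksum code out of the with open(...) block, so that it runs only after the file is closed, gives the hash of the complete file.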

antonhagg commented 8 years ago

I think I need to rewrite some of the code in the version I submitted, since there have been quite a lot of changes and fixes since I first wrote it. Any help is appreciated.

havardgulldahl / jottalib

Verify checksum after download #73