stewartmcgown / uds

📀 Unlimited Google Drive Storage by splitting binary files into base64
GNU Affero General Public License v3.0

Increase Upload Speed #19

Open DavidBerdik opened 5 years ago

DavidBerdik commented 5 years ago

Would it be possible to increase the upload and download speeds that can be obtained when using this project? I have noticed that download speeds are better than upload speeds, but are still rather slow.

stewartmcgown commented 5 years ago

No idea. Open to any suggestions.


DavidBerdik commented 5 years ago

The only suggestion I have that has not been implemented, as far as I can tell, is batching requests to the Drive API. Perhaps a certain number of chunks (100?) could be encoded and then sent to Drive for processing on a separate thread while encoding of chunks continues on the main thread. Time permitting, I will play with this idea. I'm not sure if time will permit though.
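
Roughly, I am imagining something like this. It is only a sketch: encode_chunk, upload_to_drive, and the file name are hypothetical stand-ins, not UDS's actual internals.

    import base64
    import queue
    import threading

    def encode_chunk(raw):
        # stand-in for UDS's real encoding step
        return base64.b64encode(raw)

    def upload_to_drive(chunk):
        # placeholder for the actual Drive API call
        pass

    chunk_queue = queue.Queue(maxsize=100)  # back-pressure: at most 100 encoded chunks waiting

    def uploader():
        while True:
            chunk = chunk_queue.get()
            if chunk is None:  # sentinel: encoding is finished
                break
            upload_to_drive(chunk)

    worker = threading.Thread(target=uploader)
    worker.start()

    with open("big.bin", "rb") as f:
        while True:
            raw = f.read(750_000)  # roughly one Doc's worth of data per chunk
            if not raw:
                break
            chunk_queue.put(encode_chunk(raw))  # encoding stays on the main thread

    chunk_queue.put(None)  # tell the worker to stop
    worker.join()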

78Alpha commented 5 years ago

Not sure how great an idea it would be but...

Google has mentioned you can convert text documents to Google Drive format. Not sure if that would set it as a "0 space used doc", but it would allow for files of up to 50 MB to be uploaded and converted.

With the right threading, or processing, have one set to encoding, one set to uploading, and one to conversion. However, they would have to be synced up neatly, as I found that calling the API from multiple instances terminated the upload.

I attempted to multiprocess the upload, but when more than one "user" accesses anything it cuts the connection, so it was playing duck duck blackout with itself until stopped. Since every drive has a minimum of 15 GB, it could be set to upload up to 7.5 GB and then set to convert.

Uploading a solid file would at least be faster, but again, not sure if it converts neatly.

DavidBerdik commented 5 years ago

@78Alpha From what I can tell from my admittedly brief research, converting an uploaded file to a Google Doc does produce a "0 space used doc."

stewartmcgown commented 5 years ago

I have been unable to convert even 8MB text files to Google Docs format. Have you had any verifiable experience with this?

DavidBerdik commented 5 years ago

I have experience doing it with Word documents from years ago, but never with txt files. From what I've read, though, it is supposed to be possible.

I suspect you all have seen this already, but I will post it anyway: https://support.google.com/drive/answer/37603?hl=en

stewartmcgown commented 5 years ago

Of course you can convert documents; that is exactly what my program does. But there is a technical limitation: a Doc can contain only 10 million characters. Trying to convert large text files to Google Docs format fails every time.

I'm still open to other speed improvement suggestions, but I don't think this is the way forward.

DavidBerdik commented 5 years ago

Unless I am misinterpreting this, conversions do not have that limit: https://support.google.com/drive/answer/37603?hl=en

stewartmcgown commented 5 years ago

I imagine that is to allow for word docs with images in them.

You can test the behaviour I'm talking about by creating a fake base64 txt file:

base64 /dev/urandom | head -c 40000000 > file.txt

and attempting to upload and convert it in your Google Drive. The error in the console is 'deadlineExceeded', which I assume means there is an internal time limit on how long a conversion can take on Google's servers.

DavidBerdik commented 5 years ago

Yeah I see what you mean. I can't even get conversion to work properly through the web interface.

I have not had a chance to test converting documents that have images in them, but assuming that it works, it may be worth looking into modifying the project to do the following.

  1. Generate "images" that contain slightly less than 50MB worth of data.
  2. Add those "images" to a Word document.
  3. Upload the Word document to Drive.
  4. Convert to the native Docs format.
  5. Delete the original.

What are your thoughts on this?

78Alpha commented 5 years ago

With the images way, wouldn't you be able to input random garbled data into a PNG-wrapped file and just upload it to Google Photos? A multi-gigabyte photo may be odd, but it could work.

78Alpha commented 5 years ago

After attempting a few things, I found what works and what might not work. I made a file with text and turned it into a PNG file, not just changing the extension but hex editing it to have the header... This did not work well... it requires some other header manipulation, changing data chunks and adding a CRC to each chunk... manually writing it did not give good results...

However, the method I did find is a long one but did work. I converted a txt file to PNG, by that I mean I made a picture that showed the words "I am Text!". Editing the file doesn't have those words in any way. Getting it back into text used OCR... so... that method works but you have to account for the OCR being able to read small characters and of course turning it from text to whatever file it is... I guess this is covered by base64? Turns even the headers into plain text, so it should be as easy as adding it to a file with the right extension afterwards. I'll have to test this out more, I haven't found a way to automate it as I don't know how the sites I'm using do what they do, I suspect AI and that is a large scope...

DavidBerdik commented 5 years ago

I am not so sure that trusting OCR with this is a good idea. One thing that might be worth looking into, though, would be trying to pack data into a JPG.

https://twitter.com/David3141593/status/1057042085029822464?s=19

https://www.google.com/amp/s/www.theverge.com/platform/amp/2018/11/1/18051514/twitter-image-steganography-shakespeare-unzip-me

Update: Apparently the source code is available now. I might play with it this weekend if time permits. - https://twitter.com/David3141593/status/1057609354403287040?s=19
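
For reference, the basic form of the trick (not David Buchanan's exact Twitter-surviving approach, which hides the zip inside an ICC profile) is plain concatenation. JPEG viewers stop at the end-of-image marker and ignore trailing bytes, while zip tools locate the archive from the end of the file, so both parsers stay happy. A rough sketch:

    # classic JPEG/zip polyglot: append the archive after the image data
    with open("cover.jpg", "rb") as jpg, \
         open("payload.zip", "rb") as archive, \
         open("polyglot.jpg", "wb") as out:
        out.write(jpg.read())
        out.write(archive.read())

Getting the data back out is just a matter of renaming the file and unzipping it.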

78Alpha commented 5 years ago

There's also the method of making several Google accounts and uploading a part of a zip to each. The limit would be 5, as uds only ever seems to use 1/5 of total bandwidth on any network. Having each file with its own account wouldn't drop the connection like multiprocess uploading did.

I've made my own script that auto-updates the ID whenever a file is deleted or the like, but I have to add in the details manually unless I want to make a self-evolving script. Instead of just saying "pull id" for one drive, it goes "pull my_picture" and pulls from each drive, or deletes from each, or pushes and loads the ID into a shared json...

However, seeing as how David got a really nice setup on jpg zips, it seems promising. I will test to see if it works on Drive, but Drive is very picky about "altered images". Best of luck, great concept, and awesome execution.

Edit:

After testing it out I managed to upload one of those "zips in jpg" files to the unlimited storage. However it is limited to about 64 kilobytes per jpg...

DavidBerdik commented 5 years ago

You were testing using Google Photos, right? Did you try putting altered images in a Word document, uploading to drive, and then converting? Perhaps that is doable?

Also, this is off topic, but I want to point out that I am not the same David who wrote the "zips in jpg" thing. I wish I was though. 😊

78Alpha commented 5 years ago

My apologies for the misunderstanding. I'll be trying that next. Currently fiddling with Steghide to store things, but it needs a jpg large enough to hide the data, and good lord, I'm trying to make an 80,000 x 80,000 jpg on a little laptop... 4K images only offer 1.6 MB of space.

I'll edit this once I tested the word document.

DavidBerdik commented 5 years ago

Good luck to you! Unfortunately I have not really had any time to play with any of this. I've only been able to theorize over what might be possible. School work has kept me busy even over the weekend.

Another way to handle this could be to create a 50MB (probably slightly less) bitmap file and use that for storing data. If you want to hide data in a bitmap while retaining the image, you can use least significant bit steganography, but since there is really no incentive to retain the appearance of the unaltered image, there's really no reason why we can't just overwrite the entire image with our bits and put the garbled-looking image in a document. Using MS Paint, I was able to generate a 48.7MB 256-color bitmap by setting its dimensions to 7150px by 7150px. The question here is whether Google does anything to bitmaps in Word documents that are converted to the native Docs format.
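
As a sketch of what I am picturing (Pillow and the file names here are my own assumptions):

    from PIL import Image

    WIDTH = HEIGHT = 7150  # ~48.7 MB of pixel data at one byte per pixel

    with open("payload.bin", "rb") as f:
        data = f.read(WIDTH * HEIGHT)  # a real tool would record the payload length somewhere
    data = data.ljust(WIDTH * HEIGHT, b"\x00")  # zero-pad so the buffer exactly fills the image

    img = Image.frombytes("L", (WIDTH, HEIGHT), data)  # "L" = 8-bit grayscale
    img.save("payload.bmp")

    # reading it back: Image.open("payload.bmp").tobytes() returns the padded payload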

In regards to generating Word documents with Python, here is the answer to that: https://python-docx.readthedocs.io/en/latest/
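
The document side would only take a few lines; something like this, where payload.bmp is just the hypothetical image from the sketch above:

    from docx import Document
    from docx.shared import Inches

    doc = Document()  # a new, empty Word document
    doc.add_picture("payload.bmp", width=Inches(6))  # embed the generated image
    doc.save("container.docx")  # ready to upload to Drive and convert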

And no worries. I just want to make it clear that I am not trying to claim someone else's work as my own. I know what it feels like when someone does that and do not want to perpetuate it. 😊

Update: Here is the bitmap I created. Apparently GitHub does not take kindly to uploading 50MB bitmaps so I had to zip it. Demo Image.zip

78Alpha commented 5 years ago

I tested it out and it did not go well... Putting in a 2 MB text file required a nearly 20 MB image. Attempting to put in a larger amount of data required a bigger image, but I ran into a completely different problem: it consumes a ton of memory just to add the 2 MB into the image. I have an average 8 GB of RAM, and it requires 7 GB + 1 GB swap per image, and that is just JPEG... I tried doing it with a PNG, but the software available requires even more memory for PNG. Even though a PNG can hold more data, it requires significantly more memory: where JPEG took 8 GB, PNG was demanding 10 to 12 GB, freezing the system and crashing. It requires the same memory to extract too, so... even though I had a test file, it was not happy about taking the file out of it.

I also tested the Word document. Google converts all images to PNG format, destroying the data injected into them... but it did create a zero-space Docs file. To do it, you would need to have a PNG in the first place... However, the requirements for making the PNG are way too high to be useful... one image needs 10 GB of RAM but can only carry around 500 KB of data, and the image created would also be larger than a JPEG... that is a ratio of 1:30; for JPEG it's 1:10, and for Stewart's UDS method it's 2:3. If you account for upload speed, images can upload at full speed but UDS is limited to 1/5 of total network bandwidth, so a 5x factor can be gained from each image format: PNG at 5:30, or 1:6; JPEG at 5:10, or 1:2; UDS (still 1/5) at 2:3, or 1:1.5. His method has the smallest overhead and is the fastest of the three. The image methods just allow for cloud syncing in such a way that you don't have to deal with an ID and can easily resume upload.

The only methods I can see are having multiple drives and uploading to each via separate processes, so it doesn't close the connection due to too many access attempts at the same time, or making offline Word docs, uploading those, and converting them (though you would have to delete the original Word doc because it still takes up space).

And I used Steghide and OpenStego, if anyone is curious. Steghide is a command line tool while OpenStego is a GUI tool, also the only one of the two that can work with PNG files.

And about uploading an image file with garbled data: I attempted that a while ago, giving the file all the headers necessary to appear to be an image, but Google, Twitter, etc. require the file to be thumbnailed in order to prove it is an image and not, of course, what we are trying to do. That's why a cover image is used. Google Photos refused to upload any image that it could not make a real thumbnail out of, but did work with ones that had real image data.

So... still at square one? UDS method is still the fastest available...

I have also found that any steg tool has serious problems handling part files, i.e. a zip with multiple parts. The part file can be 1 KB, but if the data inside is supposed to be larger than 2 MB, it just will not put the zip into the image, because it thinks it is a larger file than it actually is...

DavidBerdik commented 5 years ago

I am not sure that this adds much, but after a little investigation, I believe that the bitmap issue with the Word documents is actually not Google's fault, but rather Microsoft's. Here's how I found that out.

  1. Created a new Word document and added my bitmap to it.
  2. Saved the document, closed it, and changed the extension to zip. (docx files are nothing more than glorified zip archives)
  3. Unzipped the file and found the image. It was in PNG form.

I do have one more crazy idea. I am pretty sure that it is too impractical to be useful, but I will share it in the hopes that someone does find it valuable.

  1. Create an empty Word document.
  2. Create a new image file (the one that I am playing with is 7150px by 1px and the format should not matter).
  3. Set each pixel in the bitmap to white or black to indicate the bit setting in the file that is being uploaded. (0 = white, 1 = black, or the other way around if you prefer)
  4. Add the image to the Word document, save the Word document, and check the size.
  5. Repeat steps 2-4 while Word document size is under a certain threshold.
  6. Once size threshold is reached, upload document, convert to native Docs format, and delete original Word file.

I am of course aware of the drawbacks of treating each pixel as a bit instead of a byte, but using this method, I am not sure that each pixel could be reliably used as a byte. Since Word/Docs seems to like the PNG format, perhaps using bytes would be acceptable since we would not have to worry about what happens during conversion.
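
As a rough sketch of steps 2 and 3 (Pillow again, with made-up file names; note that one 7150px strip only carries about 893 bytes at one bit per pixel):

    from PIL import Image

    WIDTH = 7150  # one strip per loop iteration, as in step 2

    def bits_to_strip(payload):
        bits = "".join(f"{byte:08b}" for byte in payload)  # bytes -> '0'/'1' characters
        bits = bits.ljust(WIDTH, "0")[:WIDTH]  # pad/trim to exactly one row of pixels
        pixels = bytes(255 if b == "0" else 0 for b in bits)  # 0 = white, 1 = black
        return Image.frombytes("L", (WIDTH, 1), pixels)

    bits_to_strip(b"hello world").save("strip_0001.png")  # PNG, which Word/Docs seem to leave alone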

Does anyone have any thoughts on this? (Besides of course thinking that I am crazy.)

78Alpha commented 5 years ago

Recently tested multiprocessing in a range of ways... defined processes, self-mutating processes, os processes, and starmap... all of which got caught up on one specific part, "Requires x docs to make". It seems to stop the system from spawning any new processes, or automatically calls join() on a process it doesn't know the name of. Just running multiple instances in different terminals worked fine, at least for a small set of data. I once tried with very big files and it got caught up and just stopped both uploads. Using "os.system" to call UDS and whatever command I need also seems to cause a problem: it makes the group name "None", and for some reason that stops the whole thing, even when grouping does nothing at all... Trying to do it from an outside script... led to... VERY weird results. UDS started treating letters like files; it would fail to upload anything but encoded non-existent data, and uploaded that with the name "."

I have run out of external ideas to speed it up... the only ways left would be to change parts of the core UDS, and that is way over my head.

And apparently there is a rate limit quota, so a single file can only be edited so fast; I found this out when messing around with the concurrent executor code that was commented out. And applying for extra quota requires asking Google for more... So... my old idea of making multiple Google accounts to access a file might be valid for multiplying speed; maybe I'll test that next...
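
If anyone wants to try the multi-account angle, here is a sketch of launching fully independent UDS instances with subprocess instead of os.system. I am assuming the CLI is invoked as python uds.py push <file>; adjust to the real invocation.

    import subprocess

    parts = ["part1.bin", "part2.bin", "part3.bin"]

    # one fully independent UDS process per part file; nothing shares
    # multiprocessing state inside UDS itself
    procs = [subprocess.Popen(["python", "uds.py", "push", p]) for p in parts]

    for proc in procs:
        proc.wait()  # block until every upload has finished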

Asqii commented 5 years ago

AFAIK, most times Google will grant you more API calls without really asking any questions.


DavidBerdik commented 5 years ago

@78Alpha Have you bothered playing around with my latest crazy suggestion at all? If not, I will have a look at it myself when time permits.

78Alpha commented 5 years ago

I am studying it at the moment. I haven't gotten to it, as I personally wouldn't know how to execute it. From what I see, it would still be bound to the 700 KB doc size limit, but you would be able to group files; however, they couldn't be part files at the moment...

It allows for more organization but reduces the amount of storable data per picture. I'll have to work with BMP a little bit to see how it handles data.

Attempted the BMP you uploaded; it was apparently too small to hide a 5 MB file, but again managed to hide a 2 MB file. It is starting to appear that 2 MB is the limit for a single file.

I found that files can continuously be pumped into images. I added a zip into an image and made an image, then pumped data into that image... however, the time it takes to do so grows exponentially. The first pass took 10 minutes; this second one is at 5.7% and has taken 2 hours already.

DavidBerdik commented 5 years ago

Regarding the BMP thing I suggested, why was that limit present? Can't you edit all the bytes outside of the header without corrupting the image?

Regarding the Doc size limit, I tested that.

Using an online PNG generator (https://onlinepngtools.com/generate-random-png), I generated a bunch of PNG images and placed them in a Word document ("Word PNG Image Test.docx") that was about 48MB in size. I uploaded the document to Google Drive and converted it. The conversion was successful. I then downloaded the converted file and checked its size ("Word to Google Doc Conversion Test.docx"). It was 28.1 MB. Using the .zip extension trick, I unzipped the two files to compare the images, and although both sets were in PNG format, the images were not technically the same. The ones in the Google Drive version were more compressed.

I then tried creating a new Google Doc via the web UI and inserting all of the images from the original Word document, as well as the converted document, into the new Google Doc. This worked, but it took a while for saving to complete. After this, I downloaded the Doc-created file ("Google Doc Manual Image Insertion Test.docx"), which totaled 76.1 MB (note that this size is the sum of the previous two sizes). I then extracted this file using the zip trick and compared the hashes of the images to the hashes of the images from the documents they were sourced from, and they all matched.

So it looks like the best way to do this would be to insert the images directly into a Google Doc. Unfortunately, I cannot find official documentation on the maximum size of a native Google Doc, but according to an obscure Google Products forum post, the limit is 250MB. The three documents I created during this test are attached in RAR archive fragments.
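
For anyone repeating the comparison, the zip-trick hash check is easy to script. A sketch using the file names from this test:

    import hashlib
    import zipfile

    def image_hashes(docx_path):
        # a .docx is a zip archive; the embedded images live under word/media/
        with zipfile.ZipFile(docx_path) as z:
            return {name: hashlib.sha256(z.read(name)).hexdigest()
                    for name in z.namelist() if name.startswith("word/media/")}

    original = image_hashes("Word PNG Image Test.docx")
    converted = image_hashes("Word to Google Doc Conversion Test.docx")
    print(original == converted)  # False in this test: Drive's conversion recompressed the PNGs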

GitHub does not allow RAR archives to be uploaded so I had to change the extension to zip. To extract these, change all of the extensions back to .rar and use WinRAR to extract.

Sample Documents.part1.zip Sample Documents.part2.zip Sample Documents.part3.zip Sample Documents.part4.zip Sample Documents.part5.zip Sample Documents.part6.zip Sample Documents.part7.zip

78Alpha commented 5 years ago

I myself couldn't edit past the header bytes; it was far outside my field of expertise. I used one of the Kali Linux tools, Steghide, and it attempts to inject data in such a way that it will work on sites that try to generate a preview. Since it pushes it all into a single block in the image, I assume that's the limit; if I could input data per block, then it would only be limited by the number of blocks instead of size (and of course RAM when trying to open the image itself as a text document). That 250MB limit seems very generous; I wonder who made a doc so big that it was placed that high. I'll have to learn more about all this, but as long as the data is in big chunks, it could boost upload to full potential. I'll take a few days to learn more; if I can't learn what is needed, I might have to pass on trying an implementation myself.

78Alpha commented 5 years ago

So, I looked into the whole thing a bit and learned a lot. When I first started out, I was using PNG images, and that's where I went wrong. PNG files are the hardest to work with, as they have checksums for each block, making it nearly impossible to inject data into them; knowing that is helpful, though. PNG files have the largest potential size (I downloaded a 500 MB one from the NASA site), but working with them is slow, tedious, and not very efficient...

I worked extensively with your BMP files too, but with Google Photos related tests. After doing some reading up, unlimited storage is for files less than 16 MP (4920 x 3264). So I made a BMP of size 4920 x 3264 with a simple gradient. It is ~50 MB in size, much better than JPG, but not as good as PNG, however, it works. The BMP uploaded to google photos, takes no storage space, and could be downloaded and unzipped.

https://photos.app.goo.gl/RvRR7H4bhcwQcCRu5 (contains a 7zip file)

That is the picture; you can tell how full it is by the amount of random static in it, from bottom to top, so you can also add in more stuff if you want. It's a pain to find the end bytes, but it is possible. (Also, the data in there is a game of my own design, in case that raises any concern.)

I attempted to copy the BMP bytes and create arbitrary images with Python, as Python has binascii to do stuff like that. However, when writing up a script, it threw out a nonsensical error. I say that simply because I ran the same code from an interactive prompt and it worked flawlessly, so automation will be problematic...

I also tested your DOC idea: I added a very small jpg to a Word document, converted it to a zero-space file with Docs, and downloaded the doc with its images... And, well... it destroyed the data again. The image was only a 30 KB JPG, so it wasn't turned into a PNG; however, the data was still tampered with such that it couldn't be extracted (or be seen as an archive).

Part files are also not working in multiple images so... I'll be working with hex for a while...

DavidBerdik commented 5 years ago

Cool!

Over the weekend, I participated in a local hackathon with two friends (@SMyrick98 and @digicannon) and we tried to implement a prototype of the "bitmaps in Word documents" thing that I mentioned earlier. Unfortunately, we did not get everything working due to apparent inconsistencies in the bitmap standard, but time permitting, I believe we have plans to attempt to finish it. If that happens, I will share the work with their permission.

78Alpha commented 5 years ago

A rough snippet of code to help out...

    import binascii
    import gc
    import random

    def generic():
        sequence = '1234567890ABCDEF'
        # Header bytes of a BMP file of size 16 MP
        base = binascii.unhexlify('424DD2E4DE02000000007A0000006C00000037130000BF0C0000010018000000000058E4DE02130B0000130B0000000000000000000042475273000000000000')
        with open("generic.bmp", 'wb') as byter:
            byter.write(base)
            temp = ''
            for x in range(10):
                for i in range(12000000):
                    temp += random.choice(sequence)  # generate random noise to increase file size
                byter.write(binascii.unhexlify(temp))
                temp = ''
                gc.collect()

The spacing is a GitHub thing; it doesn't seem happy about lines that start with it...

The code generates a 60 MB BMP, and yes, it is based solely on size. I used the header bytes from a BMP I had on hand, so it always has the same width x height and appears as a BMP. It is 60 MB only because Google Photos was not happy with the 240 MB generated one, or the 120 MB... but different services should have different limits. In theory, you should be able to make a multi-gigabyte BMP file that always reports the resolution of 16 MP.

Rendered as text, the header bytes come out as mostly unprintable characters around the "BM" magic number and the "BGRs" colour-space tag. The raw hex is:

424DD2E4DE02000000007A0000006C00000037130000BF0C0000010018000000000058E4DE02130B0000130B0000000000000000000042475273000000000000

So... it could be modified to have part files in each image and then consolidated into a single BMP file, not sure how clean that would be, but it means each DOC could have a full zipped file even if images are limited in raw data size. However, from my testing, taking part files from images generates noise of a weird kind, it added data that never existed, corrupting the archives... I guess a cleaner way would be to add the part files to an image and close the file there, without extra noise, such that you can just ignore the headers and stitch the files into one big file.
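
The stitching step might look something like this. It is only a sketch: 64 bytes is just the length of the unhexlify'd prefix in the snippet above, the file names are made up, and it assumes each image was closed right after its payload with no extra noise:

    HEADER_LEN = 64  # length of the fixed header written by generic() above

    with open("restored.zip", "wb") as out:
        for i in range(1, 36):  # e.g. a 35-image set
            with open(f"part{i:02d}.bmp", "rb") as bmp:
                bmp.seek(HEADER_LEN)  # skip the shared BMP header
                out.write(bmp.read())  # append the raw payload bytes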

Hopefully my blunders lead to discoveries for others.

jhdscript commented 5 years ago

I ran various tests:

Speeds are always bad (250 kbps max), so I think the only way to boost it is threading the process.

I looked at rclone, and with small files it works faster than all my tests :'/

Moreover, Google limits file creation to 3 per second :-(

78Alpha commented 5 years ago

Here is a BMP tool; it might make the process of putting them into docs easier, and it makes the standard more uniform. Sadly it's limited to Python 2.7 right now; 3.7 was having a cow about reading hex and bytes.

https://github.com/78Alpha/BMPMan

The only advantage Google Photos has is the fact that it can make albums and continuously sync (at least from mobile). Still very manual; it just makes pictures... I added in a license just for reasons, and read it over, so I guess I have to state this:

DavidBerdik, under the LGPL v3, you have free reign over the nightmarish code I have created in the link above, if you like.

jhdscript commented 5 years ago

And you are sure Google doesn't compress BMPs?

78Alpha commented 5 years ago

@jhdscript

I made sure to test on Google Photos; it didn't alter the file. It looks to see if the file is greater than 16 MP, so the header states the file is (16 MP - 1). I could have made it 1x1 pixel; however, I like being able to tell how full the image file is by looking at it (less static means it is an end file). Google Docs, however, always compresses images, destroying steganographic images or "bogus" images. I'm sure there's a way, and that's what David is going after.

I'm after Google Photos, Stewart did docs, David is doing Docs x Photos.

If I can find a video format that isn't as picky about headers, I can make a large bogus video file (since Google Photos allows 10 GB videos at 1080p, I could make the video appear to be 1x1 if necessary).

Just to test all the photos stuff, I compressed a GOG game I had, packed it into BMP files (35 total), uploaded it, downloaded it, unpacked it, then checksummed it, extracted it, ran it, etc... It worked perfectly.

Depending on where the data is put, Docs vs Photos vs whatever else gets made next, it has different rules.

jhdscript commented 5 years ago

Mmm, I generated a few BMP files using a C# app, then uploaded them to GDrive and redownloaded them. It seems the original files have been altered.

Atm I haven't read your code yet, but do you use a tricky BMP header, or is your header in valid BMP format?

78Alpha commented 5 years ago

I made a BMP using GIMP and just ripped the header from that. I think it still has the comment "Made with GIMP" in it, but I'll have to double check.

Edit:

Sadly it does not have the "made with gimp" comment

jhdscript commented 5 years ago

BMPs are modified by GDrive upload.

78Alpha commented 5 years ago

How about google photos?

Strange that it isn't working for you, it worked when I tested it

jhdscript commented 5 years ago

What API do you use for uploading to Google Photos?

The big problem with Google Photos is that API queries are limited to 10,000 per day.

78Alpha commented 5 years ago

I haven't implemented any API use yet, I just copy the pictures to my phone and sync, at least for now.

At 10,000 per day, that is very good: ~470 GB per day, and if I find the cutoff image data size, then maybe it could get closer to 700 GB per day.

jhdscript commented 5 years ago

I developed it and uploaded 300 GB in 6 hours, though with chunks of less than 20 MB. The BMPs were not compressed or retouched, but the Google Photos API is not very responsive, and if Google implements a compression algorithm on files we lose everything.

Atm GDrive seems better, but chunk size (700k) is the issue :-( It takes more time to establish connections than to upload.

jhdscript commented 5 years ago

After a lot of tests using an external tool, I determined that 10 threads is optimal for posting without any HTTP errors on GDrive.

Now I reach 500 kbps, so still slow...

78Alpha commented 5 years ago

Working with my own thing, it manages to go at full speed, 1.5 MB/s. After looking into the API documentation, it looks like a trainwreck to get it working at full capacity without error...

So, my idea is out. However, I am still using it manually in backups, since it works. I can create albums, change the cover photo to match the data, and it's all in a nice package. Keep a UDS copy, a BMPMan copy, and a local copy.

Just for fun, I put the images into GIMP and made a GIF out of them. It's a 500 MB GIF that shows a representation of all the data in the file, through random colored static.

DavidBerdik commented 5 years ago

Wow. It looks like I missed out on quite a lot here.

@78Alpha Unfortunately we have not made any additional progress on our hackathon project over the last week. At this point, I will probably be taking over the project on my own and trying to finish it. Since the main problem we were having was generating bitmaps, I will probably try to add your program to ours, since it presumably works as intended.

@jhdscript Regarding your question about whether Google does anything to bitmaps: what I found from testing my Google Drive idea was that when putting bitmaps in Word documents, either directly or via Google Drive, the images become PNGs. Since PNG compression is lossless, I figure the best way to handle this would be to generate a BMP, convert it to a PNG using Pillow, then pass it off to Word and Drive, since apparently neither Word nor Google does anything to them if you give them PNGs from the start. Then, when downloading, use Pillow again to convert back to BMP.
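
The round trip itself is trivial with Pillow (a sketch, assuming the payload lives entirely in the pixel data, which both formats store losslessly):

    from PIL import Image

    # before upload: BMP -> PNG (lossless, so the payload bytes survive)
    Image.open("payload.bmp").save("payload.png")

    # after download: PNG -> BMP restores the original pixel buffer,
    # as long as nothing resized or re-encoded the image in between
    Image.open("payload.png").save("restored.bmp")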

nerkkolner commented 5 years ago

If anyone is interested, this dude @xhighway999 and his group of friends created this utility to store data as videos on Youtube https://github.com/xhighway999/PygaMasher

Maybe you guys can share some ideas together

DavidBerdik commented 5 years ago

That's pretty cool. I was just playing with the release build. It seems that the release build can generate the frames and store them as images properly, but, at least on my machine, no video is being generated. This is probably the most sensible way to go about it, though. Splitting data would still be necessary in cases where either the resulting video is more than 12 hours long or the resulting video file is greater than 128GB (https://support.google.com/youtube/answer/71673?hl=en).

nerkkolner commented 5 years ago

(Quoting @78Alpha's earlier comment about the BMPMan tool.)

Hi, I saw your code and see there is a padding of 0*32 on the generated BMP; may I know what it is for? Also, I tried your tool, and it seems that it doesn't convert to a real image but can still be uploaded to Google Photos; am I correct? Thanks

nerkkolner commented 5 years ago

(Quoting @DavidBerdik's earlier comment about PygaMasher.)

128GB is a rather large file size; I would split it up even if it's not the limit.

nerkkolner commented 5 years ago

(Quoting @78Alpha's earlier comment about using BMPMan manually for backups.)

Hi, it is me again. I have finally made it work on a real picture, with some ideas from your utility. One thing I did get right is that the BMP header is just 54 bytes, from what I have read. My end result is exactly like yours. Now we just need a tool that can upload 16MP BMPs in "high quality" and can be used in a scripting/automation environment. All the tools using the Photos API can only upload photos in original quality, which counts against the drive space.

DavidBerdik commented 5 years ago

(Quoting @nerkkolner: "128GB is a rather large file size; I would split it up even if it's not the limit.")

"The maximum file size you can upload is 128GB or 12 hours, whichever is less."

I would try to go for 11:59:59 in less than 128GB. Even using HEVC that's probably not doable though.

(Quoting @nerkkolner's comment above about the 54-byte BMP header.)

Have you tried uploading PNGs then downloading them and converting back to bitmaps? Since PNG is lossless compression you should be able to go both ways and still retain the original data as long as the image is not resized. That is what we are trying for Google Drive. I still have not had a chance to glue in @78Alpha 's bitmap generator though.

78Alpha commented 5 years ago

I retested my method, and apparently the old 1.1.0 version had a weird bug: it created data that never existed and stored it in the images... but only with files larger than the 48 MB buffer.

Made 1.4.0, changed everything; it works with large files now. No corruption, twice as fast at writing, and my first GUI. If anyone is trying to make a GUI for UDS, I suggest PySimpleGUI; it's what I used (well, the tkinter version) and it was very straightforward.

Just on a side note, if you use the 1.4.0 version, make sure the directories it uses are empty beforehand, in a stop event it deletes the output folder...

I don't think my source is readable anymore.

The API for uploading images is not the greatest... The workaround I see is using a setting Google offers in Drive, where you click it and it converts everything to high quality. I have no clue if any of the APIs can interact with that or if it's just a user thing. It would only need to be pressed every 14 GB or so.

I also saw what you meant about the bitmap standard; I made a second BMP and the header was completely different, shorter too.

nerkkolner commented 5 years ago

Have you tried uploading PNGs then downloading them and converting back to bitmaps? Since PNG is lossless compression you should be able to go both ways and still retain the original data as long as the image is not resized. That is what we are trying for Google Drive. I still have not had a chance to glue in @78Alpha 's bitmap generator though.

No, because the maximum file size for a 16MP PNG, which I created in Photoshop, is like 30MB. Like @78Alpha said, hacking PNGs is too complex, and there are formats like BMP and TIFF that are easier to work with and also make bigger files. Not to mention anything bigger than 16MP will be resized regardless of the file size, which in turn corrupts the original data.

The thing I learned through playing with BMPs and Google Photos is that you certainly can upload bigger files than just 50MB. In my testing, it seems the biggest BMP file people can upload with "high quality" is 75MB. I did it with a 54-byte BMP header and a 74MB dummy file of all zeroes. It uploaded OK, but not with a 75MB dummy file.
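
For anyone reproducing this, here is a sketch of such a minimal 54-byte header (a 14-byte BITMAPFILEHEADER plus a 40-byte BITMAPINFOHEADER). The 4920 x 3264 dimensions are the 16 MP limit discussed earlier, not necessarily the exact file from my test:

    import struct

    WIDTH, HEIGHT = 4920, 3264  # just under the 16 MP "high quality" limit
    payload = b"\x00" * (WIDTH * HEIGHT * 3)  # all-zero dummy body, 3 bytes per pixel

    # 40-byte BITMAPINFOHEADER: size, width, height, planes, bpp, compression,
    # image size, x/y pixels-per-metre (2835 is about 72 DPI), palette counts
    info = struct.pack("<IiiHHIIiiII", 40, WIDTH, HEIGHT, 1, 24,
                       0, len(payload), 2835, 2835, 0, 0)
    # 14-byte BITMAPFILEHEADER: "BM" magic, total file size, reserved, pixel data offset (54)
    header = struct.pack("<2sIHHI", b"BM", 54 + len(payload), 0, 0, 54)

    with open("dummy.bmp", "wb") as f:
        f.write(header + info + payload)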

Next I will play with ODT files to see if I can hack them to insert multiple pictures and have them untouched by Google after upload. Your tips are helpful; let's see where we go from there.