jakecminihan opened this issue 2 years ago
Aha, the missing documentation!
You're right on track!
rmapy.document.ZipDocument has a dump() function that writes the contents to a zip file.
In your example you could do something like this:
downloaded_file = rm.download(Doc)
downloaded_file.dump("/where/to/store/document.zip")
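To see exactly what dump() wrote, you can list and extract the archive with the standard zipfile module. This is a minimal sketch: list_and_extract and any destination path you pass it are hypothetical. It only prints and extracts whatever is in the zip and does not check for the .content/.pagedata/.pdf layout described later in this thread.

```python
import zipfile


def list_and_extract(zip_path, dest_dir):
    """Print each member of a dumped document zip, extract them all, return the names."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        names = zf.namelist()
        for name in names:
            print(name)
        # extractall keeps the archive's internal layout, including any page subfolder.
        zf.extractall(dest_dir)
    return names
```

For example, list_and_extract("/where/to/store/document.zip", "/where/to/extract") prints the member names and unpacks them into the (hypothetical) second folder.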
After you're done encrypting, you can follow the guide here to upload it back again, but it should be something like this:
from rmapy.document import ZipDocument
from rmapy.api import Client
# loads the zip back into a ZipDocument class
to_upload = ZipDocument(file="/where/to/store/document_encrypted.zip")
rm = Client()
rm.renew_token()
rm.upload(to_upload)
Let me know if you need any more pointers!
Thank you for replying! I'm now 90% of the way there, but I get a really weird error when using ZipDocument:
Traceback (most recent call last):
File "test2.py", line 112, in <module>
to_upload = ZipDocument(file= (zip_path + ".zip"))
File "/home/jake/.local/lib/python3.6/site-packages/rmapy/document.py", line 247, in __init__
self.load(file)
File "/home/jake/.local/lib/python3.6/site-packages/rmapy/document.py", line 329, in load
with zf.open(f"{self.ID}.content", 'r') as content:
File "/usr/lib/python3.6/zipfile.py", line 1375, in open
zinfo = self.getinfo(name)
File "/usr/lib/python3.6/zipfile.py", line 1304, in getinfo
'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'bfef0d76-c4ce-4ed3-8912-a4fcf2bf5fe2.content' in the archive"
The ID in the "There is no item named" message changes each time, and I can't work out what this error means or why it's happening! The dumped zip and the encrypted zip have the same file structure (a .content, a .pagedata and a .pdf, then a subfolder with 3 .rm and .json files) and the same IDs. I'm sure it's something I've overlooked, but I have no idea what it is!
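The traceback shows load() opening f"{self.ID}.content", and the missing name changes on every run. That pattern fits an object that generates a random ID when none is supplied and then looks for a member named after it, regardless of what the archive actually contains. The sketch below imitates that behaviour with the standard library only. FakeZipDocument and make_zip are hypothetical stand-ins, not rmapy code, and the uuid4() default is an assumption inferred from the changing ID.

```python
import io
import json
import uuid
import zipfile


class FakeZipDocument:
    """Hypothetical stand-in that mimics the lookup seen in the traceback."""

    def __init__(self, _id=None, file=None):
        # Assumption: when no ID is passed, a fresh random UUID is generated,
        # which would explain why the missing name changes on every run.
        self.ID = _id or str(uuid.uuid4())
        if file is not None:
            self.load(file)

    def load(self, file):
        with zipfile.ZipFile(file, "r") as zf:
            # The member name is built from self.ID, not read from the archive,
            # so a mismatched ID raises KeyError("There is no item named ...").
            with zf.open(f"{self.ID}.content", "r") as content:
                self.content = json.load(content)


def make_zip(doc_id):
    """Build an in-memory zip containing a single <doc_id>.content member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Dummy JSON payload, not a real reMarkable .content file.
        zf.writestr(f"{doc_id}.content", json.dumps({"fileType": "pdf"}))
    buf.seek(0)
    return buf
```

Loading make_zip(doc_id) without an ID raises the same kind of KeyError seen above, while FakeZipDocument(_id=doc_id, file=make_zip(doc_id)) loads the dummy content.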
I think the zip file needs the same UUID as the files inside it.
So:
(Replying by email to jakecminihan's message of 12 Feb 2022, 14:20.)
I've just tried implementing that, and it still won't work! It looks like the UUID differs from the one in the downloaded document? Here is my code. Apologies if it's not great; I'm more than happy to elaborate on any sections that aren't clear! For reference, I'm using a dummy PDF downloaded from https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf.
# Import dependencies
from rmapy.api import Client, ZipDocument
import zipfile
import os
import glob
from PyPDF2 import PdfFileWriter, PdfFileReader
from tqdm import tqdm

# Complete startup checks
rm = Client()
rm.renew_token()

# Obtain data from specific notebook (todo: work out how to pull multiple files from folder)
books = [i for i in rm.get_meta_items() if i.VissibleName == "dummy"][0]
metadata = books.to_dict()
#print(metadata)
doc_id = metadata['ID']
print("The UUID is:", doc_id)
name = books.VissibleName

# Download doc(s)
downloaded_file = rm.download(books)

# Delete old file now it's not needed any more
#rm.delete(books)  # <- this is not undoable!!!!!

# Unzip to PC
# Set up folder paths
download_path = "/mnt/d/Users/Jake/Documents/PDF Encryption/RM Zips/" + doc_id + ".zip"
extraction_path = "/mnt/d/Users/Jake/Documents/PDF Encryption/To Encrypt/" + doc_id

# Dump to PC and extract
downloaded_file.dump(download_path)
with zipfile.ZipFile(download_path, 'r') as zip_ref:
    zip_ref.extractall(extraction_path)

# Find PDF
os.chdir(extraction_path)
#print(os.getcwd())

# Create function to encrypt PDFs
def encrypt_pdfs():
    # Iterate over all PDFs in the current directory
    for pdf_path in tqdm(glob.glob("*.pdf")):
        # Load file as pdf
        file = PdfFileReader(pdf_path)
        # Is the file already encrypted? If so, do nothing. Could optimise by checking before loop and removing entries that are?
        if file.isEncrypted:
            continue
        # Because this plugin is weirdly made, I have to technically create a *new* PDF. Probably not very efficient!
        output = PdfFileWriter()
        # Iterate over all pages of the PDF and append to a new document
        for index in range(file.numPages):
            output.addPage(file.getPage(index))
        # Encrypt the output with a password
        output.encrypt("PlaceHolderPassword")
        # Write document to file (overwrites the original PDF)
        with open(pdf_path, "wb") as f:
            output.write(f)

# Run new function
encrypt_pdfs()

# Create new zip output folder directory
zip_path = "/mnt/d/Users/Jake/Documents/PDF Encryption/Zipped Files/" + doc_id

def zipfolder(foldername, target_dir):
    # The with block makes sure the zip is closed (and fully written) before upload
    with zipfile.ZipFile(foldername + '.zip', 'w') as zipobj:
        rootlen = len(target_dir)
        for base, dirs, files in os.walk(target_dir):
            for file in files:
                fn = os.path.join(base, file)
                zipobj.write(fn, fn[rootlen:])

zipfolder(zip_path, extraction_path + "/")

# Upload to remarkable
to_upload = ZipDocument(file= ("/mnt/d/Users/Jake/Documents/PDF Encryption/Zipped Files/" + doc_id + ".zip"))
rm.upload(to_upload)
The output is:
The UUID is: ab011988-22d4-4eda-bf21-334bb65558c4
100%|████████████████████████████████████████| 1/1 [00:00<00:00, 53.86it/s]
Traceback (most recent call last):
File "test2.py", line 103, in <module>
to_upload = ZipDocument(file= ("/mnt/d/Users/Jake/Documents/PDF Encryption/Zipped Files/" + doc_id + ".zip"))
File "/home/jake/.local/lib/python3.6/site-packages/rmapy/document.py", line 247, in __init__
self.load(file)
File "/home/jake/.local/lib/python3.6/site-packages/rmapy/document.py", line 329, in load
with zf.open(f"{self.ID}.content", 'r') as content:
File "/usr/lib/python3.6/zipfile.py", line 1375, in open
zinfo = self.getinfo(name)
File "/usr/lib/python3.6/zipfile.py", line 1304, in getinfo
'There is no item named %r in the archive' % name)
KeyError: "There is no item named '38f0fcc6-a403-4cbc-b949-c719375cc303.content' in the archive"
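Comparing the two IDs in this output is telling: the script printed ab011988-22d4-4eda-bf21-334bb65558c4, but load() looked for 38f0fcc6-a403-4cbc-b949-c719375cc303.content. That suggests the problem is the name being looked up rather than the archive's contents. A minimal stdlib sketch that recovers the ID from the zip's top-level .content member follows; document_id_from_zip is a hypothetical helper.

```python
import zipfile


def document_id_from_zip(zip_path):
    """Return the ID implied by the single top-level *.content member of a zip.

    Raises ValueError if there is not exactly one such member.
    """
    suffix = ".content"
    with zipfile.ZipFile(zip_path, "r") as zf:
        ids = [
            name[: -len(suffix)]
            for name in zf.namelist()
            # Top-level only: page files live inside a "<ID>/" subfolder.
            if name.endswith(suffix) and "/" not in name
        ]
    if len(ids) != 1:
        raise ValueError(f"expected one top-level .content file, found {len(ids)}")
    return ids[0]
```

If the ZipDocument constructor in your installed rmapy accepts an ID argument (an assumption worth checking in the site-packages/rmapy/document.py file named in the traceback; it may be called _id), passing the recovered ID should make load() look for the member that actually exists, e.g. ZipDocument(_id=document_id_from_zip(path), file=path).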
I've just realised: I think the change happens when I call output.write(). Because of the way the encrypter has to work, it's technically a new PDF document. I guess I'd then have to find the new UUID of the doc and rename the associated files too. Any idea how I could find the new UUID?
OK, here's a breakdown of what is currently going on:
- ZipDocument(doc="...") and not ZipDocument(file=...), which means that I lose all annotations when doing this.
- x = ZipDocument(file="...") then looking at the output of x should give the new ID; I then updated all the files in the directory to have this new ID before zipping.
- This didn't work, so I think perhaps the zipping process might change the ID as well?

My current code is:
# Import dependencies
from rmapy.api import Client, ZipDocument
import zipfile
import os
import glob
import shutil
from PyPDF2 import PdfFileWriter, PdfFileReader
from tqdm import tqdm

# Complete startup checks
rm = Client()
rm.renew_token()

# Attempt to upload a file
#rawDocument = ZipDocument(doc='TestPDF.pdf')
#print(rawDocument.metadata)
#print(rm.get_meta_items())

# Obtain items from specific notebook (todo: work out how to pull multiple files from folder)
books = [i for i in rm.get_meta_items() if i.VissibleName == "dummy"][0]
metadata = books.to_dict()
#print(metadata)
doc_id = metadata['ID']
print("The UUID is:", doc_id)
name = books.VissibleName

# Download doc(s)
downloaded_file = rm.download(books)

# Delete old file now it's not needed any more
#rm.delete(books)  # <- this is not undoable!!!!!

# Unzip to PC
# Set up folder paths
download_path = "/mnt/d/Users/Jake/Documents/PDF Encryption/RM Zips/" + doc_id + ".zip"
extraction_path = "/mnt/d/Users/Jake/Documents/PDF Encryption/To Encrypt/" + doc_id

# Dump to PC and extract
downloaded_file.dump(download_path)
with zipfile.ZipFile(download_path, 'r') as zip_ref:
    zip_ref.extractall(extraction_path)

# Find PDF
os.chdir(extraction_path)
#print(os.getcwd())

# Create function to encrypt PDFs
def encrypt_pdfs():
    # Iterate over all PDFs in the current directory
    for pdf_path in tqdm(glob.glob("*.pdf")):
        # Load file as pdf
        file = PdfFileReader(pdf_path)
        # Is the file already encrypted? If so, do nothing.
        if file.isEncrypted:
            continue
        # Because this plugin is weirdly made, I have to technically create a *new* PDF. Probably not very efficient!
        output = PdfFileWriter()
        # Iterate over all pages of the PDF and append to a new document
        for index in range(file.numPages):
            output.addPage(file.getPage(index))
        # Encrypt the output with a password
        output.encrypt("Hi")
        # Write document to file (overwrites the original PDF)
        with open(pdf_path, "wb") as f:
            output.write(f)

# Run new function
encrypt_pdfs()

# Rename document
os.rename(extraction_path + "/" + doc_id + ".pdf", extraction_path + "/" + name + ".pdf")

# Convert doc to remarkable to find new UUID
to_upload = ZipDocument(doc=extraction_path + "/" + name + ".pdf")
new_id = to_upload.ID
print("New ID is ", new_id)
new_folder = "/mnt/d/Users/Jake/Documents/PDF Encryption/To Encrypt/" + new_id
shutil.copytree(extraction_path, new_folder)

# Rename all other documents in folder
os.chdir(new_folder)
for src in os.listdir():
    # splitext gives "" for the page subfolder, so the folder is renamed to plain new_id
    extension = os.path.splitext(src)[1]
    os.rename(src, new_id + extension)

# Create new zip output folder directory
zip_path = "/mnt/d/Users/Jake/Documents/PDF Encryption/Zipped Files/" + new_id
extraction_path = new_folder

def zipfolder(foldername, target_dir):
    # The with block makes sure the zip is closed (and fully written)
    with zipfile.ZipFile(foldername + '.zip', 'w') as zipobj:
        rootlen = len(target_dir)
        for base, dirs, files in os.walk(target_dir):
            for file in files:
                fn = os.path.join(base, file)
                zipobj.write(fn, fn[rootlen:])

zipfolder(zip_path, extraction_path + "/")
to_upload = ZipDocument(file=zip_path + ".zip")
#rm.upload(to_upload)
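Both scripts build the upload zip with zipfolder(), which derives member names by slicing off a fixed-length prefix. A sketch of the same idea using os.path.relpath and a with block is below; zip_folder is a hypothetical name. It stores every path relative to the folder, so <ID>.content lands at the archive root and page files under <ID>/, which is the layout the dumped zip is described as having in this thread.

```python
import os
import zipfile


def zip_folder(src_dir, zip_path):
    """Zip the contents of src_dir (not the folder itself) into zip_path."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for base, _dirs, files in os.walk(src_dir):
            for file in files:
                full_path = os.path.join(base, file)
                # e.g. "<ID>.content" or "<ID>/<page>.rm", relative to src_dir
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
    return zip_path
```

For example, zip_folder(extraction_path, zip_path + ".zip") does the same job as zipfolder(zip_path, extraction_path + "/"). Keep the output zip outside src_dir so the archive does not try to include itself.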
Hi there!
I've been playing around with this and it's great, thank you for your work. I can successfully upload documents to my RM, but I can't figure out how to download ZipDocuments and extract them. I'd like to create a script that can automatically encrypt notebooks I make and return them to the RM Cloud. Here's the code I have working (no idea if this bit is even right or not!):
Printing gives <rmapy.document.Document document ID>. Again, this yields <rmapy.document.ZipDocument document ID>. Looking at the code, I'm not sure what I should do next. Thank you!