Closed sanzoghenzo closed 3 years ago
Hey Andrea, thanks!
I'm not sure I fully understand your request. Can you elaborate on this please:
but I would like to use a file object (the one downloaded from the other service) without writing it to memory first.
What kind of file object? Also, why do you not want to write to memory first?
In all cases you must use memory to upload anything. What the Multipart writer does is that it uploads any given file in chunks instead of loading an entire file (which can be in GBs) to memory and uploading it in one go.
Hi Omar, thanks for the quick response!
I used the wrong terms, forgive my poor English! I wanted to say exactly the opposite π
What I meant is that I don't want to write the file to disk, but use the content of a variable (the result of the download from the other service), to skip some extra step (write to a temporary file, give it to files.create()
and delete the file).
Don't worry about it!
Gotcha. I can see how that can be useful. I'll be happy to accept a PR for that.
Please correct me if I'm wrong, essentially what we'll be doing is instead of opening a file and passing the read bytes in the request's body, we want to pass bytes to the request directly.
I'm thinking a good place to pass it is when creating the request. So instead of passing upload_file
we pass upload_body
and this parameter should accept bytes.
I'm open to different suggestions.
Please correct me if I'm wrong, essentially what we'll be doing is instead of opening a file and passing the read bytes in the request's body, we want to pass bytes to the request directly.
Exactly that!
I'm thinking a good place to pass it is when creating the request. So instead of passing
upload_file
we passupload_body
and this parameter should accept bytes.I'm open to different suggestions.
So the plan should be:
upload_body
parameter to Method.__call__()
upload_file
and upload_body
are not both present at the same timemedia_upload
-like object that handles bytes instead of file path?request.media_upload
inside the fire_request
coroutine based on the type of the instance/presence of file_path
Maybe the last point is better accomplished by moving the validation and reading of the file to MediaUpload
/ResumableUpload
(and the new class, i.e. MediaBytesUpload
) methods, so that there will be more separation of concerns.
Do you have anything against the use of dataclasses
for the declaration of the models
?
I'm (over)using them (along with their cousin pydantic BaseModel
) to get automatic dunder methods (and validation, in the case of BaseModel
).
Let me know what you think
We can actually use upload_file
for both bytes and file names. I am always hesitant to add new kwargs to __call__()
because it might conflict with the URL parameters being passed in **kwargs
. If I can go back in time, I would replace **kwargs
with a simple url_parameters
kwarg.
About creating a media_upload
object, why not use the same one? and add a file_body
property instead. And in Resource.__call__
we check if the passed upload_file
is of type bytes, then we pass it to file_body
and if it's a string, we pass it to file_path
.
ResumableUpload
is actually being passed to MediaUpload
. Think of MediaUpload
as the parent upload object and everything else should be passed to it to avoid duplication.
About dataclasses
to me it was just easier to use what I already know works than using the (back then, new) dataclass feature which I knew little about. Assuming you're confused about why some of the models inheriting from dict
, I did that because initially, I only wanted to keep some of the models as dicts to enforce simplicity. But then I wasn't able to auto-document them using sphinx (our docs lib). So I made this funky class-dict hybrid.
Well, I've fully embarked the typing/type hinting boat, so I try to avoid giving a parameter multiple types, especially if different types means different behaviors. For that I prefer to have separate, smaller classes, so it's clear what each of the objects does, and use composition to build my app.
This is what I thought to do:
class MediaUpload:
def __init__(
self,
file_path,
upload_path=None,
mime_range=None,
max_size=None,
multipart=False,
chunk_size=None,
resumable=None,
validate=True,
):
self.file_path = file_path
self.upload_path = upload_path
self.mime_range = mime_range
self.max_size = max_size
self.multipart = multipart
self.chunk_size = chunk_size or DEFAULT_UPLOAD_CHUNK_SIZE
self.resumable = resumable
self.validate = validate
async def run_validation(self):
if self.validate and self.max_size:
size = await self.get_size()
if size > self.max_size:
raise ValidationError(
f'"{self.file_path}" has a size of {size / 1000}KB. '
f'Max upload size for this endpoint is: '
f'{self.max_size / 1000}KB.'
)
async def aiter_file(self):
return _aiter_file(self.file_path, self.chunk_size)
async def read_file(self):
async with aiofiles.open(self.file_path, "rb") as file:
return await file.read()
async def get_size(self):
stat = await async_os.stat(self.file_path)
return stat.st_size
@dataclass
class MediaUploadBytes:
body: bytes
upload_path: str = None
mime_range: Any = None
max_size: int = None
multipart: bool = False
chunk_size: int = DEFAULT_UPLOAD_CHUNK_SIZE
resumable: Optional[ResumableUpload] = None
validate: bool = True
async def run_validation(self):
if self.validate and self.max_size:
size = await self.get_size()
if size > self.max_size:
raise ValidationError(
f'"{self}" has a size of {size / 1000}KB. '
f'Max upload size for this endpoint is: '
f'{self.max_size / 1000}KB.'
)
async def aiter_file(self):
for x in range(0, len(self.body), self.chunk_size):
yield self.body[x:x + self.chunk_size]
async def read_file(self):
return self.body
async def get_size(self):
return len(self.body)
This way, we delegate the handling of the upload media to the class itself, and each class has its logic instead of using multiple if
s.
Well, I still have to add a conditional inside the _build_upload_media
to initialize the right class, but after that there's no more type/parameter checking involved.
As for the ResumableUpload
object, where is it used? Could I remove the file_path
and make it compatible with both classes?
First of all, thank you so much!
I have a couple of issues with this.
We can't use aiofiles
and async_os
in here because they are asyncio only. i.e. they don't work on Trio nor Curio. I kind of regret making the guarantee to support these libs because I don't think people actually use them. But it's a decision we have to live with. Especially that the cost of supporting them at this point isn't that big.
I get your point about preferring composition over inheritance. But having a separate class with only one different property (body
vs file_path
) doesn't sit well with me, especially that the difference between them is minor. If we'll be making a separate class, can you please use inheritance for that?
For the sake of consistency, can we not use dataclasses? I don't mind converting all the models to dataclasses if we can make sure they'll be backward compatible and generally won't incur any unneeded costs, performance or otherwise.
About the resumable object. Yes, it's not being used. I decided to keep it to encourage future contributors to implement the network part of the resumable feature.
To answer your question, yes, you can remove the file_path
attribute.
Oh, I didn't think about the other async frameworks! And I appreciate that maybe two classes are too much for this.
Let's move the composition in the MediaUpload methods then, so each framework can pass the right async function
class MediaUpload:
def __init__(
self,
file_path_or_bytes,
upload_path=None,
mime_range=None,
max_size=None,
multipart=False,
chunk_size=None,
resumable=None,
validate=True,
):
if isinstance(file_path_or_bytes, bytes):
self.file_body = file_path_or_bytes
self.file_path = None
else:
self.file_body = None
self.file_path = file_path_or_bytes
self.upload_path = upload_path
self.mime_range = mime_range
self.max_size = max_size
self.multipart = multipart
self.chunk_size = chunk_size or DEFAULT_UPLOAD_CHUNK_SIZE
self.resumable = resumable
self.validate = validate
async def run_validation(self, size_func):
if self.validate and self.max_size:
size = await size_func(self.file_path) if self.file_path else len(self.file_body)
if size > self.max_size:
raise ValidationError(
f'"{self}" has a size of {size / 1000}KB. '
f'Max upload size for this endpoint is: '
f'{self.max_size / 1000}KB.'
)
async def aiter_file(self, aiter_func):
if self.file_path:
return aiter_func(self.file_path, self.chunk_size)
return self._aiter_body()
async def _aiter_body(self):
for x in range(0, len(self.file_body), self.chunk_size):
yield self.file_body[x:x + self.chunk_size]
async def read_file(self, read_func):
return self.file_body or await read_func(self.file_path)
def __str__(self):
return self.file_path or "File object"
This won't break the class initialization (well, in _build_upload_media
you passed the path as keyword argument, but it was a positional one and it's safe to remove the name of the argument) and should work on each case.
I suspect I'll be the contributor for the resumeable feature, since I have to backup large files π
Looks pretty good to me π
I suspect I'll be the contributor for the resumeable feature, since I have to backup large filesΒ π
Definitely looking forward to that! :)
Thanks a lot for this great library!
I would like to use it in a tool that performs automated backups of autodesk bim360 files (but it could be any other storage service) to a google drive folder.
As of now, the only way to use the drive method
files.create()
is to specify theupload_file
parameter as the file path, but I would like to use a file object (the one downloaded from the other service) without writing it to memory first.I see that the
fire_request
coroutine inAiohttpSession.send
does the reading of the file viaaiofiles.open
, and then passes the file object (orMultipartWriter
object) toself.request
.Do you think it could be a good addition to support my use case?
If so, I could try to make a PR for that, but I would like to know your thoughts on this (and tips on how to accomplish this in a clean way).