jjjake / internetarchive

A Python and Command-Line Interface to Archive.org
GNU Affero General Public License v3.0
1.63k stars 220 forks source link

Don't close file-like objects #654

Open ego-lay-atman-bay opened 2 months ago

ego-lay-atman-bay commented 2 months ago

Currently the internetarchive library closes any file-like objects that is passed into an input, e.g. using upload or download. This is really frustrating, because a great way to download a file to memory is by passing in a io.BytesIO object, however since internetarchive closes the file after downloading it, I can't get the contents after downloading it into a io.BytesIO object.

One way to solve this is by adding a parameter close_file, which default to True, and when it's set to False, it won't close the file, allowing for easier downloading a file to memory.

ego-lay-atman-bay commented 2 months ago

I don't want to create my own BytesIO class that ignores the close() method, just so I can still get the data after I download the file I want.

JustAnotherArchivist commented 2 months ago

I don't think a new parameter is necessary. Rather, File.download should only close fileobjs it creates itself and make it the caller's responsibility otherwise. So when the method receives a non-None value, it needs to be wrapped in a contextlib.nullcontext so the following context manager doesn't invoke __exit__ and close. (This is in fact one of the examples given in nullcontext's documentation.)

ego-lay-atman-bay commented 2 months ago

Actually, that would be better.

I think this should be done in both upload and download, because I have ran into an issue where I wanted to read the data of my file-like object after uploading it to archive.org (for logging purposes), but internetarchive was closing the object, so I had to save the data before uploading (the reason I wanted to do it after, was specifically so reading the data after could be based on if the upload was successful).

Another reason why I feel like it should not close, is because I feel like most python users would not expect it to be closed, especially since most libraries don't close file-like objects, and run into unexpected issues like I have.

JustAnotherArchivist commented 2 months ago

Yes, agreed. One thing to figure out is how Item.upload_file should behave when delete = True is specified. I think that option should simply have no effect if the user passed a file-like object rather than a path. Currently, it will try its best to delete the underlying file.

ego-lay-atman-bay commented 2 months ago

I feel like that behavior would make sense.