piskvorky / smart_open

Utils for streaming large files (S3, HDFS, gzip, bz2...)
MIT License
3.16k stars, 382 forks

ResourceWarning: unclosed file #393

Open kenahoo opened 4 years ago

kenahoo commented 4 years ago

Problem description

I'm seeing a ResourceWarning: unclosed file warning when using context managers to open files/streams with smart_open.

Note: if I don't run under unittest, or if I don't gzip the file, resources seem to be closed correctly. I'm guessing unittest and smart_open are somehow not coordinating correctly on closing the layers.
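
One thing that may explain the difference between running under unittest and running a plain script: Python's default warning filters ignore ResourceWarning, while unittest.main() switches warnings to "default" unless a -W option was given. So the handle may be left unclosed in both cases and only reported under unittest. The sketch below is a way to check that on CPython, not a diagnosis of smart_open itself; the helper name count_resource_warnings is made up for illustration, and it relies on CPython finalizing the file object as soon as the last reference goes away.

```python
import gc
import warnings


def count_resource_warnings(path):
    """Open `path`, drop the handle without closing it, and count the
    ResourceWarnings emitted. Hypothetical helper, CPython-specific."""
    with warnings.catch_warnings(record=True) as caught:
        # Outside unittest this filter is what is normally missing:
        # ResourceWarning is ignored by default.
        warnings.simplefilter("always", ResourceWarning)
        fh = open(path, "rb")
        del fh          # last reference gone, the file is finalized unclosed
        gc.collect()    # in case finalization was deferred
    return sum(issubclass(w.category, ResourceWarning) for w in caught)
```

Running a plain script with `python -W default::ResourceWarning script.py` should surface the same kind of warning without unittest, if the handle is in fact being leaked.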

Steps/code to reproduce the problem

My test script:

import unittest
from smart_open import open as smart_open

class MyTestCase(unittest.TestCase):
    def test_load(self):
        with smart_open('input.csv.gz') as fh:
            print("opened file")

unittest.main()

Invocation:

% echo -e 'col1,col2\nval1,val2\nval3,val4' | gzip > input.csv.gz 

% PYTHONTRACEMALLOC=1 python test_load.py
opened file
/Users/kwilliams/miniconda3/lib/python3.7/unittest/case.py:615: ResourceWarning: unclosed file <_io.BufferedReader name='input.csv.gz'>
  testMethod()
Object allocated at (most recent call last):
  File "/Users/kwilliams/git/dispatcher/rush-springs-simulations/venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py", lineno 548
    fobj = io.open(parsed_uri.uri_path, mode)
.
----------------------------------------------------------------------
Ran 1 test in 0.007s

OK

Versions

Darwin-18.0.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
smart_open 1.9.0

kenahoo commented 4 years ago

I should add - I'm not sure whether the warning is correct and the filehandle isn't being closed properly, or it's a spurious warning.
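
A possible way to tell a real leak from a spurious warning, sketched with the standard library only: the traceback above shows the raw file coming from io.open, and the warning only appears for gzipped input. The standard library's gzip.GzipFile does not close a file object passed in via fileobj when the GzipFile itself is closed, so if the decompression layer is built that way, the inner handle would stay open after the with block. Whether smart_open 1.9.0 actually does this is an assumption, not something confirmed in this thread; underlying_left_open is a hypothetical helper name.

```python
import gzip


def underlying_left_open(path):
    """Return True if closing the gzip layer left the raw file open.
    Illustrative only; uses the stdlib, not smart_open."""
    raw = open(path, "rb")
    # Wrap the already-open raw file, as a layered opener might do.
    with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
        gz.read()
    # The context manager closed `gz`, but GzipFile.close() does not
    # close a caller-supplied fileobj.
    left_open = not raw.closed
    raw.close()  # clean up so this check does not leak a handle itself
    return left_open
```

If this returns True for input.csv.gz, the warning would be genuine in the sense that a wrapper built this way needs to close the inner file explicitly.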

kenahoo commented 4 years ago

Hi, any thoughts on this?

piskvorky commented 4 years ago

@kenahoo thanks for the clear and detailed report. I agree context managers should be closing handles, so that looks like a bug.

@mpenkov is busy ATM – any chance you could take a stab at this yourself?

kenahoo commented 4 years ago

Hi @piskvorky - I'm afraid I probably won't be able to tackle this, mostly because I had a look at the guts of smart_open and I think I'm not up to the task at this point, but also because this is coming up in my "day job" and the deadlines are pretty tight, so I'm not able to commit the necessary time, at least in the short term.

danielpazeto commented 3 years ago

I had the same issue when trying to open a gz file: when I run unit tests, the same warning is shown. I'm using smart-open==4.2.0 and Python 3.8.7.

The warning message:

/mnt/c/Users/<user>/workspace/codes/tests/testXX.py:242: ResourceWarning: unclosed <ssl.SSLSocket fd=5,
 family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('yyy.yyy.yyy.yyy', XXXX),
 raddr=('zzz.zzz.zzz.zzz', 8080)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Pim-Claessens commented 2 years ago

Anyone been able to fix this?

mpenkov commented 2 years ago

Cannot reproduce on linux Python 3.10.6 and smart_open 6.1.0.

art12-3ds commented 1 year ago

I had the same issue when trying to open a file on S3: when I run unit tests, the same warning is shown. My test script:

import smart_open
import unittest

class RunTest(unittest.TestCase):
    def test_load_pickle_s3(self):
        path = "s3://my_test_direcotry/test.pkl"
        with smart_open.open(path, "wb") as fh:
            print("open file")

    def test_load_pickle_local(self):
        path = "test.pkl"
        with smart_open.open(path, "wb") as fh:
            print("open file")

Package versions: smart_open[s3] 6.3.0, Python 3.9.16

mpenkov commented 1 year ago

Are you able to work out the cause?

art12-3ds commented 1 year ago

Not at the moment