grangier / python-goose

Html Content / Article Extractor, web scrapping lib in Python
Apache License 2.0
3.98k stars 787 forks

WindowsError: [Error 32] The process cannot access the file because it is being used by another process #78

Closed idf closed 10 years ago

idf commented 10 years ago

I am using Goose on Windows.

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from goose import Goose
>>> Goose()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\Program Files (x86)\python273\lib\site-packages\goose_extractor-1.0.8-py2.7.egg\goose\__init__.py", line 38, in __init__
    self.initialize()
  File "d:\Program Files (x86)\python273\lib\site-packages\goose_extractor-1.0.8-py2.7.egg\goose\__init__.py", line 82, in initialize
    os.remove(path)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\users\danyang\appdata\local\temp\goose\tmpj2avys'

grangier commented 10 years ago

Sorry, but I don't have a Windows testing environment. Are you sure you are launching only a single Goose process?

idf commented 10 years ago

There is only one cmd window, so it is very likely a single Goose process. I temporarily worked around the issue by commenting out os.remove(path) in goose/__init__.py:

        try:
            f = open(path, 'w')
            f.close()
            # os.remove(path)
        except IOError:
            raise Exception(self.config.local_storage_path +
                " directory is not writeble, "
                "you need to set this for image processing downloads"
            )

My guess is that the temporary file is still being held open by the current process, and that the temporary-file mechanism on Linux may differ from the one on Windows.

Any better solution?

joedf commented 10 years ago

I also get this error...

scrimshander commented 10 years ago

The answer is to not open the file separately with open(), because it is already opened by mkstemp. The existing descriptor can be opened for writing with os.fdopen, so replacing the f = open(path, 'w') call with f = os.fdopen(level, 'w') fixes it. I'm new to GitHub and to contributing, so if someone can tell me what I need to do, I'd be happy to contribute the fix to the actual code!

joedf commented 10 years ago

Navigate to the file on the project's GitHub page and click Edit. It will automatically fork the project for you. Then, once you have finished your edit, go to the bottom, write a summary/reason for the change, and click "Propose file change".

jueya1213 commented 10 years ago

What scrimshander just said addresses it. You can change the source of goose/__init__.py like this:

        ...
        level, path = mkstemp(dir=self.config.local_storage_path)
        try:
            f = os.fdopen(level, 'w')
            f.close()
            os.remove(path)
        ...

The WindowsError [Error 32] will disappear.
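For readers who want to try the pattern in isolation, here is a minimal, self-contained sketch of the fix described above. It is not Goose's actual code: `check_writable` is a hypothetical helper name, and the directory argument stands in for `self.config.local_storage_path`. The idea is that `tempfile.mkstemp` already returns an open file descriptor, so wrapping that descriptor with `os.fdopen` and closing it leaves no handle open when `os.remove` runs.

```python
import os
import tempfile


def check_writable(directory):
    """Return True if a temporary file can be created in and removed from `directory`.

    Hypothetical helper for illustration only; it is not part of Goose.
    """
    try:
        # mkstemp creates the file and returns an already-open OS-level
        # file descriptor together with the file's path.
        fd, path = tempfile.mkstemp(dir=directory)
    except (IOError, OSError):
        # The directory is missing or not writable.
        return False
    # Wrap the existing descriptor instead of calling open(path) a second
    # time, so closing `f` releases the only handle on the file.
    f = os.fdopen(fd, 'w')
    f.close()
    # Every handle is closed at this point. On Windows, removing a file that
    # still has an open handle typically fails with error 32.
    os.remove(path)
    return True
```

Calling `check_writable(tempfile.gettempdir())` should return True on a normally configured system. Calling `os.close(fd)` directly would release the descriptor just as well; `os.fdopen` is used here only to stay close to the change proposed in this thread.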

grangier commented 10 years ago

Hello,

Please check out this branch and let me know whether the issue is solved: https://github.com/grangier/python-goose/tree/feature/win-file-97

I don't have a Windows platform to test on.

Regards,

Xav

idf commented 10 years ago

#78 is correctly solved:

Danyang@DANIEL /d/Programming/python/python-goose (feature/win-file-97)
$ python
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>> from goose import Goose
>>> Goose()
<goose.Goose object at 0x022FD990>
>>>

However, not all tests pass:

Danyang@DANIEL /d/Programming/python/python-goose (feature/win-file-97)
$ python -m unittest discover --pattern=*.py
E.........Building Trie..., from d:\Program Files (x86)\python273\lib\site-package
s\jieba-0.32-py2.7.egg\jieba\dict.txt
loading model from cache c:\users\danyang\appdata\local\temp\jieba.cache
loading model cost 1.07999992371 seconds.
Trie has been built succesfully.
..................................................................................
..............................FFF
======================================================================
ERROR: setup (unittest.loader.ModuleImportFailure)
----------------------------------------------------------------------
ImportError: Failed to import test module: setup
Traceback (most recent call last):
  File "d:\Program Files (x86)\python273\lib\unittest\loader.py", line 254, in _fi
nd_tests
    module = self._get_module_from_name(name)
  File "d:\Program Files (x86)\python273\lib\unittest\loader.py", line 232, in _ge
t_module_from_name
    __import__(name)
  File "d:\Programming\python\python-goose\setup.py", line 70, in <module>
    test_suite="tests"
  File "d:\Program Files (x86)\python273\lib\distutils\core.py", line 140, in setu
p
    raise SystemExit, gen_usage(dist.script_name) + "\nerror: %s" % msg
SystemExit: usage: python -m unittest [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_o
pts] ...]
   or: python -m unittest --help [cmd1 cmd2 ...]
   or: python -m unittest --help-commands
   or: python -m unittest cmd --help

error: invalid command 'discover'

======================================================================
FAIL: test_embed (tests.videos.ImageExtractionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\Programming\python\python-goose\tests\videos.py", line 75, in test_embe
d
    self.runArticleAssertions(article=article, fields=fields)
  File "d:\Programming\python\python-goose\tests\extractors.py", line 124, in runA
rticleAssertions
    getattr(self, assertion)(field, expected_value, result_value)
  File "d:\Programming\python\python-goose\tests\videos.py", line 60, in assert_mo
vies
    self.assertEqual(r, v)
AssertionError: '<embed src="https://www.youtube.com/v/M7lc1UVf-VE?version=3&amp;a
utoplay=1" type="application/x-shockwave-flash" allowscriptaccess="always" width="
640" height="390"/>&#13;' != u'<embed src="https://www.youtube.com/v/M7lc1UVf-VE?v
ersion=3&amp;autoplay=1" type="application/x-shockwave-flash" allowscriptaccess="a
lways" width="640" height="390"/>'

======================================================================
FAIL: test_iframe (tests.videos.ImageExtractionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\Programming\python\python-goose\tests\videos.py", line 80, in test_ifra
me
    self.runArticleAssertions(article=article, fields=fields)
  File "d:\Programming\python\python-goose\tests\extractors.py", line 124, in runA
rticleAssertions
    getattr(self, assertion)(field, expected_value, result_value)
  File "d:\Programming\python\python-goose\tests\videos.py", line 60, in assert_mo
vies
    self.assertEqual(r, v)
AssertionError: '<iframe frameborder="0" width="480" height="270" src="http://www.
dailymotion.com/embed/video/x130bpf"/>&#13;' != u'<iframe frameborder="0" width="4
80" height="270" src="http://www.dailymotion.com/embed/video/x130bpf"/>'

======================================================================
FAIL: test_object (tests.videos.ImageExtractionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "d:\Programming\python\python-goose\tests\videos.py", line 85, in test_obje
ct
    self.runArticleAssertions(article=article, fields=fields)
  File "d:\Programming\python\python-goose\tests\extractors.py", line 124, in runA
rticleAssertions
    getattr(self, assertion)(field, expected_value, result_value)
  File "d:\Programming\python\python-goose\tests\videos.py", line 60, in assert_mo
vies
    self.assertEqual(r, v)
AssertionError: '<object width="640" height="390">&#13;<param name="movie" value="
https://www.youtube.com/v/M7lc1UVf-VE?version=3&amp;autoplay=1"/>&#13;<param name=
"allowScriptAccess" value="always"/>&#13;<embed src="https://www.youtube.com/v/M7l
c1UVf-VE?version=3&amp;autoplay=1" type="application/x-shockwave-flash" allowscrip
taccess="always" width="640" height="390"/>&#13;</object>&#13;' != u'<object width
="640" height="390"><param name="movie" value="https://www.youtube.com/v/M7lc1UVf-
VE?version=3&amp;autoplay=1"/><param name="allowScriptAccess" value="always"/><emb
ed src="https://www.youtube.com/v/M7lc1UVf-VE?version=3&amp;autoplay=1" type="appl
ication/x-shockwave-flash" allowscriptaccess="always" width="640" height="390"/></
object>'

----------------------------------------------------------------------
Ran 125 tests in 26.126s

FAILED (failures=3, errors=1)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

These failures also happen without #98. I think #98 correctly solves #78, but there is still a long way to go before Goose is fully integrated with Windows.

grangier commented 10 years ago

I'm not sure why the tests don't pass under Windows.

idf commented 10 years ago

After looking into the messages: the 3 failures are due to the extra &#13; in the extracted markup; apart from that, the asserted values match.
Regarding the unittest.loader.ModuleImportFailure error, I guess it is a test loader issue. How do you run your test cases? I am using python -m unittest discover --pattern=*.py.
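For context, `&#13;` is the numeric character reference for a carriage return (U+000D), so the extra characters in the failing output look like Windows line endings leaking into the serialized markup. That reading is an assumption, not something confirmed in this thread. Under that assumption, a rough sketch of how the two strings could be compared while ignoring carriage returns is shown below. `normalize_cr` is a hypothetical helper and is not part of Goose's test suite.

```python
def normalize_cr(markup):
    """Drop carriage returns, whether literal or written as the &#13; reference."""
    return markup.replace('&#13;', '').replace('\r', '')


# Values taken from the test_iframe failure in the output above.
expected = ('<iframe frameborder="0" width="480" height="270" '
            'src="http://www.dailymotion.com/embed/video/x130bpf"/>')
actual = expected + '&#13;'

# The raw strings differ only by the trailing carriage-return reference.
assert actual != expected
assert normalize_cr(actual) == normalize_cr(expected)
```

Whether the tests should normalize the output or the fixtures should be read with consistent line endings is a design choice left open here.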

grangier commented 10 years ago

@zhangdanyangg python setup.py test

idf commented 10 years ago

Thanks. There is no error now, only the 3 failures caused by the &#13; issue.

grangier commented 10 years ago

@zhangdanyangg could you create another issue?

idf commented 10 years ago

Sure. At #102.

joedf commented 10 years ago

Has this issue been fixed?

grangier commented 10 years ago

@joedf yes

jerwin2018 commented 5 years ago

ERROR: Could not install packages due to an EnvironmentError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\User\AppData\Local\Temp\pip-req-tracker-d0sxobbf\57c162bad943c226c1d2a75e59537b7845202d095bf28c0bcc570868' Consider using the --user option or check the permissions.