scrapinghub / python-crfsuite

A python binding for crfsuite
MIT License

Permission denied error while using pycrfsuite in rasa_nlu #61

Closed shuvayan closed 7 years ago

shuvayan commented 7 years ago

Hello,

I am trying to run the commands given in http://rasa-nlu.readthedocs.io/en/latest/python.html, but I get a permission denied error, as shown below:

`>>> trainer.train(training_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\shuvayan.das\AppData\Local\Continuum\Anaconda3.3\lib\site-packages\rasa_nlu\model.py", line 157, in train
    updates = component.train(*args)
  File "C:\Users\shuvayan.das\AppData\Local\Continuum\Anaconda3.3\lib\site-packages\rasa_nlu\extractors\crf_entity_extractor.py", line 80, in train
    self._train_model(dataset)
  File "C:\Users\shuvayan.das\AppData\Local\Continuum\Anaconda3.3\lib\site-packages\rasa_nlu\extractors\crf_entity_extractor.py", line 308, in _train_model
    self.ent_tagger.open(self.crf_file.name)
  File "pycrfsuite/_pycrfsuite.pyx", line 571, in pycrfsuite._pycrfsuite.Tagger.open (pycrfsuite/_pycrfsuite.cpp:7731)
  File "pycrfsuite/_pycrfsuite.pyx", line 717, in pycrfsuite._pycrfsuite.Tagger._check_model (pycrfsuite/_pycrfsuite.cpp:10037)
PermissionError: [Errno 13] Permission denied: 'C:\Users\shuvayan.das\AppData\Local\Temp\tmpy84meugg'
`

Please help in resolving this. I am using Windows.

shuvayan commented 7 years ago

Solved. I was using the wrong config.json file in rasa_nlu.

Vicharian commented 7 years ago

shuvayan,

I have the same problem. I have checked my config.json again (using http://rasa-nlu.readthedocs.io/en/latest/migrations.html and http://rasa-nlu.readthedocs.io/en/latest/config.html). I would appreciate any pointers. This is my config file:

{ "pipeline": "spacy_sklearn", "path" : "Rasa/models", "data" : "Rasa/train/examples/demo-rasa.json", "emulate" : "luis" }

shuvayan commented 7 years ago

In my case, the issue is caused by using ner_crf, which I have defined in config_spacy.json as shown below:

{ "backend": "spacy_sklearn", "path" : "./models", "data" : "./data/trainData.json", "pipeline": ["nlp_spacy", "ner_crf", "ner_synonyms"] }

There is no error if I use ner_spacy. Is the permission issue due to not having admin rights? Please help in resolving this!

I have created some training data like below:

{ "text": "I want a shoe of black color and size 9", "intent": "buy", "entities": [ { "start": 9, "end": 14, "value": "shoe ", "entity": "product" }, { "start": 17, "end": 23, "value": "black ", "entity": "color" }, { "start": 38, "end": 39, "value": "9", "entity": "size" } ] },

And since I need to custom-train my model, it is imperative that I use ner_crf. Can someone please help in resolving this issue?
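A side note on the training example above, separate from the permission error: the entity spans can be checked mechanically. The sketch below assumes that `end` is an exclusive offset (which is what the numbers in the example suggest) and uses a hypothetical helper, `check_entities`, that is not part of rasa_nlu. It reports spans that do not match their `value` and spans that carry surrounding whitespace.

```python
def check_entities(example):
    """Return a list of human-readable problems found in one training example.

    Hypothetical helper for illustration; assumes `end` is exclusive.
    """
    problems = []
    text = example["text"]
    for ent in example.get("entities", []):
        span = text[ent["start"]:ent["end"]]
        if span != ent["value"]:
            problems.append(
                f"{ent['entity']}: span {span!r} != value {ent['value']!r}"
            )
        if span != span.strip():
            problems.append(
                f"{ent['entity']}: span {span!r} has surrounding whitespace"
            )
    return problems


# The training example from the comment above, copied as posted.
example = {
    "text": "I want a shoe of black color and size 9",
    "intent": "buy",
    "entities": [
        {"start": 9, "end": 14, "value": "shoe ", "entity": "product"},
        {"start": 17, "end": 23, "value": "black ", "entity": "color"},
        {"start": 38, "end": 39, "value": "9", "entity": "size"},
    ],
}

for problem in check_entities(example):
    print(problem)
```

On this example the check flags the `product` and `color` spans, because both include a trailing space and therefore do not end on a word boundary. Trimming them (for instance `"end": 13, "value": "shoe"`) may be worth considering.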

kmike commented 7 years ago

Hey,

I believe it is a bug in the rasa-nlu library: it uses NamedTemporaryFile, which may cause problems on Windows; see http://stackoverflow.com/questions/18903069/how-can-i-read-namedtemporaryfile-in-python. This can't be fixed in python-crfsuite; python-crfsuite just tries to save a file to a specified path, or open a file at a specified path; it is up to the caller to make sure the destination is writable or readable.
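To illustrate the kind of caller-side workaround this implies, here is a minimal stdlib-only sketch. It is not the rasa-nlu code; `write_model` and `read_model` are hypothetical stand-ins for a trainer that saves to a path and a tagger that opens a path. The idea is to create the temporary file with `delete=False` and close the handle before anything else opens the file by name, since on Windows a file held open by `NamedTemporaryFile` generally cannot be opened a second time.

```python
import os
import tempfile


def write_model(path: str) -> None:
    """Hypothetical stand-in for a trainer that saves a model to `path`."""
    with open(path, "wb") as f:
        f.write(b"model-bytes")


def read_model(path: str) -> bytes:
    """Hypothetical stand-in for a tagger that opens a model at `path`."""
    with open(path, "rb") as f:
        return f.read()


def train_and_load() -> bytes:
    # delete=False keeps the file on disk after our handle is closed.
    tmp = tempfile.NamedTemporaryFile(suffix=".crfsuite", delete=False)
    try:
        # Close our own handle first, so that other code can open the
        # file by name (this is the step that matters on Windows).
        tmp.close()
        write_model(tmp.name)
        return read_model(tmp.name)
    finally:
        # With delete=False, cleanup is the caller's responsibility.
        os.remove(tmp.name)
```

The trade-off is that the caller now owns cleanup, which is why the `os.remove` call sits in a `finally` block.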

I suggest opening an issue on the rasa-nlu bug tracker. Instead of managing these temporary files manually, it may be easier to use https://github.com/TeamHG-Memex/sklearn-crfsuite, which handles temporary files automatically; with sklearn-crfsuite, pickle or joblib can be used for persistence.

tmbo commented 7 years ago

@kmike thanks for investing time into finding the cause of this. I am a developer of rasa NLU. The problem is that we train the model and want to use it right away. At that point we don't know yet where to store the model.

Is there any possibility to retrieve the model after training without storing it in between? E.g. instead of

trainer.train(self.crf_file.name)
self.ent_tagger.open(self.crf_file.name)

Something along the lines of:

ent_tagger = trainer.train()

kmike commented 7 years ago

@tmbo there are two answers to your question :)

The CRFsuite C++ library recently added support for in-memory models, but we can't use it because it breaks the build on Windows for all Pythons < 3.5 without https://github.com/chokkan/crfsuite/pull/66. Versions which work with Python 2 on Windows (including the commit which is bundled with python-crfsuite) require saving the model to disk before using it. So truly in-memory models can be added to python-crfsuite, but it requires some work (switching to a CRFsuite fork with the necessary fixes, exposing in-memory support in the Cython wrapper).

But sklearn-crfsuite (which is a python-crfsuite wrapper) handles this transparently; instead of using separate Trainer and Tagger objects and managing temporary files you can use a single CRF object which has fit and predict methods, similar to scikit-learn estimators.
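As a rough sketch of what that single-object workflow could look like (assuming sklearn-crfsuite is installed): the feature names and the helpers `token_features`, `sentence_features`, and `train_and_tag` below are illustrative only and are not taken from rasa-nlu. Only `sklearn_crfsuite.CRF`, `fit`, and `predict` come from the library.

```python
def token_features(sentence, i):
    """Toy per-token feature dict; the feature names are illustrative only."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
    }
    if i > 0:
        features["prev.lower"] = sentence[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    return features


def sentence_features(sentence):
    """Turn a list of tokens into a list of feature dicts, one per token."""
    return [token_features(sentence, i) for i in range(len(sentence))]


def train_and_tag(train_sentences, train_labels, new_sentence):
    """Fit a CRF and tag one new sentence, with no explicit model file."""
    # Imported here so the feature helpers above work without the package.
    import sklearn_crfsuite

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit([sentence_features(s) for s in train_sentences], train_labels)
    # predict takes a list of sentences; return the labels for the only one.
    return crf.predict([sentence_features(new_sentence)])[0]
```

A call such as `train_and_tag([["size", "9"]], [["O", "size"]], ["size", "10"])` would train on one labelled sentence and return one label per token of the new sentence. The caller never passes a model path, so the temporary-file handling stays inside the library.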

tmbo commented 7 years ago

OK, I think the second option is easier to achieve. Is there any downside to switching?

kmike commented 7 years ago

No downsides. Well, maybe only that sklearn-crfsuite brings you a couple of new pure-python dependencies (python-tabulate, tqdm) which are installed automatically along with sklearn-crfsuite. scikit-learn itself is not required for the basic use.

Vicharian commented 7 years ago

Just a general comment not related to the issue. It is mind-blowing how quickly we have reached a logical end to bug investigation. Of course the fix might be some time away, but when technology is able to enable collaborative problem solving like this it takes your breath away. Of course, you need people like kmike and tmbo -- it was great to see you gents at work.