interrogator / corpkit

A toolkit for corpus linguistics

NameError: global name 'Corpus' is not defined #30

Open nanocombi opened 8 years ago

nanocombi commented 8 years ago

After installing the latest corpkit (2.1.1), I tried to parse my corpus (which worked in the previous version, at least for approx. 40% of the texts) and received the message NameError: global name 'Corpus' is not defined. What did I do wrong? The log is attached below; I hope it provides a clue. Thanks for your help!

log-00.txt

interrogator commented 8 years ago

Hey! You didn't do anything wrong, I've forgotten an import statement. Give me an hour or two and I'll fix it up for you.

Let me know if you have any other comments on the tool, too!

nanocombi commented 8 years ago

Wow. Thank you for your fast reply! From what I could see so far, corpkit definitely has the potential to replace the established tools that I previously worked with. I am looking forward to challenging this ingenious application ;)

interrogator commented 8 years ago

I've just uploaded version 2.1.2. It seems like everything is working again. Let me know!

Thanks for the positive feedback, too! Be in touch if you have any suggestions.

nanocombi commented 8 years ago

Hey. Sorry to bother you again. The app now opens the parser options but prints an UnboundLocalError: local variable 'possible_paths' referenced before assignment

interrogator commented 8 years ago

Whoops. OK, hopefully fixed in 2.1.3.

P.S. Are you using the auto-update feature, or redownloading the app? I'm curious if the automatic update is working properly on others' machines.

nanocombi commented 8 years ago

Cheers! I am forced to redownload. A reason for that might be that I am working on a non-locally administered university workstation.

interrogator commented 8 years ago

Yeah, that makes it a bit more tricky. Oh well.

Let me know if you can now parse a corpus/interrogate it, so I can close the issue.

Also, I'm wondering about what you said before, that parsing worked "at least for approx. 40% of the texts". Can you elaborate?

interrogator commented 8 years ago

@nanocombi A few updates in the last few days. Let me know how it goes!

nanocombi commented 8 years ago

I am afraid it still doesn't work. Attached you'll find the log. The error reads: OSError: [Errno 13] Permission denied: '/data'

log-01.txt

interrogator commented 8 years ago

I haven’t looked at this in the code much, but it seems to me that there’s a chance that the problem is that you aren’t in a project when you add/parse the corpus.

Does it still appear if you make a new project and add a corpus that way?

I’ll take a look at the code a bit later, anyway, and see what I find.

nanocombi commented 8 years ago

I set up a new project and the error changed to: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 3244: invalid start byte. I tried a selection of txts from different corpora to rule out formatting errors in my files. Thank you for your efforts!

log-00.txt

interrogator commented 8 years ago

Hey. So, that error basically means that corpkit is expecting UTF-8 encoded data, but isn't getting it. I'm happy to add code that will try to detect encoding, but it might take me a little while. In the meantime, you could also try to convert your files to UTF-8 encoding yourself.
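For the manual conversion, here is a minimal sketch rather than anything taken from corpkit. Byte 0xfc is "ü" in Latin-1 and Windows-1252, so it assumes the files are Latin-1; that is a guess about the data, and `convert_to_utf8` and its `source_encoding` parameter are hypothetical names. It is written for Python 3.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="latin-1"):
    """Re-encode one text file as UTF-8, in place.

    source_encoding is an assumption about the input, not something
    detected from the file. Returns True if the file was rewritten.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave the file alone
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

It overwrites files, so run it on a copy of the corpus, e.g. `for p in Path("corpus_copy").rglob("*.txt"): convert_to_utf8(p)`, where `corpus_copy` is a placeholder directory name.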

interrogator commented 8 years ago

OK @nanocombi, I put up a fix. It tries to guess file encodings and convert to UTF-8. Getting your data into consistent UTF-8 beforehand is still probably a good idea, though.

Let me know if it works!
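To illustrate the general idea of guessing an encoding, here is a minimal sketch that tries a short list of candidates in order. It is not the code that went into corpkit; `decode_with_fallback` and the candidate list are assumptions chosen for Western European text, and it targets Python 3.

```python
def decode_with_fallback(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns (text, encoding_used). This is a heuristic: a successful
    decode does not prove the guess is right, only that the bytes are
    valid in that encoding.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(b"f\xfcr")  # 0xfc is not valid UTF-8 here
print(used, text.encode("utf-8"))
```

UTF-8 goes first because it is strict and rarely accepts other encodings by accident. Latin-1 goes last because it accepts any byte sequence, so it works as a catch-all but can silently produce wrong characters.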