interrogator / corpkit

A toolkit for corpus linguistics

Resizeable log window in Mac app #10

Closed NetBUG closed 8 years ago

NetBUG commented 8 years ago

Can you make the log window in the app resizable?

interrogator commented 8 years ago

How best to handle the log is a good question. At present, you can do Help --> Show log to bring up a text file, but it doesn't seem ideal. Is the ideal behaviour that you could drag the log up, so that we can see previous lines? Or did you have something else in mind?
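Showing previous lines when the log is dragged up presupposes that earlier lines are kept somewhere. A minimal sketch of that idea, assuming a bounded in-memory history; the `LogHistory` class and its method names are hypothetical and not part of corpkit:

```python
from collections import deque


class LogHistory:
    """Keep the most recent log lines so the GUI can show one line or many."""

    def __init__(self, max_lines=500):
        # A deque with maxlen silently drops the oldest line once it is full.
        self._lines = deque(maxlen=max_lines)

    def add(self, line):
        self._lines.append(line.rstrip("\n"))

    def latest(self):
        """The single line a collapsed status bar would display."""
        return self._lines[-1] if self._lines else ""

    def tail(self, n):
        """The last n lines, oldest first, for an expanded log pane."""
        if n <= 0:
            return []
        return list(self._lines)[-n:]
```

A collapsed one-line status bar would call `latest()`, and a taller pane would call `tail(n)` with however many rows fit.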

A related point: once most bugs are fixed, the idea is that a one-line log is enough to know what's currently happening, so it's also a good idea to simply fix current bugs. What was the original issue? Could you possibly file another one for that?

NetBUG commented 8 years ago

Thank you, Daniel! Indeed, your log is a different one, it shows completely different things. I've cloned the repo to launch corpkit-gui.py and see console output. CoreNLP fails on some of my texts, thus I wanna see what happens:

❯ python corpkit/corpkit-gui.py
21:68: execution error: Finder got an error: Can’t set process "corpkit-1.67" to true. (-10006)
Adding annotator tokenize
TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
Adding annotator ssplit
Adding annotator pos
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.7 sec].
Adding annotator lemma
Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [4.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [2.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.5 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.4 sec].

Ready to process: 2 files, skipped 0, total 2
Processing file /Users/ourzhumt/projects/Demo/data/simple/LICENSE-simple.txt ... writing to /Users/ourzhumt/projects/Demo/data/simple-parsed/LICENSE-simple.txt.xml {
  Annotating file /Users/ourzhumt/projects/Demo/data/simple/LICENSE-simple.txt [52.53 seconds]
} [52.351 seconds]
Processing file /Users/ourzhumt/projects/Demo/data/simple/smallcorpus-simple.txt ... writing to /Users/ourzhumt/projects/Demo/data/simple-parsed/smallcorpus-simple.txt.xml {
  Annotating file /Users/ourzhumt/projects/Demo/data/simple/smallcorpus-simple.txt [1.247 seconds]
} [1.261 seconds]

Whereas "Show log" produced the file attached.

NetBUG commented 8 years ago

Seems I've broken the GitHub issue tracker with the log. :( Let me reply here.

Thank you, Daniel! For the end user, a single-line status window is probably OK. However, many errors occur through no fault of yours (the user can supply wrong, badly encoded, or broken data; CoreNLP or any other module might fail at a random moment; any of it can run out of memory or disk space), and if you don't want to take responsibility for parsing error messages and tracebacks, a log can reveal all these problems.

Indeed, you have two different logs: one goes to stdout+stderr, the other is saved to a file. Stderr is more informative for the user:

Annotating file /Users/ourzhumt/projects/Demo/data/corpora/srData12.xml_all.txt {
Untokenizable: � (U+FFFD, decimal: 65533)
Exception in Tkinter callback
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 1536, in __call__
    return self.func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 587, in callit
    func(*args)
  File "corpkit/corpkit-gui.py", line 5643, in start_update_check
    check_updates(showfalse = False, lateprint = True, auto = True)
  File "corpkit/corpkit-gui.py", line 5567, in check_updates
    oldd = open(os.path.join(rd, 'corpkit-gui.py'), 'r').read()
IOError: [Errno 2] No such file or directory: 'corpkit/corpkit-gui.py'
WARNING: Parsing of sentence failed, possibly because of out of memory.
Will ignore and continue: mmuruzab@cisco.com Subject : 600368255 CSCef31332 - C375

Whereas "Show log" produced the file attached.

16:09:17: Tool preferences loaded.
Exception in Tkinter callback
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 1536, in __call__
    return self.func(*args)
  File "corpkit/corpkit-gui.py", line 1390, in corpus_callback
    subdrs = sorted([d for d in os.listdir(corpus_fullpath.get()) \
OSError: [Errno 2] No such file or directory: ''
16:10:22: 0 interrogations loaded from saved_interrogations.
16:10:22: 0 interrogations loaded from saved_concordances.
16:10:22: Project "Demo" opened.
Exception in Tkinter callback
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 1536, in __call__
    return self.func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 587, in callit
    func(*args)
  File "corpkit/corpkit-gui.py", line 5643, in start_update_check
    check_updates(showfalse = False, lateprint = True, auto = True)
  File "corpkit/corpkit-gui.py", line 5567, in check_updates
    oldd = open(os.path.join(rd, 'corpkit-gui.py'), 'r').read()
IOError: [Errno 2] No such file or directory: 'corpkit/corpkit-gui.py'
16:11:14: Corpus copied to project folder.
16:11:14: Set corpus directory: "simple"
16:11:14: Corpus copied to project folder.
16:11:14: Selected corpus for viewing/parsing: "simple"
16:11:45: Log saved to "logs/log-01.txt".
16:12:25: Parsing finished. Moving parsed files into place ...
16:12:25: Set corpus directory: "simple-parsed"
16:12:25: Corpus parsed and ready to interrogate: "simple-parsed"

interrogator commented 8 years ago

Thanks for all this info.

Originally, I chose not to redirect the CoreNLP output through corpkit, because I didn't want to clog up the console with CoreNLP messages that the user wouldn't understand, especially when there are thousands of files being annotated. Maybe it'd be good to not print those messages in the console, but still to send them to the log? I wasn't expecting the parser to fail ... but it does seem very bad that the user won't know if it does. Maybe I should put all that stderr into the log?
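One way the "keep it off the console, but send it to the log" idea might look, as a rough sketch only. It assumes the parser is launched as a child process; `run_and_log_stderr` and the warning filter are hypothetical names for illustration, not corpkit's actual code:

```python
import subprocess


def run_and_log_stderr(cmd, log_path):
    """Run cmd, appending its stderr to log_path instead of the console.

    Returns (exit_code, warnings), where warnings holds the stderr lines
    that look serious enough to surface in the GUI.
    """
    warnings = []
    with open(log_path, "a", encoding="utf-8") as log:
        # Only stderr is captured; stdout is left untouched.
        proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
        for raw in proc.stderr:
            line = raw.decode("utf-8", errors="replace")
            log.write(line)
            # Assumed filter, based on the two messages seen in this thread.
            if line.startswith("WARNING") or "Untokenizable" in line:
                warnings.append(line.strip())
        proc.stderr.close()
        exit_code = proc.wait()
    return exit_code, warnings
```

With something like this, the routine "Adding annotator ..." chatter would only reach the log file, while a non-empty `warnings` list could trigger a visible notice that parsing did not go cleanly.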

I'd also be interested in finding out what caused the parser error---probably encoding or something. It shouldn't be too hard to fix the bits of code that move files into place for parsing to also make sure they're CoreNLP safe.
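A possible shape for such a "make it CoreNLP safe" step, assuming the trouble is undecodable bytes and stray control characters (the `Untokenizable: U+FFFD` message in the stderr output above points that way). `make_parser_safe` is a hypothetical helper, and silently dropping characters is only one of several reasonable policies:

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"  # what undecodable bytes turn into


def make_parser_safe(raw_bytes):
    """Decode bytes as UTF-8 and drop characters likely to upset a tokenizer."""
    text = raw_bytes.decode("utf-8", errors="replace")
    cleaned = []
    for ch in text:
        if ch == REPLACEMENT_CHAR:
            continue
        # Drop control characters (Unicode category Cc) except newline and tab.
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t":
            continue
        cleaned.append(ch)
    return "".join(cleaned)
```

Legitimate non-English text survives this step; only bytes that were never valid UTF-8, plus control characters, are removed.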

There's also an error in the log there regarding checking for updates, but I think that's a different thing, and perhaps caused by running the .py file, rather than the .app. Maybe I should just disable update checking if running from a .py script rather than .app, because anyone running from .py knows how to go about grabbing the latest version, etc.
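A sketch of how the update check could be skipped when running from source. It relies on the convention that bundlers such as py2app and PyInstaller set `sys.frozen`; whether the corpkit .app is built that way is an assumption, and both function names below are hypothetical:

```python
import sys


def running_as_bundled_app():
    """True when a bundler has set sys.frozen; False for a plain .py run."""
    return bool(getattr(sys, "frozen", False))


def maybe_check_updates(check_updates, frozen=None):
    """Call check_updates() only for bundled builds; report whether it ran."""
    if frozen is None:
        frozen = running_as_bundled_app()
    if not frozen:
        # Running from a .py script: the user can pull the latest code.
        return False
    check_updates()
    return True
```

The `frozen` parameter exists only so the behaviour can be exercised without building an app.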

Also, feel free to fork and work on the GUI code! I'd love some help!

NetBUG commented 8 years ago

I think you can make separate log buttons (tabs). I can also try to contribute to highlighting rules. For example, I tried to do a corpus search and got two separate outputs: the first traceback contained "No module named...", telling me my Python distribution lacked a module, and the second told me my regexp was faulty. Stderr could definitely be a useful log for an experienced customer. With proper highlighting of typical errors and a reasonable FAQ, it can eliminate most errors.

The fault was indeed a combination of a UTF-8 file with non-English parts (the corpus is dirty; it's live technical support records), a big corpus, and long spans between punctuation marks (there was a set of lines without periods, question marks, or exclamation marks, which Stanford Parser treated as a single sentence). You might also want to advise the user about data size (text can grow nearly 100 times in size after being parsed) and the time required.

I think the first line was not about updates; it was about setting the proper icon, as I was launching the app via the command line.
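The highlighting rules could start as a small table of patterns and hints. A minimal sketch, using only messages that appear in this thread; the tags, hint texts, and the `classify` function are made up for illustration:

```python
import re

# (pattern, tag, hint) triples; the first matching rule wins.
RULES = [
    (re.compile(r"No module named"), "missing-module",
     "A Python module is not installed."),
    (re.compile(r"No such file or directory"), "missing-file",
     "A path does not exist; check the project and corpus folders."),
    (re.compile(r"Untokenizable"), "encoding",
     "The input contains characters the tokenizer cannot handle."),
    (re.compile(r"out of memory", re.IGNORECASE), "memory",
     "The parser may need more memory or shorter sentences."),
]


def classify(line):
    """Return (tag, hint) for a known error line, or None otherwise."""
    for pattern, tag, hint in RULES:
        if pattern.search(line):
            return tag, hint
    return None
```

A log tab could colour lines by tag and link each hint to the matching FAQ entry.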
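Both points (overlong unpunctuated spans and output size) could be checked before parsing starts. A rough sketch, assuming a 100-token limit and taking the 100x growth figure above as an upper bound; `preflight` and its thresholds are hypothetical:

```python
import re

SENTENCE_END = re.compile(r"[.!?]")
GROWTH_FACTOR = 100  # rough figure from the comment above, not a measurement


def preflight(text, max_tokens=100):
    """Estimate parsing risk: longest run without sentence-ending punctuation,
    and a pessimistic guess at the size of the parsed output."""
    spans = SENTENCE_END.split(text)
    longest = max(len(span.split()) for span in spans)
    return {
        "longest_span_tokens": longest,
        "span_too_long": longest > max_tokens,
        "estimated_parsed_bytes": len(text.encode("utf-8")) * GROWTH_FACTOR,
    }
```

Splitting on every period is crude (email addresses and abbreviations count as sentence ends), so the span length is only an approximation, but it is enough to warn the user before a long parse begins.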

I will consider whether I can do a viable fork of your project. It's really nice, and I also really appreciate your enthusiasm to support it.
