Closed: ranjeetkumar closed this issue 7 years ago.
I believe we should not be ignoring "-DOCSTART- -X- -X- O".

On Sun, 12 Mar 2017 at 10:53 PM, Ranjeet kumar wrote:
Hi professor/TA, all a3_test.py tests pass, but when I run a3.py I get results that differ from log.txt. Even the shapes of the training and testing data are different.

My output:
training data shape: (27858, 18724)
testing data shape: (28028, 18724)

Expected output:
training data shape: (27858, 18287)
testing data shape: (28028, 18287)
This difference ripples through all the rest of the output. I suspect something is wrong in my feature dictionary. Suppose we have the following data:

```python
[[('EU', 'NNP', 'I-NP', 'I-ORG'), ('rejects', 'VBZ', 'I-VP', 'O'), ('German', 'JJ', 'I-NP', 'I-MISC')]]
```

My code generates the following feature dictionaries:

```python
[{'chunk=I-NP': 1, 'pos=NNP': 1, 'next_chunk=I-VP': 1, 'tok=eu': 1, 'next_tok=rejects': 1, 'is_caps': 1, 'next_pos=VBZ': 1},
 {'pos=VBZ': 1, 'next_chunk=I-NP': 1, 'prev_tok=eu': 1, 'next_pos=JJ': 1, 'tok=rejects': 1, 'chunk=I-VP': 1, 'prev_chunk=I-NP': 1, 'next_tok=german': 1, 'prev_pos=NNP': 1},
 {'chunk=I-NP': 1, 'next_tok=call': 1, 'prev_chunk=I-VP': 1, 'is_caps': 1, 'prev_tok=rejects': 1, 'next_chunk=I-NP': 1, 'tok=german': 1, 'next_pos=NN': 1, 'prev_pos=VBZ': 1, 'pos=JJ': 1}]
```
Is this correct? I have double-checked my code but still cannot figure out where I am going wrong. Any hints on why I am getting a different data shape? I am ignoring "-DOCSTART- -X- -X- O" and blank lines in the train and test data.
-- Regards, Gagan
Thanks Gagan. With "-DOCSTART- -X- -X- O" included, the shapes of training and testing still do not match; now even the row counts differ from the expected values:

training data shape: (27989, 18733)
testing data shape: (28138, 18733)
From the doctest in a3.py, I am assuming we have to ignore it:
```python
>>> train_data = read_data('train.txt')
>>> train_data[:2]
[[('EU', 'NNP', 'I-NP', 'I-ORG'), ('rejects', 'VBZ', 'I-VP', 'O'), ('German', 'JJ', 'I-NP', 'I-MISC'), ('call', 'NN', 'I-NP', 'O'), ('to', 'TO', 'I-VP', 'O'), ('boycott', 'VB', 'I-VP', 'O'), ('British', 'JJ', 'I-NP', 'I-MISC'), ('lamb', 'NN', 'I-NP', 'O'), ('.', '.', 'O', 'O')], [('Peter', 'NNP', 'I-NP', 'I-PER'), ('Blackburn', 'NNP', 'I-NP', 'I-PER')]]
```
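For reference, here is a minimal sketch of a loader consistent with that doctest. This is an assumption based on the thread, not the assignment's actual code: it skips `-DOCSTART-` lines and treats blank lines as sentence boundaries.

```python
def read_data(filename):
    """Read CoNLL-style data into a list of sentences, where each
    sentence is a list of (token, pos, chunk, label) tuples.
    Skips -DOCSTART- lines; blank lines separate sentences."""
    sentences, current = [], []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line.startswith('-DOCSTART-'):
                continue
            if not line:               # blank line: sentence boundary
                if current:
                    sentences.append(current)
                    current = []
                continue
            current.append(tuple(line.split()))
    if current:                        # don't drop the final sentence
        sentences.append(current)
    return sentences
```

Note the final `if current:` check: without it, a file that does not end with a blank line silently loses its last sentence.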
I suspect something is wrong in my feature dictionary. The feature-dictionary output above was produced with all parameters set to True.
How do you get the count 27858? After removing empty lines and 'DOCSTART...' lines, I count 27867 entries.
For the second dimension, I believe you should check the make_features_dicts function. Maybe double-check the 'context' flag.
I have ignored "DOCSTART..." and empty lines. I think 27858 is correct; Log.txt also shows 27858: https://github.com/iit-cs585/assignments/blob/master/a3/Log.txt
I am checking the context flag but still cannot find anything. Here is a slice of the first three feature dicts from train.txt:
```python
[{'chunk=I-NP': 1, 'pos=NNP': 1, 'next_chunk=I-VP': 1, 'tok=eu': 1, 'next_tok=rejects': 1, 'is_caps': 1, 'next_pos=VBZ': 1},
 {'pos=VBZ': 1, 'next_chunk=I-NP': 1, 'prev_tok=eu': 1, 'next_pos=JJ': 1, 'tok=rejects': 1, 'chunk=I-VP': 1, 'prev_chunk=I-NP': 1, 'next_tok=german': 1, 'prev_pos=NNP': 1},
 {'chunk=I-NP': 1, 'next_tok=call': 1, 'prev_chunk=I-VP': 1, 'is_caps': 1, 'prev_tok=rejects': 1, 'next_chunk=I-NP': 1, 'tok=german': 1, 'next_pos=NN': 1, 'prev_pos=VBZ': 1, 'pos=JJ': 1}]
```
Do you find anything wrong in this feature dictionary?
@ranjeetkumar `prev_is_caps=1` is missing from the second dict.
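For what it's worth, one way to avoid dropping flag features like `prev_is_caps` is to build each token's base features first, then copy each neighbor's entire dict with a `prev_`/`next_` prefix. A sketch; `add_context` is a hypothetical helper, not the assignment's API:

```python
def add_context(base_feats):
    """Given one sentence's per-token feature dicts, add prev_/next_
    prefixed copies of each neighbor's base features -- including
    flag features such as is_caps, which are easy to miss when the
    context features are generated field by field."""
    out = []
    for i, feats in enumerate(base_feats):
        d = dict(feats)
        if i > 0:
            d.update(('prev_' + k, v) for k, v in base_feats[i - 1].items())
        if i < len(base_feats) - 1:
            d.update(('next_' + k, v) for k, v in base_feats[i + 1].items())
        out.append(d)
    return out
```

Because the whole neighbor dict is copied, `is_caps` automatically becomes `prev_is_caps`/`next_is_caps` with no special case.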
Also, I was missing the final sentence in my Log.txt. I've pushed an update.
Yes everything's matching now.
Thanks @aronwc @benoit0192. I too had missed the final sentence, and I had an extra "=" in is_caps, which is why the problem was occurring. Still, I am not able to reproduce the exact numbers in log.txt; my doubt is about the calculation of the average F1 score.
Given the evaluation matrix, how do we calculate the average F1 score? From the log, the evaluation matrix is:

| | I-LOC | I-MISC | I-ORG | I-PER | O |
|---|---|---|---|---|---|
| precision | 0.745115 | 0.865672 | 0.673111 | 0.705882 | 0.972336 |
| recall | 0.729565 | 0.407733 | 0.377340 | 0.856041 | 0.990355 |
| f1 | 0.737258 | 0.554361 | 0.483586 | 0.773744 | 0.981263 |
average f1s: 0.591735
Average(0.737258, 0.554361, 0.483586, 0.773744) = 0.637237, which is not 0.591735.
Is there another method for calculating the average f1?
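A quick check with the f1 values above: neither a macro average over the four entity classes (excluding 'O', a common convention for NER) nor over all five classes reproduces 0.591735.

```python
# per-class f1 values copied from the log's evaluation matrix
f1 = {'I-LOC': 0.737258, 'I-MISC': 0.554361, 'I-ORG': 0.483586,
      'I-PER': 0.773744, 'O': 0.981263}

# macro average over the four entity classes, excluding 'O'
avg_no_o = sum(v for k, v in f1.items() if k != 'O') / 4
# macro average over all five classes
avg_all = sum(f1.values()) / 5
print(round(avg_no_o, 6), round(avg_all, 6))  # 0.637237 0.706042
```

Neither result matches the log's 0.591735, so the discrepancy is not just a different averaging convention.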
My fault - fixed.
@ranjeetkumar Have you been able to match the confusion matrix with the log file? I have one misclassified instance compared to the log...
| | I-LOC | I-MISC | I-ORG | I-PER | O |
|---|---|---|---|---|---|
| I-LOC | 839 | 10 | 76 | 119 | 106 |
| I-MISC | 46 | 232 | 32 | 43 | 216 |
| I-ORG | 136 | 17 | 383 | 259 | 220 |
| I-PER | 57 | 3 | 37 | 1332 | 127 |
| O | 48 | 6 | 40 :exclamation: | 134 | 23515 :exclamation: |
@aronwc According to the confusion matrix values, it would appear that `isupper()` has been applied to the whole string instead of just the first letter. As a result, 'The' would not be counted as `is_caps`. Is that intended?
```python
# Here is what I mean
s = 'Hello'
s.isupper()     # False: not every character is uppercase
s[0].isupper()  # True: the first character is uppercase
```
Because of that, the F1 values are lower than they could be.
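The two checks can be written as a small feature function; `caps_feature` and `first_char_only` are illustrative names, not the assignment's API:

```python
def caps_feature(token, first_char_only=True):
    """Return {'is_caps': 1} if the token counts as capitalized.
    With first_char_only=True, 'The' counts; with the whole-string
    isupper() check, only all-caps tokens like 'EU' count."""
    if first_char_only:
        capitalized = token[:1].isupper()   # empty string -> False
    else:
        capitalized = token.isupper()
    return {'is_caps': 1} if capitalized else {}
```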
@benoit0192 So far I have not been able to get the correct result. I am checking case by case: out of sixteen configurations, I match in only two, namely pos=True or chunk=True (with token=True and everything else False). In the all-True case my confusion matrix is different. I am still debugging.
One more observation: when I check whether the whole token is uppercase, my f1 matches the log file:

| f1 | nparams | iscap | pos | chunk | context |
|---|---|---|---|---|---|
| 0.367003 | 30920 | True | False | False | False |

When I check only whether the first character of the token is uppercase, I get:

| f1 | nparams | iscap | pos | chunk | context |
|---|---|---|---|---|---|
| 0.637879 | 30920 | True | False | False | False |

My question: I guess the assignment intends checking whether the first character of the token is uppercase, right?
@aronwc @benoit0192 When I run the simplest configuration with context enabled, i.e. token=True, caps=False, pos=False, chunk=False, context=True:
The first 5 dicts of the training data are:

```python
{'tok=eu': 1, 'next_tok=rejects': 1}
{'next_tok=german': 1, 'tok=rejects': 1, 'prev_tok=eu': 1}
{'next_tok=call': 1, 'tok=german': 1, 'prev_tok=rejects': 1}
{'prev_tok=german': 1, 'tok=call': 1, 'next_tok=to': 1}
{'tok=to': 1, 'next_tok=boycott': 1, 'prev_tok=call': 1}
```

The last 5 dicts of the training data are:

```python
{'prev_tok=belgian': 1, 'next_tok=prix': 1, 'tok=grand': 1}
{'tok=prix': 1, 'prev_tok=grand': 1, 'next_tok=practice': 1}
{'next_tok=times': 1, 'prev_tok=prix': 1, 'tok=practice': 1}
{'next_tok=.': 1, 'tok=times': 1, 'prev_tok=practice': 1}
{'prev_tok=times': 1, 'tok=.': 1}
```
These look okay to me. If they are correct, then I guess the results in the log where context is enabled are wrong. I get an f1 of 0.467399 with nparams of 92745, whereas the log has 0.467874 and 90550 respectively.
@ranjeetkumar Right - I fixed the `is_caps` feature and reran.
I get the same dicts as in your example above.
@aronwc If identical dicts are going into the same DictVectorizer, the results should come out the same.
From another angle: my results match except in the cases where context is True. Taking these rows from log.txt:

| | f1 | n_params | caps | pos | chunk | context |
|---|---|---|---|---|---|---|
| 0 | 0.330491 | 30915 | False | False | False | False |
| 1 | 0.467874 | 90550 | False | False | False | True |

Assume the first row (all flags False) is correct.
The total number of unique features in that case is 30915 / 5 = 6183.
Now suppose context and token are both True and everything else is False. We add previous and next tokens, so each unique feature can appear with the prefixes tok=, prev_tok=, and next_tok= (that is where the factor of 3 comes from). In the worst case, assuming all tokens are unique:

- maximum unique features = 6183 x 3 = 18549
- minimum unique features = 6183 x 3 - 2 = 18547 (the first and last dicts have length 2, since each contains only one of prev_tok or next_tok alongside tok)

So the range of n_params when context and token are both True, everything else False, is:

- min = 18547 x 5 = 92735 (5 is the number of classes)
- max = 18549 x 5 = 92745
The value 90550 does not fall into this range. This is just my intuition, and my own result does fall within the bound. Please correct me if I am going wrong.
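The arithmetic in the bound above can be checked directly (assuming, per the n_params figures in the thread, that one weight vector is stored per class, 5 in total):

```python
n_classes = 5
base_features = 30915 // n_classes        # 6183 unique token-only features
# adding prev_tok/next_tok at most triples the feature set; the very first
# and very last positions each lack one neighbor, removing at most 2 names
max_feats = base_features * 3             # 18549
min_feats = base_features * 3 - 2         # 18547
lo, hi = min_feats * n_classes, max_feats * n_classes
print(lo, hi)              # 92735 92745
print(lo <= 90550 <= hi)   # False: the log's value is outside the bound
```

Note the bound assumes context features may cross sentence boundaries; if they are restricted to within sentences, many prev_/next_ names disappear and n_params can legitimately fall below 92735.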
@ranjeetkumar To confirm: the context features should not cross sentence boundaries.
E.g., if sentence one is "A brown dog" and sentence two is "The black cat", the context features for the token "The" should not include "prev_tok=dog".
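The boundary rule can be sketched by looping over sentences rather than over the flattened token list (token-only features shown; the names are illustrative, not the assignment's API):

```python
def context_features(sentences):
    """sentences: list of sentences, each a list of token strings.
    Returns one feature dict per token; prev_tok/next_tok are only
    set within the same sentence, never across a boundary."""
    dicts = []
    for sent in sentences:             # context resets at each sentence
        for i, tok in enumerate(sent):
            d = {'tok=' + tok.lower(): 1}
            if i > 0:
                d['prev_tok=' + sent[i - 1].lower()] = 1
            if i + 1 < len(sent):
                d['next_tok=' + sent[i + 1].lower()] = 1
            dicts.append(d)
    return dicts
```

With the professor's example, the dict for "The" contains only `tok=the` and `next_tok=black`, never `prev_tok=dog`.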
@aronwc Thanks a lot, professor! I was including context features across sentence boundaries. After removing that, my results match log.txt perfectly.
Hey guys, I rechecked all the parameters again and still get the same feature shape as mentioned before. What is missing to get 27867?
total labels: 27858
training data shape: (27858, 18726)

```python
[{'is_caps': 1, 'chunk=I-NP': 1, 'tok=eu': 1, 'next_pos=VBZ': 1, 'next_chunk=I-VP': 1, 'next_tok=rejects': 1, 'pos=NNP': 1},
 {'pos=VBZ': 1, 'prev_tok=eu': 1, 'next_tok=german': 1, 'tok=rejects': 1, 'next_pos=JJ': 1, 'next_chunk=I-NP': 1, 'next_is_caps': 1, 'prev_chunk=I-NP': 1, 'prev_is_caps': 1, 'prev_pos=NNP': 1, 'chunk=I-VP': 1},
 {'next_pos=NN': 1, 'is_caps': 1, 'next_chunk=I-NP': 1, 'tok=german': 1, 'chunk=I-NP': 1, 'prev_chunk=I-VP': 1, 'pos=JJ': 1, 'prev_pos=VBZ': 1, 'next_tok=call': 1, 'prev_tok=rejects': 1}]
```
Regards, Sunny