Open · Peter96K opened this issue 1 month ago
This is a joint task that will involve collaboration with @Peter96K.
Implement evaluation logic that terminates the object's build process if the object's sentence isn't cefr_level compliant.
https://www.examenglish.com/CEFR/cefr_grammar.htm is a link to a CEFR grammar chart. We could possibly use it to create our own grammar-suitability analysis if we cannot find an appropriate program.
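A rough sketch of the intended contract, just to make the requirement concrete. All names here (the exception, the function, the `is_compliant` helper) are placeholders, not the project's actual API:

```python
# Hypothetical sketch only -- class, function, and method names are placeholders.
class CEFRComplianceError(Exception):
    """Raised when a sentence does not match the target CEFR level."""


def evaluate_or_abort(sentence: str, cefr_level: str, analyzer) -> None:
    """Abort the build by raising if `sentence` is not `cefr_level` compliant.

    `analyzer` is assumed to expose an `is_compliant(sentence, cefr_level)` check;
    that check does not exist yet and is what this issue is about.
    """
    if not analyzer.is_compliant(sentence, cefr_level):
        raise CEFRComplianceError(f"Sentence is not {cefr_level} compliant: {sentence!r}")
```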
Note: handle cases where more than one correct answer applies, e.g. 'a ball and a hat / a hat and a ball', either at the level of TextAnalyzer() or in the kedtechLa playground.
Unless we can make it so there are two correct answers in those sentences, we could maybe ensure that 'and' never appears by itself, which would then force only one correct sentence.
I had a look at CEFRpy's modules, and in my (very untrained) reading of it, it doesn't seem to deal with sentence-level analysis. It does have some parts-of-speech identification, which could be useful.
I was reading about NLTK and they have a sentence parsing module which might be useful, but I'm not sure. I will attach it for you to look at. https://www.nltk.org/api/nltk.parse.html#module-nltk.parse
Reviewed the NLTK main page with @CPTNFreedom. The team will decide at the upcoming 1st of October post-mortem on the path forward, as well as on which courses to purchase.
Extended the TextAnalyzer class using spaCy as a proof of concept: instantiated the English model, tokenized a sentence, and retrieved the pos attribute from the token objects.
Next step: conclude this task within the current sprint by identifying the most appropriate spaCy methods and concepts. Once finalized, transfer this task into the mc project for further development.
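A minimal sketch of that proof of concept, assuming spaCy's en_core_web_sm model is installed; the TextAnalyzer internals are simplified here and not the project's real class:

```python
import spacy


class TextAnalyzer:
    """Simplified stand-in for the project's TextAnalyzer class."""

    def __init__(self):
        # Assumes `python -m spacy download en_core_web_sm` has been run.
        self.nlp = spacy.load("en_core_web_sm")

    def pos_tags(self, sentence: str) -> list[tuple[str, str]]:
        # Tokenize the sentence and pair each token with its coarse POS tag.
        doc = self.nlp(sentence)
        return [(token.text, token.pos_) for token in doc]


print(TextAnalyzer().pos_tags("The cat sat on the mat."))
# e.g. [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ...]
```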
This uses token.text, token.pos, and token.dep all at the same time. It presents the info relatively clearly, and I think there's definitely something we can do with all of these. Dependency features may or may not be useful now, but I think they could be important for some grammatical parameters later.
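Roughly what that inspection looks like (a sketch, using the underscore string attributes pos_ and dep_ rather than the integer IDs; assumes en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She has finished her homework.")

# Print the surface form, coarse POS tag, and dependency label side by side.
for token in doc:
    print(f"{token.text:<10} {token.pos_:<6} {token.dep_}")
```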
Added the use of token.morph. Morph is much more complicated and will take some time before I can read it fluently, but it gives us much more linguistic information about each word's role in the sentence. Probably of some use for grammatical/syntactic rules.
This might be a better example of what .morph can do for us. It clearly shows the tense and aspect of the sentence. Interestingly, it isn't able to identify future tense, as it looks at each word individually (and future tense doesn't exist on a lexical level in English). But it is able to identify auxiliary verbs and participles, so I think we should be able to identify grammatical structures quite easily with it.
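A sketch of the kind of output being described, run on a future perfect continuous sentence (assumes en_core_web_sm; the sentence is my own example):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("By June she will have been living here for a year.")

# token.morph exposes features such as Tense, Aspect, and VerbForm per token;
# note that "will" is only tagged as a modal AUX -- there is no Tense=Fut.
for token in doc:
    if token.pos_ in ("VERB", "AUX"):
        print(f"{token.text:<8} {token.pos_:<5} {token.morph}")
```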
Sounds like you're on the right track; these findings will be useful for the completion of this GitHub Issue.
9 - simple script -> input sentence, output identifying a specific Text.morph item;
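A possible shape for that script (a sketch only; it assumes en_core_web_sm and uses the Tense feature as the "specific morph item" to extract):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Read a sentence and print one chosen morph feature per token.
sentence = input("Sentence: ")
for token in nlp(sentence):
    tense = token.morph.get("Tense")  # e.g. ['Past'], or [] if the feature is absent
    if tense:
        print(token.text, tense[0])
```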
So I was able to do some coding with ChatGPT and had some interesting results. This first one is an example of spaCy detecting the present perfect. I had to simplify this a little, as ChatGPT was keen to use child and grandchild tokens (dependency tokens), which, while it sounded great, was not able to identify the present perfect well. This simpler version seems a bit more reliable.
The same can be said for when I tried to create future perfect continuous. Some of the code was really long (and pretty confusing) but also would not effectively detect FPC.
In the end, I was able to simplify the code a lot more and ended with this: which is, I guess, a little less perfect than it could be, but it executes quicker and actually detects the future perfect continuous.
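Roughly, the simplified idea looks like this (a reconstruction, not the exact code from the attempt; assumes en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def has_present_perfect(sentence: str) -> bool:
    """Rough check: a present-tense 'have/has' auxiliary followed by a past participle."""
    doc = nlp(sentence)
    for i, token in enumerate(doc):
        is_present_have = (
            token.lemma_ == "have" and token.pos_ == "AUX" and token.tag_ in ("VBP", "VBZ")
        )
        if is_present_have and any(t.tag_ == "VBN" for t in doc[i + 1:]):
            return True
    return False


print(has_present_perfect("She has finished her homework."))  # True
print(has_present_perfect("She finished her homework."))      # False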
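Something along these lines (again a reconstruction of the simplified approach, not the exact code): walk the sentence looking for the will / have / been / verb-ing chain in order.

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def has_future_perfect_continuous(sentence: str) -> bool:
    """Rough check for the 'will + have + been + verb-ing' chain, in order."""
    doc = nlp(sentence)
    state = 0  # how much of the will -> have -> been -> VBG chain we have seen
    for token in doc:
        if state == 0 and token.text.lower() in ("will", "'ll"):
            state = 1
        elif state == 1 and token.lemma_ == "have":
            state = 2
        elif state == 2 and token.text.lower() == "been":
            state = 3
        elif state == 3 and token.tag_ == "VBG":
            return True
    return False


print(has_future_perfect_continuous("By June she will have been living here for a year."))  # True
print(has_future_perfect_continuous("She has been living here for a year."))                # False
```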
Great job, it's a good proof of concept. Next steps will be covering all cases where it is in fact the desired tense, and then we can extend TextAnalyzer to manually check all grammar. If you have enough time left this week, I would suggest working on that: simply add another if block, e.g. if (doc[i].lemma_ == 'will not'), etc. What you can do is wrap the if block in a function called def check_affirmative(doc), which will accept 'doc' as its argument and hide the process of checking affirmative, negative, etc. But that's just a suggestion, not necessarily the only / best way to implement this. Let's discuss further on Monday; if you add any updates, I'll take a look.
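One possible shape for that wrapper (a sketch of the suggestion; since spaCy lemmas are single tokens, the negative case is checked here via the 'neg' dependency rather than a 'will not' lemma):

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def check_affirmative(doc) -> bool:
    """Hide the affirmative/negative check behind one function, as suggested above.

    Returns True for affirmative sentences, False if a negation marker is found.
    """
    return not any(token.dep_ == "neg" for token in doc)


doc = nlp("She will not have been working.")
print(check_affirmative(doc))  # False -- 'not' carries the 'neg' dependency
```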
I tried to do a really simple one today but ran into some interesting issues. I tried to get a function to identify the past simple tense, which sounds like the most straightforward thing with spaCy. However, it quickly became evident that we needed a way to differentiate past simple from the perfect aspect. ChatGPT and I tried some different approaches, from finding auxiliary verbs in the sentence to looking for "have" or "had" specifically. I'm a little surprised none of them worked, but they all returned 'past simple tense found'. Here are some of the examples below:
Maybe one of these just needs a simple tweak to get the right solution. I will try to work on identifying the past/present perfect next week. Maybe once I have that function, the solution for past simple will be obvious.
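For reference, a sketch of one way the past simple check could exclude the perfect aspect (my own attempt, not one of the examples above): require a past-tense verb (VBD) and no 'have' auxiliary anywhere in the sentence.

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def has_past_simple(sentence: str) -> bool:
    """Rough check: a VBD verb with no 'have' auxiliary, so 'has/had eaten' is excluded."""
    doc = nlp(sentence)
    has_have_aux = any(t.lemma_ == "have" and t.pos_ == "AUX" for t in doc)
    has_vbd_verb = any(t.tag_ == "VBD" and t.pos_ == "VERB" for t in doc)
    return has_vbd_verb and not has_have_aux


print(has_past_simple("She walked to school."))      # True
print(has_past_simple("She has walked to school."))  # False
print(has_past_simple("She had walked to school."))  # False
```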
Today I was able to make three functions for A1 level. First, I made a function to detect both comparatives and superlatives. These are often taught next to each other in the syllabi and the code for them is very similar, so I've combined them into one function, but they can easily be separated if need be. Here it is:
Similarly, I've combined possessive adjectives and the possessive 's into one function here:
Finally, a function to detect sentences like "I like playing football" with the verb + verb(ing):
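Roughly along these lines (a sketch of that combined function using the fine-grained comparative/superlative tags; not the exact code):

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def find_comparatives_superlatives(sentence: str) -> list[tuple[str, str]]:
    """Return (word, 'comparative'|'superlative') pairs, covering adjectives and adverbs."""
    labels = {"JJR": "comparative", "RBR": "comparative",
              "JJS": "superlative", "RBS": "superlative"}
    doc = nlp(sentence)
    return [(token.text, labels[token.tag_]) for token in doc if token.tag_ in labels]


print(find_comparatives_superlatives("My house is bigger, but hers is the biggest."))
# e.g. [('bigger', 'comparative'), ('biggest', 'superlative')]
```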
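A sketch of that combined check (possessive adjectives via the PRP$ tag, the possessive 's via the POS tag; names and structure are my own reconstruction):

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def find_possessives(sentence: str) -> list[tuple[str, str]]:
    """Return possessive adjectives (my, your, his, ...) and possessive 's markers."""
    doc = nlp(sentence)
    found = []
    for token in doc:
        if token.tag_ == "PRP$":   # possessive pronoun / possessive adjective
            found.append((token.text, "possessive adjective"))
        elif token.tag_ == "POS":  # the clitic 's attached to its head noun
            found.append((f"{token.head.text}{token.text}", "possessive 's"))
    return found


print(find_possessives("My brother borrowed Anna's bike."))
# e.g. [('My', 'possessive adjective'), ("Anna's", "possessive 's")]
```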
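And a sketch of the verb + verb(ing) check, matching a gerund whose head is another verb (again a reconstruction, not the exact function):

```python
import spacy

nlp = spacy.load("en_core_web_sm")


def has_verb_plus_gerund(sentence: str) -> bool:
    """Rough check for 'like playing'-style patterns: a VBG token whose head is a verb,
    typically attached as an open clausal complement (xcomp)."""
    doc = nlp(sentence)
    return any(
        token.tag_ == "VBG" and token.head.pos_ == "VERB" and token.dep_ in ("xcomp", "ccomp")
        for token in doc
    )


print(has_verb_plus_gerund("I like playing football."))  # True
print(has_verb_plus_gerund("I play football."))          # False
```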
Extend the text analysis capabilities to include syntax analysis. Consider different approaches and focus on the one adding the most value to the project.