Closed - ritikjain51 closed this pull request 4 years ago
@ritikjain51 there are also acceptance tests and notebooks to update with your new change - the notebooks are under the notebook folder, and there are further tests under the slow-tests folder. This is how you run the test coverage shell script:

```
cd [root of the project folder]
./test-coverage.sh "tests slow-tests"
```

This will generate a test run report and a test coverage report.
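(For reference, a script like this typically just wraps pytest with coverage reporting - the sketch below is hypothetical, assuming pytest with pytest-cov, and is not the actual contents of test-coverage.sh:)

```bash
#!/usr/bin/env bash
# Hypothetical sketch of what a test-coverage script typically does;
# see the real test-coverage.sh in the repository root.
set -e
TEST_FOLDERS="${1:-tests}"   # e.g. "tests slow-tests"
python -m pytest ${TEST_FOLDERS} \
    --cov=nlp_profiler \
    --cov-report=term \
    --cov-report=html
```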
@ritikjain51 please also add a bit of description explaining the background: what you are doing, what the outcome is, and what you are using to get the results - this will help me and others reading this PR.
As you can see from #15, you have opened up a potential issue - something I knew about but hadn't given priority.
@ritikjain51 could you also please add a description to the PR as this will help me and others who look at it later.
You may have also missed re-running the other notebook (nlp_profiler.ipynb), please take a look. You can skip nlp_profiler-large-dataset.ipynb for now.
Thanks
@ritikjain51 Thanks for your patience and for following through with the code review.
I will write up the Developer guide soon so there is a standard way to do things. There are also a couple of things to be automated, which will also help.
@ritikjain51 your PR is out of sync with master; you may need to rebase with master.
Are you using nbdime or another Jupyter notebook version-control plugin? It can help you understand the differences.
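For example, a minimal way to use nbdime's standard CLI (the notebook paths here are illustrative):

```bash
pip install nbdime
# content-aware diff of two versions of a notebook
nbdiff old/nlp_profiler.ipynb new/nlp_profiler.ipynb
# rich side-by-side diff in the browser
nbdiff-web old/nlp_profiler.ipynb new/nlp_profiler.ipynb
# optionally teach git to use nbdime when diffing .ipynb files
nbdime config-git --enable --global
```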
Three test cases are failing because I don't have the datasets to validate them. Those three cases are used in test_apply_text_profiling.py.
The dataset is generated in the test itself - have a read of the failing tests; there is a method that creates the dataset.
These failures are normal: if you look closely, the acceptance test is trying to compare the old results with your new results (which have one extra column), so it will fail. I suggest you generate the expected datasets and save them in https://github.com/neomatrix369/nlp_profiler/tree/master/tests/acceptance_tests/data.
It's easy to generate the new test data, but do it only when you know the new dataset is correct in every respect. For each of the test cases in https://github.com/neomatrix369/nlp_profiler/blob/master/tests/acceptance_tests/test_apply_text_profiling.py, save the results generated by the actual_dataframe using this line:

```python
def test_case():  # for all the relevant test cases
    ...
    actual_dataframe.to_csv(csv_filename, index=False)  # remove this line when you have finished generating the csv file
    ...
```
Compare the old results with the new ones using git diff and you can see the changes - in your case, only your new column should change and no other aspect of the .csv files.
Let me know if you don't follow this step.
I hope to make this easier in future releases, so we can generate our test data easily - but generating test data is always a manual task.
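To make the above concrete, here is a minimal sketch of what such an acceptance test roughly looks like - assuming pytest and pandas; the import path, the create-dataset helper, and the CSV name are illustrative, so check the actual tests in the repository:

```python
import pandas as pd

# illustrative import - the real module path in nlp_profiler may differ
from nlp_profiler.core import apply_text_profiling

EXPECTED_DATA_PATH = 'tests/acceptance_tests/data'  # hypothetical constant


def create_sample_dataframe():
    # the test builds its own input dataset - no external data is needed
    return pd.DataFrame({'text': ['Hello world.', 'I love pizza!']})


def test_apply_text_profiling():
    source_dataframe = create_sample_dataframe()
    actual_dataframe = apply_text_profiling(source_dataframe, 'text')

    csv_filename = f'{EXPECTED_DATA_PATH}/expected_profiled_dataframe.csv'
    # uncomment once to (re)generate the expected dataset, then remove:
    # actual_dataframe.to_csv(csv_filename, index=False)

    expected_dataframe = pd.read_csv(csv_filename)
    pd.testing.assert_frame_equal(expected_dataframe, actual_dataframe,
                                  check_dtype=False)
```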
In the test code, we are fetching data from the expected-data path location, which is not available to me.
@ritikjain51
Not true, the failing tests in https://github.com/neomatrix369/nlp_profiler/blob/master/tests/acceptance_tests/test_apply_text_profiling.py are creating their own dataset and also using the data in the folder https://github.com/neomatrix369/nlp_profiler/tree/master/tests/acceptance_tests/data. Please look again.
There is no external data used here.
@ritikjain51 could you also please add a description to the PR as this will help me and others who look at it later.
Any thoughts on this @ritikjain51
Sorry, I have been delayed in reviewing and merging this PR; I am currently working on #19 in order to partially automate the review process.
Thanks, @neomatrix369, for your response. I'll add the PR description and close this PR.
You don't have to close the PR - just rebase it with master, update your remote branch, and see if the build passes; then I'll review and merge your changes.
Hey @neomatrix369, could you please explain how and where to add the description, specifically?
I have already verified and resolved the merge conflicts.
Your current branch is not updated with the latest commits from master after PR https://github.com/neomatrix369/nlp_profiler/pull/19 was merged into master. Your local master is also behind the remote master, so you may need to rebase that too.
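Roughly, the steps look like this - assuming your fork is the origin remote, this repository is configured as upstream, and the branch is addNounPhraseCount; adjust the names to your setup:

```bash
# bring your local master up to date with the upstream repository
git checkout master
git fetch upstream
git rebase upstream/master
git push origin master

# then replay your branch on top of the updated master
git checkout addNounPhraseCount
git rebase master
git push --force-with-lease origin addNounPhraseCount
```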
@ritikjain51 I meant the Description here: https://github.com/neomatrix369/nlp_profiler/pull/13#issue-498270839. Here is an example PR https://github.com/neomatrix369/nlp_profiler/pull/17#issue-499312353, and this one is also a good example of what to put in a PR description: https://github.com/neomatrix369/nlp_profiler/pull/12#issue-497263904.
Thanks @neomatrix369, I have updated the description.
@ritikjain51 can you also please do the rebase described above? It would help us both.
You might be missing this step: https://github.com/neomatrix369/nlp_profiler/pull/18#issuecomment-709548651
@ritikjain51 so closing and reopening the PR helped. When you get a chance, can you please see why the tests are failing - they may be passing on your local machine but failing on the CI/CD build systems.
Thanks @neomatrix369 for this information. The CI/CD build failed because the code was written in Python 3.8 style.
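(For illustration only - this is not the actual failing code - a construct such as the assignment expression below is valid only from Python 3.8 onwards and raises a SyntaxError on the older interpreter a CI machine may run:)

```python
# illustrative data - not from the real module
noun_phrases = ["quick brown fox", "lazy dog"]

# Python 3.8+ only: assignment expression (the "walrus" operator)
if (count := len(noun_phrases)) > 0:  # SyntaxError on Python 3.7 and below
    print(f"found {count} noun phrases")

# Python 3.7-compatible equivalent
count = len(noun_phrases)
if count > 0:
    print("found {} noun phrases".format(count))
```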
I have updated the code. Please resolve the merge conflicts, as I don't have access to do that.
@ritikjain51 these merge conflicts are happening on your branch (NOT master) - there is no merge conflict for me to solve; Git does not work that way.
GitHub says "This branch has conflicts that must be resolved" - click on the hyperlink that says "Use the command line" to learn more about how to resolve such issues.
Also, whatever merges you have applied since the last time the CI/CD ran on this branch have left the CI/CD disabled. Can you please rebase with master so you are up to date on your end?
I suggest you have a look at Git- and GitHub-related tutorials and docs to find out how to rebase and resolve merge conflicts, and to understand how they work and why they happen - this is something that will always come back to us from time to time.
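In short, the command-line flow for resolving the conflicts during a rebase looks roughly like this (branch and file names are illustrative):

```bash
git checkout addNounPhraseCount
git rebase master
# git stops at each conflicting commit; open the listed files (e.g. the
# notebook or the csv files), fix the <<<<<<< / ======= / >>>>>>> markers,
# then stage the fixes and continue:
git add notebook/nlp_profiler.ipynb
git rebase --continue
# repeat until the rebase finishes, then update your remote branch
git push --force-with-lease origin addNounPhraseCount
```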
These issues are there because you have made structural changes in your code.
Thanks @neomatrix369 for your guidance. I am closing this pull request - it's getting difficult to understand the issues. Hope you understand. Regards, Ritik Jain
@ritikjain51 the master branch did move on Friday as I was refactoring the code (implementation) - sorry about that, but it had to be done for better readability and maintenance.
The merge conflicts were mainly due to changes in the data and notebook files.
It is normal in a project for code in a branch to fall out of step with master as development continues. I see you have also deleted your original changes; maybe someone could have helped you fix them.
PyCharm and VSCode have good Git functionality to help with the process - although it's outside the scope of this project for me to help you with that.
You can always create a new fork and then a new branch with the changes and regenerate the notebooks - it's up to you; otherwise I or someone else might implement this functionality separately.
Your tests were failing on CI/CD due to the absence of this line:

```python
nltk.download('averaged_perceptron_tagger')
```

in the noun_phase_count.py module (see https://github.com/neomatrix369/nlp_profiler/commit/08e54c42124b0c1d42caf53110ac51fc8f9561ee#diff-071796e721082b14e9eb7c22c771ca452fe1f83652dc4bcce6d1c599409a6c41R10).
The CI/CD logs did indicate that.
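For reference, the top of the module after the fix looks roughly like this - a sketch, so see the linked commit for the exact code:

```python
# noun_phase_count.py (sketch)
import nltk

# the part-of-speech tagger model must be downloaded explicitly; CI/CD
# machines start from a clean environment, so without this line the
# noun phrase tagging fails at runtime
nltk.download('averaged_perceptron_tagger')
```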
After fixing this line the tests pass:
```
tests/acceptance_tests/test_apply_text_profiling.py ..... [ 2%]
tests/granular/test_alphanumeric.py ........ [ 7%]
tests/granular/test_chars_and_spaces.py ...... [ 10%]
tests/granular/test_dates.py ......... [ 15%]
tests/granular/test_duplicates.py ......... [ 21%]
tests/granular/test_emojis.py ........ [ 25%]
tests/granular/test_non_alphanumeric.py ........ [ 30%]
tests/granular/test_nounphase.py ......... [ 35%]
tests/granular/test_numbers.py ........ [ 39%]
tests/granular/test_punctuations.py ........ [ 44%]
tests/granular/test_sentences.py .................. [ 54%]
tests/granular/test_stop_words.py ........ [ 59%]
tests/granular/test_words.py ........ [ 63%]
tests/high_level/test_grammar_check.py .......... [ 69%]
tests/high_level/test_sentiment_polarity.py ................ [ 78%]
tests/high_level/test_sentiment_subjectivity.py ................ [ 87%]
tests/high_level/test_spelling_check.py .................. [ 97%]
slow-tests/acceptance_tests/test_apply_text_profiling.py . [ 98%]
slow-tests/performance_tests/test_perf_grammar_check.py . [ 98%]
slow-tests/performance_tests/test_perf_noun_phase.py . [ 99%]
slow-tests/performance_tests/test_perf_spelling_check.py . [100%]
```
I would also suggest using the PR page, i.e. https://github.com/neomatrix369/nlp_profiler/pull/13, to read and respond to messages - a lot of the information on that page helps with the development process.
@ritikjain51 Since you put so much time and effort into this work, I restored the branch for you - see https://github.com/neomatrix369/nlp_profiler/tree/addNounPhraseCount. Your commits are preserved, and I added some fixes to it.
Also, please have a look at the CI/CD process: https://github.com/neomatrix369/nlp_profiler/runs/1271145643?check_suite_focus=true
Please do a thorough check of it to see if anything has been missed. I'm happy for you to create a new PR, as the community values your hard work - let's coordinate this time and try to stay in sync with master.
Again, if you are unfamiliar with something, it's best to say so and ask for help - that's the only way for us to step in and help.
Added noun phrase counting. Solved the counting issue #14. Updated the logic with emoji decoding for more robust noun phrase detection.