Closed abeepathak96 closed 7 years ago
Why is this file added to the non-breaking prefix?
That is a list of names of medicines in India, which I want to add to NLTK as a corpus so that drug name recognition can be performed.
On Mon, Jun 5, 2017 at 2:27 AM, alvations wrote:
Why is this file added to the non-breaking prefix?
It should be added to another corpus, not to nonbreaking_prefixes. Also, there are other steps that need to be done, like updating the indices in all.xml, corpora.xml, etc.
If you're suggesting a new corpus, it should be an issue instead of a pull request (PR) =) https://github.com/nltk/nltk_data/issues
I see that there's already an issue at nltk_data#77; perhaps someone else will pick it up and add it to nltk_data.
As noted in #77, we don't have capacity to add and maintain custom wordlists like this, sorry. They can be distributed outside of NLTK.
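For anyone who wants to use such a list without it being part of nltk_data, here is a minimal sketch of one way it could work. It assumes drugs.txt holds one medicine name per line, which is not confirmed in this thread; `load_wordlist`, `find_drug_mentions`, and the sample names are hypothetical and only for illustration.

```python
import re
import tempfile
from pathlib import Path


def load_wordlist(path):
    """Read one entry per line, skip blank lines, and return a lowercase set."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def find_drug_mentions(text, drug_names):
    """Return the tokens of `text` found in `drug_names`, in order of appearance."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    return [token for token in tokens if token.lower() in drug_names]


# Hypothetical stand-in for a locally distributed drugs.txt
# (assumed format: one name per line).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "drugs.txt"
    sample.write_text("Paracetamol\nAmoxicillin\n\nMetformin\n", encoding="utf-8")
    drugs = load_wordlist(sample)

mentions = find_drug_mentions(
    "Patient was given paracetamol and Metformin twice daily.", drugs
)
print(mentions)
```

This is a plain set lookup, so it only matches single-token names and exact spellings; multi-word names would need a different matching step. If staying within NLTK's interfaces is preferred, `WordListCorpusReader` from `nltk.corpus.reader` can read a one-entry-per-line file from any local directory.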
drugs.txt: This is a text file that contains the names of medicines prescribed by doctors in India. The list is constantly updated by our team. Please add this list as a corpus in nltk_data so that it can be used in clinical data parsing.