Open rajeshr6r opened 4 years ago
Hi @rajeshr6r, thank you for opening an issue.
Can you tell us a bit more about how you expect to integrate flashtext into Texthero and how you would use the new functions?
Also, would you be interested in helping to integrate these features?
regards,
Hello Jonathan,
Thanks for your email.
flashtext is a Python library that I have used extensively in some of my projects to identify keywords. Essentially, flashtext serves as a blazing-fast alternative to regex.
My idea and expectation is that a user should be able to quickly get the following from the data they are trying to analyze:
1. Check whether keywords from a user-supplied dictionary are present in the text, and extract them.
2. Normalize the data by replacing words with defined keywords, as in the example below.
NY, N.Y, NewYork, BigApple, and NewYork city all refer to NewYork. So one should be able to simply define a keyword dictionary and normalize all the occurrences into a standardized form.
I'd be happy to help integrate this feature and would appreciate some guidance on how and where to start and which guidelines I should follow.
Meanwhile, here is a link about flashtext:
https://github.com/vi3k6i5/flashtext
I think this will give a better view of why flashtext would add value to Texthero's pre-processing toolkit.
Thanks,
Rajesh R
Hello Jonathan,
Meanwhile, I also wrote a basic introduction to Texthero on Medium and thought I would share it with you.
https://medium.com/dev-genius/texthero-an-un-official-introduction-and-assessment-f7fd0b8c290e
Thanks,
Rajesh R
Hey Rajesh,
Thank you for your message! Just so you know, I didn't send you an email; I replied to the GitHub issue you opened recently.
Congrats on the Medium article 👌
Thank you for mentioning and showing us FlashText. The FlashText core algorithm is fascinating and extremely useful; I'm sure we will be able to use it in some of Texthero's functions to make them more performant. If you already have experience with it, your help will be much appreciated!
If you want to start collaborating, I encourage you to read the CONTRIBUTING.md (https://github.com/jbesomi/texthero/blob/master/CONTRIBUTING.md) and PURPOSE.md (https://github.com/jbesomi/texthero/blob/master/PURPOSE.md) documents.
On the GitHub issues page (https://github.com/jbesomi/texthero/issues) you will see a lot of open issues you can contribute to. Also, one of the missing pieces of Texthero right now is that the documentation is incomplete. Would you like to help us improve it? I think you might like this role, as you have already written a Medium article ...
regards,
Hi Jonathan,
Apologies, I just realized that. Sure, I will read the documents and would love to help improve the documentation.
Thanks,
Rajesh R
Hello Jonathan,
Unfortunately, I couldn't figure out how to start with the documentation.
By documentation, do you mean writing documentation for the various functions, or writing tutorial-style scripts?
Is there a sample that I can look at for guidance?
Rajesh R
Hey @rajeshr6r, thank you for reaching out!
Q: Do you want to start by writing a how-to tutorial or by improving the docstrings?
In the first case, you will have to come up with a blog article and add it under website/blog; in the second case, you will need to improve the docstrings in any of the texthero function files.
As soon as I have more information about your goals, I will be able to guide you more thoroughly.
Regards,
Jonathan,
Thanks so much for the guidance.
I'll start with the how-to tutorials and follow up with the docstrings later.
On the side, I will work on the flashtext integration.
Thanks,
Rajesh R
That sounds like a plan.
For the how-to tutorials, do you already have a subject in mind?
Regards,
Hi,
I thought I would share an update here. I have completed an initial function using flashtext that does the following:
Input: text column (pd.Series), keyword_dictionary (dict)
Output: text column (pd.Series) containing the keywords extracted from the text.
Example Dictionary :
keywords_cleaner={"Athletes of Australia":["aussies","australians"],
"Cricket":["Twenty20","Test match","ODI","Cricket","14-man squad"],
"Tennis Tournaments":["Davis Cup","Aussie Open","Australian Open"]
}
Example dataset: BBC Sports article data set.
Output: a summary of the articles grouped by keyword.
There are some interesting outcomes in the dataset due to ambiguity in the keyword dictionary definition, which is in the hands of the end user anyway.
I will complete the tests and then request a push and merge.
Thanks,
Hello there,
Super useful and super quick. I would like to request the addition of the flashtext package, as it is very useful for extracting and replacing keywords, which goes a long way in pre-processing steps.
https://github.com/vi3k6i5/flashtext