Utopiah opened this issue 2 years ago
Hey Fabien, you're talking about the results of the Wikipedia plugin, right?
Yeah, super noisy. It really needs a lot of work. I was using a stop-list here, but that was just me eyeballing it. It could really use a PR, if you want to take a swing at it.
To do it properly, we should also add (some!) Wikipedia redirects. I held off because the results were still so rowdy. Cheers
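A minimal, library-independent sketch of the stop-list idea, in case it helps whoever picks up that PR. It assumes plain lowercase word tokens from a naive regex tokenizer. `STOP_WORDS`, `tokenize`, `removeStopWords` and `topTerms` are hypothetical names, and the list is a tiny illustrative sample, not the one used in the plugin.

```javascript
// Hypothetical, hand-picked stop-list (illustration only, not the plugin's list).
const STOP_WORDS = new Set([
  "the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "on", "that", "this",
]);

// Naive tokenizer: lowercase runs of letters, digits and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Drop every token that appears in the stop-list.
function removeStopWords(tokens, stopWords = STOP_WORDS) {
  return tokens.filter((t) => !stopWords.has(t));
}

// Count the remaining terms; return the top `n` as [term, count] pairs,
// sorted by count (descending), then alphabetically to break ties.
function topTerms(text, n = 5) {
  const counts = new Map();
  for (const t of removeStopWords(tokenize(text))) {
    counts.set(t, (counts.get(t) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// "the", "on" and "and" are filtered out: cat (2), then mat, sat, slept (1 each).
console.log(topTerms("The cat sat on the mat and the cat slept."));
```

The list is just data, so swapping in a larger published stop-list, or one tuned to Wikipedia text, does not change the code.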
Hi, I'm just learning about the project and it's pretty amazing. I tinkered with NLTK and Gensim before, but this is so convenient for exploring and embedding on a page. Learning with Observable notebooks is also great!
That being said, I end up with a lot of noise in my selection. I tried a bit of `normalize()` and `remove()` with encouraging results. Still, I'm quite surprised that when I search this repository I don't seem to find any stop-word list. This made me wonder: is this the "wrong" way in this context? Is the philosophy of compromise not to rely on such lists?
PS: I apologize for hijacking issues, but is there a forum, chat, or other platform for discussions on using compromise that would be a better place? I have other questions, like using `.tfidf()` on `.ngrams()`, but I don't want to create noise here.
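On the `.tfidf()`-on-`.ngrams()` question, here is a rough, library-independent sketch of what that combination computes. It deliberately does not call compromise, because I haven't verified the exact signatures or weighting of its own `.ngrams()` and `.tfidf()`. It assumes one common scheme, tf = count / total n-grams in the document and idf = ln(N / df). `tfidfNgrams` and the example documents are hypothetical.

```javascript
// Naive tokenizer: lowercase runs of letters, digits and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Build word n-grams (default: bigrams) from a token list.
function ngrams(tokens, size = 2) {
  const out = [];
  for (let i = 0; i + size <= tokens.length; i++) {
    out.push(tokens.slice(i, i + size).join(" "));
  }
  return out;
}

// Hypothetical helper: one Map of n-gram -> tf-idf score per document.
// Assumed weighting: tf = count / total n-grams in doc, idf = ln(N / df).
function tfidfNgrams(docs, size = 2) {
  const grams = docs.map((d) => ngrams(tokenize(d), size));

  // Document frequency: in how many documents each n-gram appears.
  const df = new Map();
  for (const list of grams) {
    for (const g of new Set(list)) df.set(g, (df.get(g) || 0) + 1);
  }

  const N = docs.length;
  return grams.map((list) => {
    const counts = new Map();
    for (const g of list) counts.set(g, (counts.get(g) || 0) + 1);
    const scores = new Map();
    for (const [g, c] of counts) {
      scores.set(g, (c / list.length) * Math.log(N / df.get(g)));
    }
    return scores;
  });
}

const docs = ["new york city", "new york state", "salt lake city"];
const scores = tfidfNgrams(docs);
// In the first document, "york city" (1 of 3 docs) outranks "new york" (2 of 3 docs).
console.log(scores[0]);
```

Under this idf, an n-gram that appears in every document scores exactly 0. This is partly why tf-idf can absorb some of the work a hand-made stop-list does.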