ualbertalib / can-link

Front end react app for CanLink project

Wordcloud #25

Open CarlsoFiorention opened 4 years ago

CarlsoFiorention commented 4 years ago
03 wordcloud

sfarnel commented 4 years ago

Do not add Supervisor at this stage as we are determining feasibility based on the data. This may be for future work.

@danydvd can you please investigate the process for filtering terms (e.g., removing stop words)?

jchartrand commented 4 years ago

James estimate:

Not sure what is meant here by 'activate results':

"make the items responsive: when click on a word activate results from that category"

An example would help, such as: "When clicking 'alberta' in the word cloud, that should invoke a brand new query that replaces whatever term is in the given 'category' (e.g., subject) with the new term, and otherwise runs the same query that had just been run."

Other word cloud changes depend on what the word cloud software allows. Estimate (aside from removing stop words, which Danoosh would do in SOLR): 2-5 days.

sfarnel commented 4 years ago

@CarlsoFiorention can you briefly clarify this for James?

CarlsoFiorention commented 4 years ago

This is also illustrated in one of the scenarios. "Responsive" means that the user can click a word within the word cloud to show the list of all documents in the collection under that category (e.g. "biology"). This may invoke a new query that funnels down to a small number of results. For example, selecting "institution" from the top menu may show a word cloud of all institutions that include "biology", and the list of results gets shorter when you click on one word (e.g. U of A).
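A minimal sketch of the click behaviour described above, assuming the current search is held as one optional term per category. The `Query` shape, the category names, and `applyWordClick` are hypothetical and are not taken from the can-link code.

```typescript
// Hypothetical query shape: at most one term per category (assumed names).
type Category = "subject" | "institution";
type Query = Partial<Record<Category, string>>;

// Return a new query in which the clicked word replaces the term for
// `category`. Every other category is copied unchanged, so the rest of
// the query is rerun exactly as before.
function applyWordClick(query: Query, category: Category, word: string): Query {
  return { ...query, [category]: word };
}

// Narrowing: a "biology" subject search, then a click on an institution.
const narrowed = applyWordClick({ subject: "biology" }, "institution", "U of A");

// Replacing: a click on 'alberta' in the subject cloud swaps the subject term.
const replaced = applyWordClick({ subject: "biology" }, "subject", "alberta");
```

Returning a new object instead of mutating the old one keeps the function easy to use with React state, where the word cloud's click callback would pass the result to whatever runs the search.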

danydvd commented 4 years ago

@sfarnel @jchartrand I can try to implement SOLR's stopwords_ca.txt and reindex the entire data set, but I don't think that will solve our problems here: we have many multi-word subjects (e.g. "personality and academic achievement"), and removing stop words (e.g. "and") might make the subjects seem weird. I think the word cloud right now is tokenizing the subjects into single words!
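To make the tokenization concern concrete, here is a rough sketch that counts subjects two ways: as whole phrases and as single words. The `{ text, value }` shape matches what react-wordcloud takes in its `words` prop; the function names and the sample subjects are illustrative assumptions only.

```typescript
// Entry shape used by react-wordcloud's `words` prop.
interface Word {
  text: string;
  value: number;
}

// Count each string as-is (trimmed and lowercased), without splitting it.
function countStrings(items: string[]): Word[] {
  const counts = new Map<string, number>();
  for (const raw of items) {
    const text = raw.trim().toLowerCase();
    if (text === "") continue;
    counts.set(text, (counts.get(text) || 0) + 1);
  }
  const words: Word[] = [];
  counts.forEach((value, text) => words.push({ text, value }));
  return words;
}

// Whole-phrase counting: a multi-word subject stays one cloud entry.
function countPhrases(subjects: string[]): Word[] {
  return countStrings(subjects);
}

// Per-word counting: each subject is split on whitespace first, which is
// the behaviour that turns "and" into its own cloud entry.
function countTokens(subjects: string[]): Word[] {
  const tokens: string[] = [];
  for (const subject of subjects) {
    for (const token of subject.split(/\s+/)) tokens.push(token);
  }
  return countStrings(tokens);
}

const sample = [
  "personality and academic achievement",
  "personality and academic achievement",
  "biology",
];
const phrases = countPhrases(sample); // 2 entries
const tokens = countTokens(sample); // 5 entries, including "and"
```

With whole-phrase counting there is nothing for a stop-word filter to remove, so the subjects keep their meaning.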

sfarnel commented 4 years ago

Thanks @danydvd. It seems this needs more investigation before we try something. Would you be able to dig into this a bit more (if @jchartrand can point you to the library he's using; wordcloud.js)?

jchartrand commented 4 years ago

I’m using https://www.npmjs.com/package/react-wordcloud

I do see what Danoosh means about the tokenization. There may well be a setting to stop react-wordcloud from doing that.

And quickly looking at the npm page, the documentation does say that it can handle stop words, although as Danoosh points out that might make the subjects (that rely on the stop words for meaning) confusing.



sfarnel commented 4 years ago

Thanks James. @danydvd if you can poke around the package James mentions and see if you can find anything promising that would be great; thanks!

danydvd commented 4 years ago

@jchartrand I created two more SOLR cores (CanLink and CanLink-1) and re-indexed the data (same data source as before) with different tokenizer options. I was not sure how to change the React part (and did not want to make a mess). Can we try the word cloud with these?
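One way to compare the cores from the front end is to request facet counts only and feed them to the word cloud. The sketch below uses standard Solr request parameters (`facet`, `facet.field`, `facet.limit`, `rows`, `wt`) and Solr's default flat facet list (`[term, count, term, count, ...]`). The base URL, the `subject` field name, and the helper names are assumptions, not the project's actual configuration.

```typescript
interface Word {
  text: string;
  value: number;
}

// Build a facet-only select URL for one core (no documents returned).
function facetUrl(baseUrl: string, core: string, field: string, limit = 100): string {
  const params = [
    `q=${encodeURIComponent("*:*")}`,
    "rows=0",
    "wt=json",
    "facet=true",
    `facet.field=${encodeURIComponent(field)}`,
    `facet.limit=${limit}`,
  ];
  return `${baseUrl}/solr/${encodeURIComponent(core)}/select?${params.join("&")}`;
}

// Convert Solr's flat facet list into word cloud entries.
function facetToWords(flat: Array<string | number>): Word[] {
  const words: Word[] = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    words.push({ text: String(flat[i]), value: Number(flat[i + 1]) });
  }
  return words;
}

// Assumed host and field name; the core name comes from the comment above.
const url = facetUrl("http://localhost:8983", "CanLink-1", "subject", 50);

// Illustrative response fragment, not real data from the index.
const words = facetToWords(["personality and academic achievement", 2, "biology", 1]);
```

Running the same request against each core and comparing the returned terms would show which tokenizer option keeps multi-word subjects intact.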

jchartrand commented 4 years ago

Yes, will do. Thanks Danoosh.


