rvidgen / clr


Fix BOM removal for OS X #1

Closed: philippwinter closed this 6 years ago

philippwinter commented 6 years ago

This correctly removes the BOM prefix on my system (OS X).
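For readers unfamiliar with the problem, the general idea of BOM removal can be sketched as follows. This is not the code from this pull request (the thread does not show the patch); it is a minimal Python illustration that assumes the file is UTF-8 encoded and starts with the UTF-8 byte order mark. The helper name `strip_bom` and the sample header are made up for the example.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: a CSV header as it might be exported with a BOM prefix.
raw = codecs.BOM_UTF8 + "Title,Year\n".encode("utf-8")

clean = strip_bom(raw)              # bytes without the BOM
header = clean.decode("utf-8")      # plain text, no stray leading character

# Alternative: the "utf-8-sig" codec drops the BOM while decoding.
header_via_codec = raw.decode("utf-8-sig")
```

Checking the raw bytes rather than the decoded text avoids depending on how a particular platform renders the BOM character.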

rvidgen commented 6 years ago

Hi Philipp,

Thanks for the code suggestions. If you are happy to share with us, we’d be really interested in what you are doing with clR and in your thoughts about improvement and further development.

Regards, Richard

In reply to Philipp Winter's GitHub notification of Thursday, 23 November 2017 at 01:46, subject "[rvidgen/clr] Fix BOM removal for OS X (#1)".


philippwinter commented 6 years ago

Hi Richard,

Sure. I'm interested in performing a use case synthesis in the context of Business Intelligence & Analytics. For that, I'd like to cover the breadth of the available literature, which is usually not feasible to do manually. As noted in your paper, clR automates the steps that traditionally require human interaction and, in particular, forces one to create a reproducible research methodology, which is what interests me most.

I only started using your program yesterday, so my thoughts on improvements and further development are naturally limited. However, after running a review on a current sample of Scopus data on the technology acceptance model, I noticed that, unlike in your paper, the words in my resulting word cloud are often missing their endings (such as onlin[e] or eas[e]). Is this intentional?

Furthermore, I'd definitely be interested in automating the choice of k to some degree. As a first step, a loop that increases k step by step and saves the output of each run to a dedicated folder could already be helpful.
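To make the idea concrete, such a loop could look roughly like the sketch below. It is written in Python purely for illustration and does not use clR itself: `placeholder_fit` stands in for whatever call would fit a topic model with k topics, and the folder naming scheme (`k_05`, `k_10`, ...) and file name `summary.txt` are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def run_for_each_k(fit: Callable[[int], str],
                   k_values: Iterable[int],
                   out_root: Path) -> List[Path]:
    """Run fit(k) for every k and write each result to its own folder."""
    written = []
    for k in k_values:
        out_dir = out_root / f"k_{k:02d}"          # one dedicated folder per k
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / "summary.txt"
        out_file.write_text(fit(k), encoding="utf-8")
        written.append(out_file)
    return written


def placeholder_fit(k: int) -> str:
    """Hypothetical stand-in for a real model fit; returns only a label."""
    return f"placeholder summary for k={k}\n"


# Usage: try k = 5, 10, 15, 20, 25 and collect the folder names.
with tempfile.TemporaryDirectory() as tmp:
    paths = run_for_each_k(placeholder_fit, range(5, 30, 5), Path(tmp))
    folder_names = [p.parent.name for p in paths]
```

Keeping the fitting step behind a plain function argument means the loop stays the same whichever model or package does the actual work.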

Please let me know what you think about this.

Kind regards, Philipp

rvidgen commented 6 years ago

Hi Philipp

I saw you have some code changes on your fork of clR; we will review these and incorporate them. Thanks.

The new version of clR uses word stemming, which is why terms such as onlin[e] and eas[e] appear truncated; the old version we used for the TAM paper didn’t.
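As a rough illustration of why stemming produces forms like "onlin" and "eas", here is a deliberately crude suffix-stripping rule in Python. It is a toy, not the stemmer clR uses and not the Porter algorithm; the function name `crude_stem` and the single rule it applies are assumptions made for the example.

```python
def crude_stem(word: str) -> str:
    """Strip a final 'e' from words longer than three letters (toy rule only)."""
    if len(word) > 3 and word.endswith("e"):
        return word[:-1]
    return word


words = ["online", "ease", "model"]
stems = [crude_stem(w) for w in words]
# stems == ["onlin", "eas", "model"]
```

Real stemmers apply many such rules so that related word forms collapse onto one term, which is why the word cloud shows stems rather than full words.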

I’ve been using the stm package recently as it allows metadata to be added, e.g., year of publication. From the stm output we make a Word document using R Markdown (see attached excerpt for an example), and that really helps make sense of the topics. The topic correlation matrix is very useful for clustering topics (I have a draft paper on this if you are interested).

I have used the ldatuning package to search for a value of K: https://cran.r-project.org/web/packages/ldatuning/vignettes/topics.html

I have also tried perplexity and log likelihood with cross-validation (e.g., 5-fold): http://ellisp.github.io/blog/2017/01/05/topic-model-cv

As you know, finding K comes down to trial and error and inspection of the results.
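The cross-validation idea can be sketched generically as below. This is a Python outline under stated assumptions, not the script mentioned in this thread and not the ldatuning API: `held_out_score` is a hypothetical callable that would fit a model with K topics on the training documents and return a held-out score where lower is better (perplexity, for instance), and `placeholder_score` returns made-up numbers only so that the sketch runs.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n_docs: int, n_folds: int) -> List[Split]:
    """Return (train, test) index lists; document i is tested in fold i % n_folds."""
    splits = []
    for fold in range(n_folds):
        test = [i for i in range(n_docs) if i % n_folds == fold]
        train = [i for i in range(n_docs) if i % n_folds != fold]
        splits.append((train, test))
    return splits


def choose_k(candidates: Sequence[int],
             n_docs: int,
             n_folds: int,
             held_out_score: Callable[[int, List[int], List[int]], float]
             ) -> Tuple[int, Dict[int, float]]:
    """Pick the candidate K with the lowest mean held-out score across folds."""
    splits = k_fold_splits(n_docs, n_folds)
    mean_scores = {
        k: mean(held_out_score(k, train, test) for train, test in splits)
        for k in candidates
    }
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores


def placeholder_score(k: int, train: List[int], test: List[int]) -> float:
    """Hypothetical stand-in: synthetic numbers, not results from any model."""
    return 100.0 + abs(k - 20)


best_k, scores = choose_k([10, 20, 30], n_docs=50, n_folds=5,
                          held_out_score=placeholder_score)
```

The same loop could be reused for alpha and beta once K is fixed, by iterating over candidate values of those parameters instead of K. The numeric winner is only a starting point, since the resulting topics still need to be inspected.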

Once you have settled on K then it is worth tuning alpha and beta, again using cross-validation.

I’ve got a scruffy script that does the above (K, alpha, beta). You are welcome to a copy (it’s very much WIP), but you can probably find examples on the web just as easily.

Regards Richard

p.s. what is your email address?

In reply to Philipp Winter's GitHub notification of Thursday, 23 November 2017 at 10:22, subject "Re: [rvidgen/clr] Fix BOM removal for OS X (#1)".
