quanteda / quanteda.textstats

Textual statistics for quanteda
GNU General Public License v3.0

Fix parallel code for quanteda 4.0 #72

Closed koheiw closed 6 months ago

koheiw commented 6 months ago

Update code for quanteda v4.0.

kbenoit commented 6 months ago

Any idea what's causing the segfaults from running this on Ubuntu? Is that because it needs quanteda 4.0?

> dfmat <- corpus_subset(data_corpus_inaugural, Year > 2000) %>%
+     tokens(remove_punct = TRUE) %>%
+     tokens_remove(stopwords("english")) %>%
+     dfm()
> (tstat1 <- textstat_simil(dfmat, method = "cosine", margin = "documents"))
 *** caught segfault ***

koheiw commented 6 months ago

It is probably because proxyC still uses RcppParallel. I have a branch to remove the dependency.


kbenoit commented 6 months ago

OK, sounds like the order is then:

Resubmit proxyC
Resubmit stats and models
Then, quanteda

All this because RcppParallel won't maintain the TBB code consistently!
