Closed: ampierc2 closed this 1 month ago
We removed topfeatures.fcm() in 4.0 for precisely this reason: unlike a dfm, an fcm has no clearly defined "dimension" for feature frequency. Consider your example below: the feature frequencies for the transposed fcm are different. Using the code below, you can decide which one you want to consider. Just keep in mind that the diagonal of an fcm has a specific meaning that may differ from what you want.
library("quanteda")
#> Package version: 4.0.2
#> Unicode version: 14.0
#> ICU version: 71.1
#> Parallel computing: disabled
#> See https://quanteda.io for tutorials and examples.
toks <- tokens(data_corpus_inaugural[1:5])
dfm_matrix <- dfm(toks, tolower = TRUE,
                  remove_padding = FALSE)
topfeatures(dfm_matrix)
#> the , of and to in . a which that
#> 565 546 427 354 269 140 138 106 105 102
fcm_matrix <- fcm(toks, context = "window", window = 5)
topfeatures(fcm_matrix)
#> Error:
#> ! `topfeatures.fcm()` was deprecated in quanteda 4.0 and is now defunct.
featfreq(fcm_matrix) |>
  sort(decreasing = TRUE) |>
  head()
#> , the and in to be
#> 1669 719 506 486 431 413
featfreq(t(fcm_matrix)) |>
  sort(decreasing = TRUE) |>
  head()
#> the of , and to .
#> 5071 4260 4151 3054 2301 994
Created on 2024-05-22 with reprex v2.1.0
Hi Kenneth, First of all, you are awesome. This is a really cool package.
Second, thank you for your prompt response. Be right back... googling "diagonal for an fcm".
Thank you!
Alex Feldmeyer
Hi - I am trying to use the function topfeatures(). It works for the dfm() output, but not for the fcm() output. It thinks I am using the deprecated topfeatures.fcm() function and shows this error:
Error:
! `topfeatures.fcm()` was deprecated in quanteda 4.0 and is now defunct.
This works:
dfm_matrix <- dfm(toks, tolower = TRUE, remove_padding = FALSE)
topfeatures(dfm_matrix)
This does not work:
fcm_matrix <- fcm(toks, context = "window", window = 5)
topfeatures(fcm_matrix)
Output:
Error:
! `topfeatures.fcm()` was deprecated in quanteda 4.0 and is now defunct.
I solved this by using the featfreq() function, converting the result to a data frame, sorting by frequency, and then pulling the top 50 words as the "top features", but I am wondering whether topfeatures() should also work for an fcm?
I have tried this with different versions of RStudio (4.1.1, 4.2.2 and 4.3).
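For reference, the featfreq() workaround described in this post could be sketched roughly as follows. This is only an illustration, not an official replacement for topfeatures(): the object names freq, freq_df and top50 and the column names are made up for the example, and the cut-off of 50 is simply the number mentioned in the post. It rebuilds toks from the first five inaugural addresses, as in the rest of this thread.

```r
library("quanteda")

# Same inputs as in the examples in this thread
toks <- tokens(data_corpus_inaugural[1:5])
fcm_matrix <- fcm(toks, context = "window", window = 5)

# featfreq() returns a named numeric vector of feature totals
freq <- featfreq(fcm_matrix)

# Convert to a data frame (column names are illustrative)
freq_df <- data.frame(
  feature = names(freq),
  frequency = unname(freq),
  stringsAsFactors = FALSE
)

# Sort by frequency, highest first, and keep the top 50 rows
freq_df <- freq_df[order(freq_df$frequency, decreasing = TRUE), ]
top50 <- head(freq_df, 50)
top50
```

Passing t(fcm_matrix) to featfreq() instead gives different totals, so which orientation counts as the "top features" is a choice to make deliberately.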