quanteda / quanteda.textplots

Plotting and visualisation for quanteda
GNU General Public License v3.0

topfeatures #25

Closed ampierc2 closed 1 month ago

ampierc2 commented 1 month ago

Hi - I am trying to use the function topfeatures(). It works for dfm() output, but not for fcm() output. It behaves as if I were calling the deprecated topfeatures.fcm() function and shows the error:

Error: ! topfeatures.fcm() was deprecated in quanteda 4.0 and is now defunct.


This works

dfm_matrix <- dfm(toks, tolower = TRUE, remove_padding = FALSE)

topfeatures(dfm_matrix)

This does not work:

fcm_matrix <- fcm(toks, context = "window", window = 5)

topfeatures(fcm_matrix)

output:

Error: ! topfeatures.fcm() was deprecated in quanteda 4.0 and is now defunct.

I worked around this by calling featfreq(), converting the result to a data frame, sorting by frequency, and taking the top 50 words as the "top features". Should topfeatures() also work for an fcm?
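For reference, a minimal sketch of that workaround. The original post does not show how toks was built, so the sketch assumes it comes from the first five documents of data_corpus_inaugural; the names freq, freq_df and top50 are hypothetical.

```r
library("quanteda")

# Assumption: toks is built from the first five inaugural addresses
toks <- tokens(data_corpus_inaugural[1:5])
fcm_matrix <- fcm(toks, context = "window", window = 5)

# Feature totals as a named numeric vector
freq <- featfreq(fcm_matrix)

# Convert to a data frame, sort by frequency, and keep the top 50 rows
freq_df <- data.frame(feature = names(freq), frequency = as.numeric(freq))
freq_df <- freq_df[order(freq_df$frequency, decreasing = TRUE), ]
top50 <- head(freq_df, 50)
```

The number 50 is arbitrary; head(freq_df, n) returns fewer rows if the fcm has fewer than n features.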

I have tried this with different versions of R (4.1.1, 4.2.2 and 4.3) in RStudio.

kbenoit commented 1 month ago

We removed topfeatures.fcm() in 4.0 for precisely this reason: Unlike for a dfm, there is not a clearly defined "dimension" for feature frequency. Consider your example, below. The feature frequencies for the transposed fcm are different. But using the code below, you can decide which one you want to consider. Just keep in mind that the diagonal for an fcm has a specific meaning that may differ from what you want.

library("quanteda")
#> Package version: 4.0.2
#> Unicode version: 14.0
#> ICU version: 71.1
#> Parallel computing: disabled
#> See https://quanteda.io for tutorials and examples.

toks <- tokens(data_corpus_inaugural[1:5])
dfm_matrix <- dfm(toks, tolower = TRUE,
                  remove_padding = FALSE)

topfeatures(dfm_matrix)
#>   the     ,    of   and    to    in     .     a which  that 
#>   565   546   427   354   269   140   138   106   105   102

fcm_matrix <- fcm(toks, context = "window", window = 5)
topfeatures(fcm_matrix )
#> Error:
#> ! `topfeatures.fcm()` was deprecated in quanteda 4.0 and is now defunct.

featfreq(fcm_matrix) |>
    sort(decreasing = TRUE) |>
    head()
#>    ,  the  and   in   to   be 
#> 1669  719  506  486  431  413

featfreq(t(fcm_matrix)) |>
    sort(decreasing = TRUE) |>
    head()
#>  the   of    ,  and   to    . 
#> 5071 4260 4151 3054 2301  994

Created on 2024-05-22 with reprex v2.1.0
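To see why the two orientations disagree, here is a toy base-R sketch with made-up counts. It assumes, as the output above suggests, that featfreq() sums over columns and that the fcm stores each pair only once in the upper triangle (the default tri = TRUE in fcm()). The matrix m and its values are hypothetical.

```r
# Hypothetical upper-triangular co-occurrence counts for three features
feats <- c("the", "of", "and")
m <- matrix(c(2, 5, 3,
              0, 1, 4,
              0, 0, 0),
            nrow = 3, byrow = TRUE, dimnames = list(feats, feats))

# Column totals: what a column-wise feature frequency reports
col_totals <- colSums(m)

# Row totals: what the same sum reports on the transposed matrix
row_totals <- rowSums(m)

# One possible choice: mirror the upper triangle so both orientations agree,
# counting the diagonal only once
sym <- m + t(m) - diag(diag(m))
sym_totals <- colSums(sym)
```

Here "the" totals 2 by column but 10 by row, because its co-occurrences with later features sit in its row rather than its column. Whether to sum rows, sum columns, or symmetrise first, and whether to keep the diagonal, depends on what "frequency" should mean for the analysis.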

ampierc2 commented 1 month ago

Hi Kenneth, First of all, you are awesome. This is a really cool package.

Second, thank you for your prompt response. Be right back... googling "diagonal for an fcm"

Thank you!


