# Frequency analysis
Python package for frequency analysis of symbols, words and their bigrams, with Excel output.

Values that can be counted: quantity, quantity in the first position, quantity in the last position, average position.

Data for which these values can be counted: symbols, symbol bigrams, words, word bigrams.

Additional data: a ye-yo words table for the Russian language (in the Excel output it can be cross-referenced with word quantities).

Installation: `pip install frequency-analysis`

The package consists of:

- `Analysis` class with context manager (take a look at the optional arguments);
- `Analysis` methods to count symbols and words;
- `Result` class with context manager (with optional `name` argument);
- `Result` methods to create Excel sheet(s) with appropriate data.

All `Analysis` arguments are optional. Default values:

- default `frequency_analysis`;
- default `n`;
- `word_pattern`, default `'[a-zA-Zа-яА-ЯёЁ]+(?:(?:-?[a-zA-Zа-яА-ЯёЁ]+)+|\'?[a-zA-Zа-яА-ЯёЁ]+)|[a-zA-Zа-яА-ЯёЁ]'`;
- default `[*range(32, 127), 1025, *range(1040, 1104), 1105]` (base punctuation, base Latin, Russian Cyrillic);
- `yo` (uses `yo.txt` for words with mandatory yo and `ye-yo.txt` for words with possible yo writing; you can use your own files or take them here), default `0`.
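
To see which symbols the default list of code points covers, it can be expanded with `chr()`. This is only an illustration using the standard library; the variable names are not part of the package.

```python
# Default code points as listed in this README.
code_points = [*range(32, 127), 1025, *range(1040, 1104), 1105]
symbols = [chr(cp) for cp in code_points]

print(len(symbols))            # 95 ASCII + Ё + 64 Cyrillic letters + ё = 161
print("".join(symbols[:95]))   # base punctuation, digits and base Latin
print("".join(symbols[95:]))   # Russian Cyrillic, including Ё and ё
```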

Method for counting symbol and symbol bigram frequency.

Counted values: quantity, quantity in the first position, quantity in the last position, average position in word.

Average position is counted only with the argument `pos` set to `True` (default `False`).

The position of symbols that match `word_pattern` is counted within the "clear" word; for other symbols it is counted within the "raw" word.
Example: in the single word "–Yes!" with the default `word_pattern`, positions will be counted as (– 1), (Y 1), (e 2), (s 3), (! 5).

Bigram counting can be disabled with the argument `bigram` set to `False` (default `True`).
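
The "clear" versus "raw" position rule can be sketched with the standard `re` module. This is a minimal illustration of the rule as described above, not the package's own code; `symbol_positions` is a hypothetical helper name.

```python
import re

# Default word pattern as quoted in this README.
WORD_PATTERN = re.compile(
    r"[a-zA-Zа-яА-ЯёЁ]+(?:(?:-?[a-zA-Zа-яА-ЯёЁ]+)+|'?[a-zA-Zа-яА-ЯёЁ]+)|[a-zA-Zа-яА-ЯёЁ]"
)


def symbol_positions(raw_word):
    """Return (symbol, 1-based position) pairs for one whitespace-separated token."""
    # Start with "raw" positions: the index inside the whole token.
    positions = list(range(1, len(raw_word) + 1))
    # Symbols inside a pattern match get their position within the "clear" word.
    for match in WORD_PATTERN.finditer(raw_word):
        for offset in range(match.end() - match.start()):
            positions[match.start() + offset] = offset + 1
    return list(zip(raw_word, positions))


print(symbol_positions("–Yes!"))
# [('–', 1), ('Y', 1), ('e', 2), ('s', 3), ('!', 5)]
```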

Method for counting word and word bigram frequency.

Counted values: quantity, quantity in the first position, quantity in the last position, average position in sentence.

Average position is counted only with the argument `pos` set to `True` (default `False`).

Bigram counting can be disabled with the argument `bigram` set to `False` (default `True`).
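
The counted values for words can be pictured with a small sketch. It assumes the text is already split into sentences and that bigrams are taken from neighbouring words inside one sentence; both are assumptions of this example, and `word_stats` is a hypothetical helper, not a package method.

```python
import re
from collections import Counter

# Default word pattern as quoted in this README.
WORD_PATTERN = re.compile(
    r"[a-zA-Zа-яА-ЯёЁ]+(?:(?:-?[a-zA-Zа-яА-ЯёЁ]+)+|'?[a-zA-Zа-яА-ЯёЁ]+)|[a-zA-Zа-яА-ЯёЁ]"
)


def word_stats(sentences):
    """Count words and word bigrams over an iterable of sentences."""
    stats = {}           # word -> [quantity, first, last, position_sum]
    bigrams = Counter()  # (word, next_word) -> quantity
    for sentence in sentences:
        # Words are lowercased, since word counting is case-insensitive.
        words = [w.lower() for w in WORD_PATTERN.findall(sentence)]
        for pos, word in enumerate(words, start=1):
            entry = stats.setdefault(word, [0, 0, 0, 0])
            entry[0] += 1                   # quantity
            entry[1] += pos == 1            # quantity in the first position
            entry[2] += pos == len(words)   # quantity in the last position
            entry[3] += pos                 # used for the average position
        bigrams.update(zip(words, words[1:]))
    return stats, bigrams


stats, bigrams = word_stats(["The cat sat", "The dog sat on the mat"])
quantity, first, last, position_sum = stats["sat"]
print(quantity, first, last, position_sum / quantity)  # 2 0 1 3.0
```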

Combined call of the previous two methods.

The only `Result` argument, `name`, is optional (default `frequency_analysis`).

The first 6 methods can be called all at once with the `treat()` method.

Many methods accept the arguments `limit`, `chart_limit`, `min_quantity` and `ignore_case`:

- `limit` (default `0`) is the maximum number of elements added to the sheet. Zero means unlimited;
- `chart_limit` (default `20`) is the number of elements included in the chart;
- `min_quantity` (default `1`) is the minimal value at which an element is added to the sheet;
- `ignore_case` (default `False`): with `True`, lower- and upper-case symbols are merged into a single element; with `False`, they are counted separately. Keyword-only.

Main result info: number of unique entries, total count and average position (if it exists) for each data type.

Top list of all analyzed symbols sorted by quantity. Next to it is the same list with case ignored, so there is no need to create a separate sheet; just use the column of your choice.
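
How `limit`, `min_quantity` and `ignore_case` shape such a top list can be sketched on a plain `Counter`. The helper `top_list` is hypothetical and only mirrors the argument descriptions in this README; folding to lower case and breaking ties alphabetically are assumptions of the example.

```python
from collections import Counter


def top_list(counts, limit=0, min_quantity=1, *, ignore_case=False):
    """Return (symbol, quantity) rows sorted by quantity, highest first."""
    if ignore_case:
        # Merge lower- and upper-case symbols into a single element.
        merged = Counter()
        for symbol, quantity in counts.items():
            merged[symbol.lower()] += quantity
        counts = merged
    rows = [(s, q) for s, q in counts.items() if q >= min_quantity]
    rows.sort(key=lambda row: (-row[1], row[0]))
    # limit == 0 means "unlimited".
    return rows[:limit] if limit else rows


counts = Counter("AaBbb a")
print(top_list(counts, min_quantity=2))                    # [('a', 2), ('b', 2)]
print(top_list(counts, min_quantity=2, ignore_case=True))  # [('a', 3), ('b', 3)]
```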

Top list of symbol bigrams sorted by quantity, with an additional case-insensitive top list.

Top list of analyzed words sorted by quantity. Word counting is always case-insensitive, already at the `Analysis` stage.

Top list of analyzed word bigrams sorted by quantity.

2D sheet with the quantities of all bigrams. Here the `min_quantity` argument applies to the sum of row/column values instead of each separate bigram.
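
A 2D bigram table with `min_quantity` applied to row and column sums might look like the sketch below. It assumes rows and columns are filtered independently by their own sums, which this README does not spell out; `bigram_table` is a hypothetical helper, not a package method.

```python
from collections import Counter


def bigram_table(text, min_quantity=1):
    """Build a 2D table: rows are first symbols, columns are second symbols."""
    bigrams = Counter(zip(text, text[1:]))
    row_sum = Counter()
    col_sum = Counter()
    for (first, second), quantity in bigrams.items():
        row_sum[first] += quantity
        col_sum[second] += quantity
    # Keep a row or column only if its total reaches min_quantity.
    rows = sorted(s for s in row_sum if row_sum[s] >= min_quantity)
    cols = sorted(s for s in col_sum if col_sum[s] >= min_quantity)
    table = [[bigrams.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, table


print(bigram_table("abab"))                  # (['a', 'b'], ['a', 'b'], [[0, 2], [1, 0]])
print(bigram_table("abab", min_quantity=2))  # (['a'], ['b'], [[2]])
```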

Single call of all `Result` methods above. The order of the tuple arguments is the same as the order of the descriptions above.
Please note: the last value (for `sheet_all_symbol_bigrams()`) exists only in the `min_quantities` argument.
Default values are the same as elsewhere: `limits` is `(0,)*4`; `chart_limits` is `(20,)*4`; `min_quantities` is `(1,)*5`.

Create a symbols top list like `sheet_top_symbols()`, but only with symbols of your choice. The `name` argument is keyword-only.

Create a symbols top list like `sheet_top_symbols()`, but only with base Latin symbols.

Create a symbols top list like `sheet_top_symbols()`, but only with Russian Cyrillic symbols.

Create a symbol bigrams 2D sheet like `sheet_all_symbol_bigrams()`, but only with symbols of your choice. The order of symbols on the sheet will be the same as in the input argument. The `name` argument is keyword-only.

Create a symbol bigrams 2D sheet like `sheet_all_symbol_bigrams()`, but only with base Latin symbols.

Create a symbol bigrams 2D sheet like `sheet_all_symbol_bigrams()`, but only with Russian Cyrillic symbols.

Create a cross-referenced sheet for all counted ye-yo words with their quantities and a total misspellings counter. Works only with an analysis created with the `yo` argument set to `1` or `2`.