tsproisl/SoMaJo
A tokenizer and sentence splitter for German and English web and social media texts.
GNU General Public License v3.0
135 stars, 21 forks
Issues (newest first)
#32 Problem with a technical term and splitting by hyphen (g3rfx, closed 1 month ago, 2 comments)
#31 SRX for sentence_splitter (fewzee, opened 3 months ago, 1 comment)
#30 Support custom abbreviation (krambox, opened 4 months ago, 1 comment)
#29 Other MD issue (PhilipMay, closed 4 months ago, 2 comments)
#28 Other issue with Markdown-style links (PhilipMay, closed 4 months ago, 1 comment)
#27 Issue with Markdown-style links (PhilipMay, closed 7 months ago, 3 comments)
#26 Markdown link splitting bug (PhilipMay, closed 12 months ago, 6 comments)
#25 Dates at the end of sentences (ausgerechnet, closed 1 year ago, 1 comment)
#24 Document all possible values for `token_class` (PhilipMay, closed 1 year ago, 6 comments)
#23 Added Roman ordinals, abbreviation "Art." preceding numbers (AndreasBlombach, closed 1 year ago, 0 comments)
#22 Publish on conda-forge (iulusoy, closed 1 year ago, 2 comments)
#21 Domain adaptation to law texts (adbar, closed 3 years ago, 4 comments)
#20 Tokenizer text recovery problem (shabie, opened 3 years ago, 1 comment)
#19 tokenize with continue multiple punctions (liutianling, opened 3 years ago, 1 comment)
#18 False positives with URLs (max-otto, opened 4 years ago, 2 comments)
#17 How do I split sentences but not words? (PhilipMay, closed 3 years ago, 2 comments)
#16 Apostrophes replacing vowels (anonymous-poetrybot-386, closed 3 years ago, 5 comments)
#15 Segmentation of sentences in lowercase (konstantinmiller, closed 4 years ago, 2 comments)
#14 Thread safety (konstantinmiller, closed 4 years ago, 6 comments)
#13 Tokenizer outputs single characters per line (konstantinmiller, closed 4 years ago, 4 comments)
#12 Phone number and sad emoji (max-otto, closed 4 years ago, 1 comment)
#11 Quotation marks (NebelAI, closed 4 years ago, 1 comment)
#10 Failing unit test in 2.0.2 (danieldk, closed 4 years ago, 4 comments)
#9 German text to sentence segmentation (Tortoise17, closed 4 years ago, 1 comment)
#8 Crashes machine when used with multiprocessing (MiniXC, closed 5 years ago, 3 comments)
#7 What's wrong? (skks11, closed 4 years ago, 1 comment)
#6 How to get a specific classification for a word (e.g. verb, noun, abbr., preposition) (jackson1895, closed 5 years ago, 1 comment)
#5 Getting an error when running the doc code (aswin-giridhar, closed 5 years ago, 3 comments)
#4 Add citation (RichStone, closed 5 years ago, 0 comments)
#3 How to add own exceptions to the tokenizer? (RichStone, closed 5 years ago, 2 comments)
#2 Typo corrected in README (adbar, closed 6 years ago, 0 comments)
#1 Sentence splitter does not work (rroderich, closed 6 years ago, 1 comment)