issues
huggingface/OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
https://huggingface.co/datasets/HuggingFaceM4/OBELICS
Apache License 2.0
171 stars
9 forks
How to use LDA for topic modeling
#12
jrryzh
opened
1 month ago
1
LDA
#11
jrryzh
closed
2 weeks ago
0
nsfw filtered texts only file missing at step 08_01
#10
shaharukhkhan4350
closed
1 month ago
2
Is the tot_counter saved twice in this code snippet?
#9
haiqiang2017
opened
1 month ago
4
Releasing trained topic models?
#8
vishaal27
opened
2 months ago
1
Missing TextMediaPairsExtractor from the repo
#7
kckishan
opened
2 months ago
1
common_words.json download issue
#6
jrryzh
closed
3 months ago
11
Search engine over the training data
#5
aleSuglia
opened
6 months ago
1
Metadata process
#4
ellenxtan
closed
9 months ago
4
When will the trained model be released?
#3
chenxshuo
opened
10 months ago
3
Which folder to use?
#2
mckinziebrandon
closed
9 months ago
2
Training Details
#1
vateye
closed
8 months ago
1