mesolitica/malaysian-dataset
We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/
Apache License 2.0
lowyat (6 GB) #27
Closed (huseinzol05 closed this 1 year ago)
huseinzol05 commented 1 year ago
Dedup using list(set(texts)).
Directory: https://github.com/huseinzol05/malay-dataset/blob/master/crawl/lowyat
Dataset: https://huggingface.co/datasets/mesolitica/crawl-lowyat
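For readers unfamiliar with the idiom, here is a minimal sketch of what list(set(texts)) does. It removes exact duplicate strings only, and it does not preserve the original order. The sample list and both function names are hypothetical, and the order-preserving variant is an alternative shown for comparison, not something the issue says was used.

```python
def dedup_unordered(texts):
    """Exact-match dedup as written in the issue; result order is not guaranteed."""
    return list(set(texts))


def dedup_ordered(texts):
    """Alternative (assumption, not from the issue): keep the first occurrence in order."""
    # dict preserves insertion order in Python 3.7+, so keys come out in first-seen order.
    return list(dict.fromkeys(texts))


# Hypothetical sample standing in for crawled forum posts.
texts = ["harga ok", "harga ok", "Harga ok", "beli mana?"]

unique_unordered = dedup_unordered(texts)
unique_ordered = dedup_ordered(texts)
```

Both versions compare strings exactly, so entries that differ only in case or whitespace (such as "harga ok" and "Harga ok") are kept as separate items.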
huseinzol05 commented 1 year ago
Crawling
huseinzol05 commented 1 year ago
Done!