Open slark-prime opened 1 year ago
[ ] Upload the 70 GB dataset to Remote Desktop
[ ] Text Block Extraction implementation: from a large corpus (70 GB), extract many text blocks (say, 30) uniformly at random.
[ ]
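
A minimal sketch of the text block extraction task, under several assumptions the issue does not state: the corpus is a single UTF-8 file, a "block" is a fixed-size run of bytes, and "uniformly" means every valid start offset is equally likely. The function name `sample_blocks` and the parameters `block_size` and `seed` are hypothetical. Seeking to random offsets avoids loading the whole 70 GB file into memory.

```python
import os
import random


def sample_blocks(path, num_blocks=30, block_size=4096, seed=None):
    """Return num_blocks text blocks read at uniformly random byte offsets."""
    file_size = os.path.getsize(path)
    if file_size == 0:
        raise ValueError("corpus file is empty")
    # Clamp so a block never runs past the end of a small file.
    block_size = min(block_size, file_size)
    rng = random.Random(seed)  # pass a seed for reproducible samples
    blocks = []
    with open(path, "rb") as f:
        for _ in range(num_blocks):
            # Every valid start offset is equally likely.
            offset = rng.randrange(file_size - block_size + 1)
            f.seek(offset)
            raw = f.read(block_size)
            # A random offset can land mid-character; drop undecodable bytes.
            blocks.append(raw.decode("utf-8", errors="ignore"))
    return blocks
```

Blocks sampled this way may overlap and may start or end mid-word. If the corpus is split across several files, or blocks should align to line or document boundaries, the offset selection would need to change accordingly.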