undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit
http://undertheseanlp.com
GNU General Public License v3.0

Cannot download DI_Vietnamese-UVD dataset #641

Closed. KhuongDuy-Nguyen closed this issue 1 year ago.

KhuongDuy-Nguyen commented 1 year ago

When I try to download the DI_Vietnamese-UVD dataset with "underthesea download-data DI_Vietnamese-UVD", I get the following error. How can I fix it?

[Screenshot of the error]

rain1024 commented 1 year ago

@KhuongDuy-Nguyen Thanks for reporting the issue.

The dataset's filename was configured incorrectly. I've fixed it and released a new version, 6.1.0. Please update underthesea and let me know whether it works now.

KhuongDuy-Nguyen commented 1 year ago

@rain1024 I ran cmd as administrator and restarted my laptop, but I still get the error.

[Screenshot of the error]

By the way, I checked all of the datasets, and some of the others have the same error.

[Screenshots of the errors for the other datasets]

rain1024 commented 1 year ago

@KhuongDuy-Nguyen Thanks for your comment

The process cannot access the file because it is being used by another process...

This error was caused by a bug in how the zip file was opened, and that bug has now been fixed.
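The message "The process cannot access the file because it is being used by another process" is Windows error 32 (WinError 32). The thread does not say exactly what the fix changed; a common cause of this error, assumed here, is deleting or moving a downloaded archive while a zipfile.ZipFile handle to it is still open. A minimal sketch of the safe pattern, using a hypothetical helper named extract_and_remove (not part of underthesea):

```python
import os
import tempfile
import zipfile
from typing import List


def extract_and_remove(archive_path: str, dest_dir: str) -> List[str]:
    """Hypothetical helper: extract a zip archive, then delete the archive.

    The `with` block guarantees the archive's file handle is closed before
    os.remove() runs. On Windows, removing a file that still has an open
    handle fails with PermissionError (WinError 32).
    """
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        zf.extractall(dest_dir)  # creates dest_dir if it does not exist
    # The handle is closed at this point, so deleting the archive is safe
    os.remove(archive_path)
    return names


if __name__ == "__main__":
    # Self-contained demo with a throwaway archive in a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "sample.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("UVD.txt", "hello")
        names = extract_and_remove(archive, os.path.join(tmp, "out"))
        print(names)                    # ['UVD.txt']
        print(os.path.exists(archive))  # False
```

Calling os.remove() inside the `with` block instead would work on Linux and macOS but fail on Windows, which is why this class of bug tends to show up only for Windows users.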

I also fixed the errors related to the UTS2017_BANK, VNESES, CP_Vietnamese-UNC, and VNTC datasets.

Please update underthesea to version 6.1.1 and let me know whether it works now.
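After upgrading (for example with "pip install --upgrade underthesea"), it can help to confirm which version Python actually sees, since a stale install in another environment is a common reason a fix "doesn't work". A minimal sketch using the standard library's importlib.metadata; the helper names version_tuple and has_min_version are hypothetical, and the version parsing is simplified (pre-release suffixes such as "rc1" are ignored):

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Turn '6.1.1' into (6, 1, 1); missing parts become 0, suffixes are ignored."""
    parts = []
    for piece in v.split(".")[:3]:
        m = re.match(r"\d+", piece)  # leading digits only, e.g. '0rc1' -> 0
        parts.append(int(m.group()) if m else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_min_version(package: str, minimum: str) -> bool:
    """Return True if `package` is installed at `minimum` or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    # 6.1.1 is the release mentioned in this thread
    print(has_min_version("underthesea", "6.1.1"))
```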

KhuongDuy-Nguyen commented 1 year ago

@rain1024 I can download them now, but I can't open DI_Vietnamese-UVD.

[Screenshot]

rain1024 commented 1 year ago

@KhuongDuy-Nguyen The "UVD.bin" file is a dump created with pickle. To access its contents, you can use the following code:

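```python
import pickle

filepath = "UVD.bin"

# UVD.bin is a pickle dump, so open it in binary mode and unpickle it
with open(filepath, "rb") as f:
    data = pickle.load(f)

# Print each key followed by its value
for item in data:
    print(item)
    print(data[item])
```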
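If printing everything produces too much output, a quick summary of the loaded object can be easier to read. A minimal sketch with a hypothetical helper named preview_pickle; it assumes nothing about UVD.bin beyond it being a pickle, reports the number of keys and a few sample keys when the object is a dict, and the length otherwise. Note that pickle.load can execute arbitrary code, so only load files from a source you trust:

```python
import os
import pickle
from itertools import islice


def preview_pickle(filepath, limit=5):
    """Hypothetical helper: load a pickle file and summarize its contents.

    Returns (data, summary). Only unpickle files you trust.
    """
    with open(filepath, "rb") as f:
        data = pickle.load(f)
    summary = {"type": type(data).__name__}
    if isinstance(data, dict):
        summary["num_keys"] = len(data)
        summary["sample_keys"] = list(islice(data.keys(), limit))
    elif hasattr(data, "__len__"):
        summary["length"] = len(data)
    return data, summary


if __name__ == "__main__":
    path = "UVD.bin"  # adjust to wherever the dataset was downloaded
    if os.path.exists(path):
        _, summary = preview_pickle(path)
        print(summary)
```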
KhuongDuy-Nguyen commented 1 year ago

@rain1024 Got it, thanks for your support.