[Notice] Dataset update announcement

bluebrush commented 4 years ago

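The week 2 data is now available.

spam-2: week 2 data. Training with nsml run -d spam-2 is required.

hate_2: week 2 data. Retraining with nsml run -d hate_2 is required.

hate_raw: additional data released separately. To use it, run nsml run -d hate_2 -d hate_raw (the order of the -d flags matters, and the flag is given twice).

When using two datasets, please see the comment linked below.
https://github.com/AI-RUSH-Operation/NAVER-AI-RUSH/issues/69#issuecomment-660890812

The file inside hate_raw is located at /data/hate_raw/train/raw.json.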

ddamddi commented 4 years ago

For spam classification, should we train on spam-2 instead of spam-1 and then submit?

bluebrush commented 4 years ago

@ddamddi, I am correcting my earlier comment.
Before: the training data increased.
After: the training data is unchanged; only the evaluation set increased.

redleaf-kim commented 4 years ago

@bluebrush So it is not only the test data that grew? Did the number of training examples increase as well?

bluebrush commented 4 years ago

@MaiHon, I have corrected the comment. Only the evaluation set increased.

thejungwon commented 4 years ago

Is hate_raw the unlabeled dataset?

bluebrush commented 4 years ago

@thejungwon Yes, it is an unlabeled dataset.

dhsimpson commented 4 years ago

When I run with -d hate_2 -d hate_raw, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: "['/data/hate_2', '/data/hate_raw']/train/train_data"
Has the hate_raw data not been added yet?

bluebrush commented 4 years ago

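@dhsimpson, this happens because DATASET_PATH becomes a list when two or more datasets are passed, as in -d dataset1 -d dataset2. The baseline code assumes a single dataset, so it has no handling for a list. You should be able to work around it by indexing into the list to get one path per dataset, as follows.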

With one dataset:
hate_1_path = DATASET_PATH
With two datasets:
hate_1_path = DATASET_PATH[0]
hate_2_path = DATASET_PATH[1]
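
A minimal sketch of that handling, assuming DATASET_PATH is a plain string when a single -d is given and a list when several are given, as described above. The helper name as_dataset_paths and the example values are hypothetical and are not part of the baseline code.

```python
import os
from typing import List, Sequence, Union


def as_dataset_paths(dataset_path: Union[str, Sequence[str]]) -> List[str]:
    """Return dataset roots as a list, for one or several -d flags."""
    if isinstance(dataset_path, str):
        # Single dataset: wrap the string so callers can always index.
        return [dataset_path]
    # Several datasets: assumed to keep the order of the -d flags.
    return list(dataset_path)


# Example values only; in a real session these come from DATASET_PATH.
paths = as_dataset_paths(["/data/hate_2", "/data/hate_raw"])
hate_2_path, hate_raw_path = paths[0], paths[1]

# Build file paths per dataset instead of formatting the whole list.
raw_json = os.path.join(hate_raw_path, "train", "raw.json")
```

Normalizing to a list first lets the same code run whether the session was started with one dataset or two.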

fenneccat commented 4 years ago

@bluebrush, I tried to use hate_raw and got the following error.

Traceback (most recent call last):
  File "main_unlabeld_data.py", line 387, in <module>
    trainer = Trainer(device='cuda')
  File "main_unlabeld_data.py", line 250, in __init__
    self.task = HateSpeech(self.TRAIN_DATA_PATH, (9, 1))
  File "/app/data_unlabeled.py", line 36, in __init__
    self.examples = self.load_corpus(corpus_path)
  File "/app/data_unlabeled.py", line 48, in load_corpus
    with open(path) as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/data/hate_raw/train/train_data'

Has the data not been added yet?

kingheadcat commented 4 years ago

The hate_raw data is not for the dashboard. Unlike the fine-tuning data, it has no labels. In addition, no task definition is enforced for it, so the HateSpeech class, which is defined for sentence classification, cannot be used on it as is. The data is stored in /data/hate_raw/train/raw.json in the format shown below; use it in whatever way suits your approach.

{"syllable_contents": [3, 134, 61, 432, 2, 1774, 436, 83, 5, 21, 24, 2, 40, 55, 395, 2, 657, 47, 2, 201, 8, 10, 36, 2, 27, 153, 30, 13, 25, 29, 29, 29, 29, 29, 134, 61, 432, 2, 1774, 436, 83, 5, 21, 24, 2, 40, 55, 395, 2, 657, 47, 2, 201, 8, 10, 36, 2, 27, 153, 30, 13, 25, 29, 29, 29, 29, 29, 134, 61, 432, 2, 1774, 436, 83, 5, 21, 24, 2, 40, 55, 395, 2, 657, 47, 2, 201, 8, 10, 36, 2, 27, 153, 30, 13, 25, 29, 29, 29, 29, 29, 134, 61, 432, 2, 1774, 436, 83, 5, 21, 24, 2, 40, 55, 395, 2, 657, 47, 2, 201, 8, 10, 36, 2, 27, 153, 30, 13, 25, 29, 29, 29, 29, 29, 4]}
{"syllable_contents": [3, 91, 124, 2, 97, 31, 2, 11, 9, 2, 126, 395, 21, 2, 256, 250, 11, 12, 2, 60, 248, 22, 54, 245, 2, 46, 178, 153, 85, 2, 64, 35, 21, 2, 442, 66, 21, 24, 2, 476, 72, 130, 40, 328, 2, 12, 67, 30, 2, 228, 2, 492, 2, 228, 2, 183, 600, 2, 721, 47, 2, 183, 600, 11, 9, 2, 387, 21, 24, 10, 2, 35, 527, 269, 61, 135, 30, 524, 2, 11, 328, 2, 461, 283, 18, 35, 2, 5, 122, 35, 2, 5, 250, 2, 25, 65, 11, 10, 6, 46, 542, 138, 28, 10, 39, 12, 6, 4]}
...
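
A minimal sketch of one way to read this file, assuming it holds one JSON object per line, each with a "syllable_contents" list of integer ids, as in the sample above. The function name iter_syllable_ids is hypothetical and is not part of the baseline code.

```python
import json
from typing import Iterator, List


def iter_syllable_ids(path: str) -> Iterator[List[int]]:
    """Yield the syllable id list of each record in a line-delimited JSON file."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if not line:
                # Skip blank lines rather than failing on them.
                continue
            record = json.loads(line)
            yield record["syllable_contents"]
```

For example, `for ids in iter_syllable_ids("/data/hate_raw/train/raw.json"): ...` would stream the records one at a time without loading the whole file into memory. Because there are no labels, these sequences suit unsupervised or self-supervised use rather than the classification loader.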

Namsik-Yoon commented 4 years ago

@bluebrush, I get exactly the same error that @fenneccat reported above: FileNotFoundError: [Errno 2] No such file or directory: '/data/hate_raw/train/train_data'. Has it been resolved?

fenneccat commented 4 years ago

@Namsik-Yoon, regarding the FileNotFoundError for '/data/hate_raw/train/train_data': according to @kingheadcat's comment above, the file is located at /data/hate_raw/train/raw.json.
https://github.com/AI-RUSH-Operation/NAVER-AI-RUSH/issues/69#issuecomment-660918314
I just tested it and it works ^^
