Open bluebrush opened 4 years ago
nsml run -e main.py -m "A good message" -v -d hatespeech-1
It looks like this only works when `-v` is left out!
Below is the full process, up to and including submit.
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1$ git clone https://github.com/AI-RUSH-Operation/NAVER-AI-RUSH.git
Cloning into 'NAVER-AI-RUSH'...
remote: Enumerating objects: 71, done.
remote: Counting objects: 100% (71/71), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 71 (delta 17), reused 59 (delta 11), pack-reused 0
Unpacking objects: 100% (71/71), done.
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1$ cd NAVER-AI-RUSH/
.git/ .github/ hate_speech/ spam/
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1$ cd NAVER-AI-RUSH/hate_speech/
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1/NAVER-AI-RUSH/hate_speech$ cat README.md
# Hate speech classification
This is the directory for the AI Rush hate-comment classification task.
The baseline model is a simple windowed RNN.
## Repository format
`hate_speech/main.py` Defines the training procedure and the nsml.bind functions
`hate_speech/data.py` Defines how the data is loaded
`hate_speech/model.py` Defines the baseline model
`hate_speech/field.json` Defines the data vocab (only for torchtext)
## Run experiment
To run the baseline model training, stand in the `airush2020/spam` folder and run
\```
nsml run -e main.py -m "A good message" -d hatespeech-1
\```
## Metric
The [F1 Score](https://en.wikipedia.org/wiki/F1_score) is used.
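For reference, a minimal sketch of how a binary F1 score is computed. It assumes labels are 0/1 with 1 as the positive class; the README does not say which class is positive or how the leaderboard averages, so treat both as assumptions. `f1_score` here is a hypothetical helper, not a function from the baseline.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for two equal-length label sequences (assumed 0/1 labels)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


example = f1_score([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
print(round(example, 4))
```

In the example call there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3.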
## Data
Due to privacy concerns, the data is provided in tokenized and then numericalized form.
- tokenizer
  - Syllable-based tokenizer
  - On Korean comment data, which contains many deliberate misspellings and neologisms, morpheme-based tokenizers, [BPE](https://en.wikipedia.org/wiki/Byte_pair_encoding), and [wordpiece tokenizers](https://arxiv.org/pdf/1609.08144.pdf) do not work properly.
- vocab
  - The vocab is not released, because it could be used to reverse-engineer the original text.
- special tokens
  UNK: 0, PAD: 1, SPACE: 2, BEGIN: 3, EOF: 4
- e.g. {"syllable_contents": [3, 32, 218, 12, 25, 2, 205, 337, 16, 2, 113, 9, 2, 558, 195, 16, 2, 113, 17, 68, 2, 288, 51, 39, 12, 25, 4], "eval_reply": 0}
- We hope you will take on creative challenges even under these harsh constraints.
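To make the format above concrete, here is a small sketch that reads the example record and uses the listed special tokens. It assumes each record is a standalone JSON object like the one shown; `split_words` and `pad_to` are hypothetical helpers for illustration and are not taken from the baseline's `data.py`.

```python
import json

# Special token ids as listed in the README.
UNK, PAD, SPACE, BEGIN, EOF = 0, 1, 2, 3, 4


def split_words(ids):
    """Drop BEGIN/EOF/PAD and split the syllable ids on SPACE."""
    words, current = [], []
    for i in ids:
        if i in (BEGIN, EOF, PAD):
            continue
        if i == SPACE:
            if current:
                words.append(current)
                current = []
        else:
            current.append(i)
    if current:
        words.append(current)
    return words


def pad_to(ids, length):
    """Right-pad with PAD, or truncate, to a fixed length."""
    return (ids + [PAD] * length)[:length]


# The example record from the README, copied verbatim.
line = ('{"syllable_contents": [3, 32, 218, 12, 25, 2, 205, 337, 16, 2, 113, 9, '
        '2, 558, 195, 16, 2, 113, 17, 68, 2, 288, 51, 39, 12, 25, 4], '
        '"eval_reply": 0}')
record = json.loads(line)
ids = record["syllable_contents"]
words = split_words(ids)
padded = pad_to(ids, 32)
print(len(ids), len(words), record["eval_reply"])
```

Since the vocab is withheld, the ids cannot be mapped back to syllables; the only structure that can be recovered is sequence boundaries and word spacing.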
### Format
See AI Rush dataset documentation.
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1/NAVER-AI-RUSH/hate_speech$ nsml run -e main.py -m "A good message" -d hatespeech-1
INFO[2020/07/15 18:03:59.459] .nsmlignore check - start
INFO[2020/07/15 18:03:59.459] .nsmlignore check - done
INFO[2020/07/15 18:03:59.492] file integrity check - start
INFO[2020/07/15 18:03:59.493] file integrity check - done
INFO[2020/07/15 18:03:59.493] .nsmlignore 20 B - start
INFO[2020/07/15 18:03:59.503] .nsmlignore 20 B - done (1/7 14.29%) (20 B/12 KiB 0.16%)
INFO[2020/07/15 18:03:59.503] README.md 1.5 KiB - start
INFO[2020/07/15 18:03:59.503] README.md 1.5 KiB - done (2/7 28.57%) (1.6 KiB/12 KiB 12.56%)
INFO[2020/07/15 18:03:59.503] data.py 2.6 KiB - start
INFO[2020/07/15 18:03:59.503] data.py 2.6 KiB - done (3/7 42.86%) (4.2 KiB/12 KiB 33.59%)
INFO[2020/07/15 18:03:59.503] fields.json 526 B - start
INFO[2020/07/15 18:03:59.503] fields.json 526 B - done (4/7 57.14%) (4.7 KiB/12 KiB 37.74%)
INFO[2020/07/15 18:03:59.503] main.py 5.7 KiB - start
INFO[2020/07/15 18:03:59.503] main.py 5.7 KiB - done (5/7 71.43%) (10 KiB/12 KiB 83.98%)
INFO[2020/07/15 18:03:59.503] model.py 1.6 KiB - start
INFO[2020/07/15 18:03:59.503] model.py 1.6 KiB - done (6/7 85.71%) (12 KiB/12 KiB 96.75%)
INFO[2020/07/15 18:03:59.503] setup.py 412 B - start
INFO[2020/07/15 18:03:59.503] setup.py 412 B - done (7/7 100.00%) (12 KiB/12 KiB 100.00%)
......
Building docker image. It may take a while
.......
Session bluebrush/hatespeech-1/10 is started
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1/NAVER-AI-RUSH/hate_speech$ nsml ps
Name                       Created        Args    Status    Summary    Description     # of Models    Size       Type
-------------------------  -------------  ------  --------  ---------  --------------  -------------  ---------  ------
bluebrush/hatespeech-1/10  seconds ago            Running              A good message  0              0          normal
bluebrush/hatespeech-1/6   9 minutes ago          Running              A good message  15             201.82 MB  normal
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1/NAVER-AI-RUSH/hate_speech$ nsml model ls bluebrush/hatespeech-1/10
Checkpoint    Last Modified    Elapsed    Summary            Size
------------  ---------------  ---------  -----------------  --------
0             4 minutes ago    0.000      number_of_files=1  13.45 MB
1             4 minutes ago    38.104     number_of_files=1  13.45 MB
2             3 minutes ago    38.282     number_of_files=1  13.45 MB
3             3 minutes ago    38.408     number_of_files=1  13.45 MB
4             2 minutes ago    36.556     number_of_files=1  13.45 MB
5             2 minutes ago    36.732     number_of_files=1  13.45 MB
6             a minute ago     36.894     number_of_files=1  13.45 MB
7             seconds ago      36.654     number_of_files=1  13.45 MB
ubuntu16@ubuntu16-VirtualBox:~/airushdemo/src/NAVER-AI-RUSH/demo-hatespeech-1/NAVER-AI-RUSH/hate_speech$ nsml submit bluebrush/hatespeech-1/10 6
.......
Building docker image. It may take a while
...........load nsml model takes 2.0793983936309814 seconds
.Infer test set. The inference should be completed within 3600 seconds.
.Infer test set takes 11.140313148498535 seconds
...
Score: 0.9137414965986393
Done
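The sequence of commands used in this run can be summarized with a small sketch. It only builds and prints the command lines (it does not call `nsml`), and it uses only the subcommands and flags that appear in the transcript above. The helper names are hypothetical, and the session name and checkpoint number are the ones from this particular run.

```python
import shlex


def run_cmd(dataset, message, entry="main.py"):
    # No -v flag here: per the report in this issue, the run only worked without it.
    return ["nsml", "run", "-e", entry, "-m", message, "-d", dataset]


def model_ls_cmd(session):
    return ["nsml", "model", "ls", session]


def submit_cmd(session, checkpoint):
    return ["nsml", "submit", session, str(checkpoint)]


session = "bluebrush/hatespeech-1/10"  # session name printed by `nsml run` above
commands = [
    run_cmd("hatespeech-1", "A good message"),
    ["nsml", "ps"],
    model_ls_cmd(session),
    submit_cmd(session, 6),
]
for argv in commands:
    # shlex.quote keeps the message with spaces as a single shell argument.
    print(" ".join(shlex.quote(a) for a in argv))
```

Each list could be passed to `subprocess.run` as-is if you want to script the steps, though the session name is only known after `nsml run` reports it.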
This was added between 17:40 and 17:50.
baseline : https://github.com/AI-RUSH-Operation/NAVER-AI-RUSH/tree/master/hate_speech