patverga / bran

Full abstract relation extraction from biological texts with bi-affine relation attention networks
Apache License 2.0

Fail to generate the CTD dataset #11

Open freesunshine0316 opened 5 years ago

freesunshine0316 commented 5 years ago

[lsong10@bhg0031 bran]$ ./extract.sh
Downloading Pubtator dump
--2019-03-31 21:09:22-- ftp://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator/bioconcepts2pubtator_offsets.gz
=> ‘/home/lsong10/ws/exp.dep_forest/bran/data/ctd/bioconcepts2pubtator_offsets.gz’
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 130.14.250.13, 2607:f220:41e:250::7
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused.
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|2607:f220:41e:250::7|:21... failed: Network is unreachable.
Converting data from pubtator to tsv format
usage: process_CDR_data.py [-h] -i INPUT_FILE -d OUTPUT_DIR -f OUTPUT_FILE_SUFFIX [-s MAX_SEQ] [-a FULL_ABSTRACT] [-p PUBMED_FILTER] [-r RELATIONS] [-w WORD_PIECE_CODES] [-t SHARDS] [-x EXPORT_ALL_EPS] [-n EXPORT_NEGATIVES] [-e ENCODING] [-m MAX_DISTANCE]
process_CDR_data.py: error: argument -a/--full_abstract: expected one argument
split: extra operand ‘up’
Try 'split --help' for more information.
map relations to smaller set
awk: cmd. line:1: fatal: cannot open file positive_0_genia' for reading (No such file or directory)
seperate data into train dev test
positive train 50 500
positive dev 50 500
positive test 50 500
negative train 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory)
negative dev 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory)
negative test 50 500
awk: cmd. line:1: fatal: cannot open file negative_0_genia' for reading (No such file or directory)

patverga commented 5 years ago

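Sorry for the delayed response. It looks like a network issue caused the download of the initial file to fail: "Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|130.14.250.13|:21... failed: Connection refused.". Every later step in the script requires that file, so all of the subsequent errors (the process_CDR_data.py usage message, the split error, and the awk "cannot open file" messages) are consequences of the failed download rather than separate problems. Re-running the script once the FTP server is reachable should resolve them.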
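
If the connection is flaky, one way to avoid the cascade of follow-on errors is to fetch the dump separately, retry on failure, and stop before any conversion step runs if the file never arrives. The following is only a minimal sketch and not part of the repository: `ensure_download` and its `fetch`, `retries`, and `delay` parameters are hypothetical names introduced for illustration, and it assumes the network failure surfaces as an `OSError` (which covers `urllib.error.URLError` and `ConnectionRefusedError`).

```python
import os
import time
import urllib.request


def ensure_download(url, dest, fetch=urllib.request.urlretrieve, retries=3, delay=5.0):
    """Fetch url to dest, retrying on network errors.

    Hypothetical helper (not from the bran repository). Raises RuntimeError
    if the file is still missing or empty after all attempts, so later
    steps never run against a file that does not exist.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            fetch(url, dest)
        except OSError as err:  # assumption: network failures are OSError subclasses
            last_error = err
            print(f"attempt {attempt}/{retries} failed: {err}")
            if attempt < retries:
                time.sleep(delay)
            continue
        # A download that "succeeds" but leaves no data is treated as a failure too.
        if os.path.isfile(dest) and os.path.getsize(dest) > 0:
            return dest
        last_error = OSError(f"{dest} is missing or empty after download")
    raise RuntimeError(f"could not download {url}") from last_error
```

Calling `ensure_download` with the Pubtator URL from the log above and a destination under `data/ctd/` (the directory shown in the log; the exact relative path is an assumption) before running the rest of the pipeline would surface the network problem as a single clear error instead of the chain of usage, split, and awk messages.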