jbjorne / TEES

Turku Event Extraction System

Drugbank AssertionError #23

Open afergadis opened 7 years ago

afergadis commented 7 years ago

Hi.

I'm trying to use the DDI11 model on the example dataset in Unified format. The classify.py script started to download the file http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip, but then failed to extract it because the file has moved from that URL.

I downloaded it into the expected folder with curl -L -o filename.zip -u USERNAME:PASSWORD https://www.drugbank.ca/releases/5-0-6/downloads/all-full-database and renamed it to drugbank.xml.zip. The script then found the file and extracted it, but it fails with AssertionError: {http://www.drugbank.ca}drugbank at the "Processing DrugBank XML" step.

The full log follows.

./classify.py -i ~/pub/pmc/aris_data/output/aspirin-preprocessed.xml.gz -o ~/pub/pmc/aris_data/output/aspirin -m DDI11                                                                                             
Psyco not installed          
Opening log /home/aris/pub/pmc/aris_data/output/aspirin-log.txt at Fri May 19 00:27:03 2017                         
Classifying input /home/aris/pub/pmc/aris_data/output/aspirin-preprocessed.xml.gz                                   
Model /home/aris/TEES/DDI11 doesn't exist, looking for a default model                                              
Classifying with default model /home/aris/.tees/models/DDI11-test                                                   
Preprocessor output /home/aris/pub/pmc/aris_data/output/aspirin-preprocessed.xml.gz exists, skipping preprocessing. 
=== EXIT STEP PREPROCESS time: 0:00:00.000173 ===         
Caching model "/home/aris/.tees/models/DDI11-test" member "TEES_MODEL_VALUES.tsv" to "/tmp/tmpJVj7SK/TEES_MODEL_VALUES.tsv"
Importing detector Detectors.EdgeDetector                 
Caching model "/home/aris/.tees/models/DDI11-test" member "TEES_MODEL_VALUES.tsv" to "/tmp/tmpXzMm22/TEES_MODEL_VALUES.tsv"
* EdgeDetector:CLASSIFY(ENTER) *                          
Caching model "/home/aris/.tees/models/DDI11-test" member "TEES_MODEL_VALUES.tsv" to "/tmp/tmpVbsOm8/TEES_MODEL_VALUES.tsv"
Caching model "/home/aris/.tees/models/DDI11-test" member "edge-classifier-model" to "/tmp/tmpVbsOm8/edge-classifier-model"
Caching model "/home/aris/.tees/models/DDI11-test" member "structure.txt" to "/tmp/tmpVbsOm8/structure.txt"         
Example generation for /tmp/tmpBAW6jv/aspirin-edge-examples.gz                                                      
Caching model "/home/aris/.tees/models/DDI11-test" member "edge-ids.classes" to "/tmp/tmpVbsOm8/edge-ids.classes"   
Caching model "/home/aris/.tees/models/DDI11-test" member "edge-ids.features" to "/tmp/tmpVbsOm8/edge-ids.features" 
Running EdgeExampleBuilder   
  input: /home/aris/pub/pmc/aris_data/output/aspirin-preprocessed.xml.gz                                            
  output: /tmp/tmpBAW6jv/aspirin-edge-examples.gz (append: False)                                                   
  add new class/feature ids: False                        
  style: drugbank_features:ddi_mtmx:filter_shortest_path=conj_and                                                   
  parse: McCC                
Using predefined class names from /tmp/tmpVbsOm8/edge-ids.classes                                                   
Using predefined feature names from /tmp/tmpVbsOm8/edge-ids.features                                                
Drug Bank XML not installed, installing now               
--------------- Downloading Drug Bank XML --------------- 
See http://www.drugbank.ca/downloads for conditions of use                                                          
Skipping already downloaded file http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip                   
Extracting /home/aris/.tees/resources/download/drugbank.xml.zip to /home/aris/.tees/resources                       
 [|###########################################################################################|] 100% Time: 00:00:04
Adding local setting DRUG_BANK_XML=/home/aris/.tees/resources/full database.xml (added variable)                    
Loading DrugBank XML from /home/aris/.tees/resources/full database.xml                                              
Processing DrugBank XML      
Traceback (most recent call last):                        
  File "./classify.py", line 190, in <module>             
    preprocessorParams=options.preprocessorParams, bioNLPSTParams=options.bioNLPSTParams)                           
  File "./classify.py", line 78, in classify              
    detector.classify(classifyInput, model, output, goldData=goldInput, fromStep=detectorSteps["CLASSIFY"], omitSteps=omitDetectorSteps["CLASSIFY"], workDir=workDir)
  File "/home/aris/TEES/Detectors/SingleStageDetector.py", line 131, in classify                                    
    model.get(self.tag+"classifier-model", defaultIfNotExist=None), goldData, parse, float(model.getStr("recallAdjustParameter", defaultIfNotExist=1.0)))
  File "/home/aris/TEES/Detectors/SingleStageDetector.py", line 162, in classifyToXML                               
    self.buildExamples(model, [data], [exampleFileName], [goldData], parse=parse, exampleStyle=exampleStyle)        
  File "/home/aris/TEES/Detectors/Detector.py", line 203, in buildExamples                                          
    structureAnalyzer=self.structureAnalyzer)             
  File "/home/aris/TEES/ExampleBuilders/ExampleBuilder.py", line 214, in run                                        
    builder = cls(style=style, classSet=classSet, featureSet=featureSet)                                            
  File "/home/aris/TEES/ExampleBuilders/EdgeExampleBuilder.py", line 98, in __init__                                
    self.drugFeatureBuilder = DrugFeatureBuilder(featureSet)                                                        
  File "/home/aris/TEES/ExampleBuilders/FeatureBuilders/DrugFeatureBuilder.py", line 34, in __init__                
    DrugFeatureBuilder.data, DrugFeatureBuilder.nameToId = prepareDrugBank(drugBankFile)                            
  File "/home/aris/TEES/ExampleBuilders/FeatureBuilders/DrugFeatureBuilder.py", line 243, in prepareDrugBank        
    data = loadDrugBank(drugBankFile)                     
  File "/home/aris/TEES/ExampleBuilders/FeatureBuilders/DrugFeatureBuilder.py", line 223, in loadDrugBank           
    assert root.tag == preTag+"drugs", root.tag           
AssertionError: {http://www.drugbank.ca}drugbank

Am I doing something wrong, or has the DrugBank schema changed? Thank you in advance.

jbjorne commented 7 years ago

The DrugBank XML schema has apparently been changed, so the TEES model is no longer compatible with the current version. In order to make it possible to use the existing TEES models and to ensure replicability of the TEES DDIExtraction Shared Task results, I've added a copy of an older DrugBank version to the TEES files. The DrugBank dataset is released under the CC BY-NC 4.0 license so this should be OK.
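To tell which layout a given DrugBank file uses before running TEES on it, a check along these lines may help. It is only a sketch, written for Python 3 (TEES itself targets Python 2): the helper names `root_tag` and `is_legacy_layout` are hypothetical, the namespace string is copied from the traceback above, and the idea that the old layout is identified by a `drugs` root element is inferred from the failing assertion rather than from the DrugBank schema itself.

```python
import io
import xml.etree.ElementTree as ET

# Namespace as it appears in the AssertionError message above.
DRUGBANK_NS = "{http://www.drugbank.ca}"


def root_tag(source):
    """Return the root element's tag, reading only the start of the document."""
    for _event, element in ET.iterparse(source, events=("start",)):
        return element.tag  # the first "start" event belongs to the root
    return None


def is_legacy_layout(source):
    """True if the root is <drugs>, which the assertion in the traceback expects."""
    return root_tag(source) == DRUGBANK_NS + "drugs"


# Tiny in-memory stand-ins for the two layouts (not real DrugBank data).
old_xml = io.BytesIO(b'<drugs xmlns="http://www.drugbank.ca"><drug/></drugs>')
new_xml = io.BytesIO(b'<drugbank xmlns="http://www.drugbank.ca"><drug/></drugbank>')
print(is_legacy_layout(old_xml))
print(is_legacy_layout(new_xml))
```

`root_tag` also accepts a file path, so it can be pointed at the extracted XML file directly. Because it returns on the first parse event, it does not load the full database into memory.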

If you update your TEES checkout to the latest version of the development branch, using DrugBank with the DDI11 and DDI13 models should work again. If you have already installed a version of DrugBank that does not work, please first remove the line defining the 'DRUG_BANK_XML' variable from your TEES local settings file (by default located at ~/.tees_local_settings.py), and also remove the 'drugbank.xml' file from the location assigned to the 'DRUG_BANK_XML' variable.
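The settings cleanup can be done by hand in a text editor; for anyone who prefers to script it, here is a rough sketch in Python 3. The function names `strip_setting` and `clean_settings_file` are hypothetical and not part of TEES, and the code assumes the settings file holds plain `NAME = value` assignments, one per line, which is not confirmed here.

```python
import os
import re


def strip_setting(text, name="DRUG_BANK_XML"):
    """Return text without any line that assigns to the variable `name`."""
    pattern = re.compile(r"^\s*" + re.escape(name) + r"\s*=")
    kept = [line for line in text.splitlines(keepends=True)
            if not pattern.match(line)]
    return "".join(kept)


def clean_settings_file(path="~/.tees_local_settings.py"):
    """Rewrite the settings file in place, saving the original as <path>.bak.

    Returns True if a line was removed, False if nothing changed.
    """
    path = os.path.expanduser(path)
    with open(path) as handle:
        original = handle.read()
    cleaned = strip_setting(original)
    if cleaned == original:
        return False
    with open(path + ".bak", "w") as backup:
        backup.write(original)
    with open(path, "w") as handle:
        handle.write(cleaned)
    return True


# Demonstration on an in-memory string; the other setting is made up.
sample = "OTHER_SETTING = 1\nDRUG_BANK_XML = '/some/dir/full database.xml'\n"
print(strip_setting(sample))
```

Calling `clean_settings_file()` applies the same filter to the real settings file. The XML file itself still has to be deleted separately; it is safest to note the path stored in 'DRUG_BANK_XML' before removing the line.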