os-climate / aicoe-osc-demo

This repository is the central location for the demos the ET data science team is developing within the OS-Climate project. This demo shows how to use the tools provided by Open Data Hub (ODH) running on the Operate First cluster to perform ETL and build training and inference pipelines.
Apache License 2.0

annotation input deadlock in train_kpi_extraction #174

Open MichaelTiemannOSC opened 2 years ago

MichaelTiemannOSC commented 2 years ago

Describe the bug
I've been adding annotations to s3://redhat-osc-physical-landing-647521352890/test_cdp2/pipeline_run/cdp/annotations/20220709 (CDP aggregated_annotations_needs_correction.xlsx) and have now managed to lock up the train_kpi_extraction notebook. Here's the output cell where progress stops:

07/11/2022 11:34:38 - INFO - src.models.qa_farm_trainer -   Loading the /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train.json data and splitting to train and val...
07/11/2022 11:34:38 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: True
07/11/2022 11:34:38 - INFO - farm.modeling.tokenization -   Loading tokenizer of type 'RobertaTokenizer'
07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -   
Loading data into the data silo ... 
              ______
               |o  |   !
   __          |:`_|---'-.
  |__|______.-/ _ \-----.|       
 (o)(o)------'\ _ /     ( )      

07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -   Loading train set from: /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json 
07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -   Got ya 7 parallel workers to convert 644 dictionaries to pytorch datasets (chunksize = 19)...
07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -    0    0    0    0    0    0    0 
07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -   /w\  /|\  /w\  /|\  /w\  /w\  /w\
07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -   / \  /'\  / \  /'\  /'\  /'\  / \
07/11/2022 11:34:39 - INFO - farm.data_handler.data_silo -               
Preprocessing Dataset /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json:   0%|          | 0/644 [00:00<?, ? Dicts/s]07/11/2022 11:34:40 - INFO - farm.data_handler.processor -   *** Show 2 random examples ***
07/11/2022 11:34:40 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 17-1-0
Clear Text: 
    passage_text: attach the document bayer-sustainability-report-2020.pdf
    question_text: What is the Start Date of the CDP report published?
    passage_id: 0
    answers: []
Tokenized: 
    passage_start_t: 0
    passage_tokens: ['attach', 'Ġthe', 'Ġdocument', 'Ġb', 'ayer', '-', 's', 'ustain', 'ability', '-', 'report', '-', '2020', '.', 'pdf']
    passage_offsets: [0, 7, 11, 20, 21, 25, 26, 27, 33, 40, 41, 47, 48, 52, 53]
    passage_start_of_word: [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    question_tokens: ['What', 'Ġis', 'Ġthe', 'ĠStart', 'ĠDate', 'Ġof', 'Ġthe', 'ĠC', 'DP', 'Ġreport', 'Ġpublished', '?']
    question_offsets: [0, 5, 8, 12, 18, 23, 26, 30, 31, 34, 41, 50]
    question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
    answers: []
    document_offsets: [0, 7, 11, 20, 21, 25, 26, 27, 33, 40, 41, 47, 48, 52, 53]
Features: 
    input_ids: [0, 2264, 16, 5, 2776, 10566, 9, 5, 230, 5174, 266, 1027, 116, 2, 2, 47277, 5, 3780, 741, 19777, 12, 29, 26661, 4484, 12, 7415, 12, 24837, 4, 31494, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    answer_type_ids: [0]
    passage_start_t: 0
    start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    labels: [[ 0  0]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
    id: [17, 1, 0]
    seq_2_start_t: 15
    span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
07/11/2022 11:34:41 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 9-0-0
Clear Text: 
    passage_text: explanation of financial impact figure i description: the potential impact of this risk is increased prices for our purchased energy due to a continuous tightening of the eu ets. ii calculation: between 2020 and 2023, bayer expects total costs of eur 55-75 million due to the possible continuous tightening of the eu ets. this calculation is based on internal emission regulations of the respective sites and the assumption that an increase in the price of emission allowances will initially rise to eur 70 per ton during this period. we assume that the political decision makers are aiming for a certificate price of around eur 100 for the needs-based management of energy production. overall, the indirect impact of the eu ets should remain relatively low as bayer has invested heavily in energy efficiency measures in the past.
    question_text: What is the target year?
    passage_id: 0
    answers: []
Tokenized: 
    passage_start_t: 0
    passage_tokens: ['ex', 'plan', 'ation', 'Ġof', 'Ġfinancial', 'Ġimpact', 'Ġfigure', 'Ġi', 'Ġdescription', ':', 'Ġthe', 'Ġpotential', 'Ġimpact', 'Ġof', 'Ġthis', 'Ġrisk', 'Ġis', 'Ġincreased', 'Ġprices', 'Ġfor', 'Ġour', 'Ġpurchased', 'Ġenergy', 'Ġdue', 'Ġto', 'Ġa', 'Ġcontinuous', 'Ġtightening', 'Ġof', 'Ġthe', 'Ġe', 'u', 'Ġe', 'ts', '.', 'Ġii', 'Ġcalculation', ':', 'Ġbetween', 'Ġ2020', 'Ġand', 'Ġ20', '23', ',', 'Ġb', 'ayer', 'Ġexpects', 'Ġtotal', 'Ġcosts', 'Ġof', 'Ġe', 'ur', 'Ġ55', '-', '75', 'Ġmillion', 'Ġdue', 'Ġto', 'Ġthe', 'Ġpossible', 'Ġcontinuous', 'Ġtightening', 'Ġof', 'Ġthe', 'Ġe', 'u', 'Ġe', 'ts', '.', 'Ġthis', 'Ġcalculation', 'Ġis', 'Ġbased', 'Ġon', 'Ġinternal', 'Ġemission', 'Ġregulations', 'Ġof', 'Ġthe', 'Ġrespective', 'Ġsites', 'Ġand', 'Ġthe', 'Ġassumption', 'Ġthat', 'Ġan', 'Ġincrease', 'Ġin', 'Ġthe', 'Ġprice', 'Ġof', 'Ġemission', 'Ġallowances', 'Ġwill', 'Ġinitially', 'Ġrise', 'Ġto', 'Ġe', 'ur', 'Ġ70', 'Ġper', 'Ġton', 'Ġduring', 'Ġthis', 'Ġperiod', '.', 'Ġwe', 'Ġassume', 'Ġthat', 'Ġthe', 'Ġpolitical', 'Ġdecision', 'Ġmakers', 'Ġare', 'Ġaiming', 'Ġfor', 'Ġa', 'Ġcertificate', 'Ġprice', 'Ġof', 'Ġaround', 'Ġe', 'ur', 'Ġ100', 'Ġfor', 'Ġthe', 'Ġneeds', '-', 'based', 'Ġmanagement', 'Ġof', 'Ġenergy', 'Ġproduction', '.', 'Ġoverall', ',', 'Ġthe', 'Ġindirect', 'Ġimpact', 'Ġof', 'Ġthe', 'Ġe', 'u', 'Ġe', 'ts', 'Ġshould', 'Ġremain', 'Ġrelatively', 'Ġlow', 'Ġas', 'Ġb', 'ayer', 'Ġhas', 'Ġinvested', 'Ġheavily', 'Ġin', 'Ġenergy', 'Ġefficiency', 'Ġmeasures', 'Ġin', 'Ġthe', 'Ġpast', '.']
    passage_offsets: [0, 2, 6, 12, 15, 25, 32, 39, 41, 52, 54, 58, 68, 75, 78, 83, 88, 91, 101, 108, 112, 116, 126, 133, 137, 140, 142, 153, 164, 167, 171, 172, 174, 175, 177, 179, 182, 193, 195, 203, 208, 212, 214, 216, 218, 219, 224, 232, 238, 244, 247, 248, 251, 253, 254, 257, 265, 269, 272, 276, 285, 296, 307, 310, 314, 315, 317, 318, 320, 322, 327, 339, 342, 348, 351, 360, 369, 381, 384, 388, 399, 405, 409, 413, 424, 429, 432, 441, 444, 448, 454, 457, 466, 477, 482, 492, 497, 500, 501, 504, 507, 511, 515, 522, 527, 533, 535, 538, 545, 550, 554, 564, 573, 580, 584, 591, 595, 597, 609, 615, 618, 625, 626, 629, 633, 637, 641, 646, 647, 653, 664, 667, 674, 684, 686, 693, 695, 699, 708, 715, 718, 722, 723, 725, 726, 729, 736, 743, 754, 758, 761, 762, 767, 771, 780, 788, 791, 798, 809, 818, 821, 825, 829]
    passage_start_of_word: [1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    question_tokens: ['What', 'Ġis', 'Ġthe', 'Ġtarget', 'Ġyear', '?']
    question_offsets: [0, 5, 8, 12, 19, 23]
    question_start_of_word: [1, 1, 1, 1, 1, 0]
    answers: []
    document_offsets: [0, 2, 6, 12, 15, 25, 32, 39, 41, 52, 54, 58, 68, 75, 78, 83, 88, 91, 101, 108, 112, 116, 126, 133, 137, 140, 142, 153, 164, 167, 171, 172, 174, 175, 177, 179, 182, 193, 195, 203, 208, 212, 214, 216, 218, 219, 224, 232, 238, 244, 247, 248, 251, 253, 254, 257, 265, 269, 272, 276, 285, 296, 307, 310, 314, 315, 317, 318, 320, 322, 327, 339, 342, 348, 351, 360, 369, 381, 384, 388, 399, 405, 409, 413, 424, 429, 432, 441, 444, 448, 454, 457, 466, 477, 482, 492, 497, 500, 501, 504, 507, 511, 515, 522, 527, 533, 535, 538, 545, 550, 554, 564, 573, 580, 584, 591, 595, 597, 609, 615, 618, 625, 626, 629, 633, 637, 641, 646, 647, 653, 664, 667, 674, 684, 686, 693, 695, 699, 708, 715, 718, 722, 723, 725, 726, 729, 736, 743, 754, 758, 761, 762, 767, 771, 780, 788, 791, 798, 809, 818, 821, 825, 829]
Features: 
    input_ids: [0, 2264, 16, 5, 1002, 76, 116, 2, 2, 3463, 11181, 1258, 9, 613, 913, 1955, 939, 8194, 35, 5, 801, 913, 9, 42, 810, 16, 1130, 850, 13, 84, 3584, 1007, 528, 7, 10, 11152, 12872, 9, 5, 364, 257, 364, 1872, 4, 42661, 21586, 35, 227, 2760, 8, 291, 1922, 6, 741, 19777, 3352, 746, 1042, 9, 364, 710, 3490, 12, 2545, 153, 528, 7, 5, 678, 11152, 12872, 9, 5, 364, 257, 364, 1872, 4, 42, 21586, 16, 716, 15, 3425, 22679, 3478, 9, 5, 7091, 3091, 8, 5, 15480, 14, 41, 712, 11, 5, 425, 9, 22679, 23885, 40, 3225, 1430, 7, 364, 710, 1510, 228, 4866, 148, 42, 675, 4, 52, 6876, 14, 5, 559, 568, 6644, 32, 9998, 13, 10, 10921, 425, 9, 198, 364, 710, 727, 13, 5, 782, 12, 805, 1052, 9, 1007, 931, 4, 1374, 6, 5, 18677, 913, 9, 5, 364, 257, 364, 1872, 197, 1091, 3487, 614, 25, 741, 19777, 34, 5221, 4008, 11, 1007, 5838, 1797, 11, 5, 375, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    answer_type_ids: [0]
    passage_start_t: 0
    start_of_word: [0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    labels: [[ 0  0]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
    id: [9, 0, 0]
    seq_2_start_t: 9
    span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
Preprocessing Dataset /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json:  18%|█▊        | 114/644 [00:15<01:01,  8.67 Dicts/s]
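For what it's worth, the worker and chunksize numbers in the log above are self-consistent with how FARM appears to size its preprocessing pool. This is a sketch of the arithmetic (the exact formula is an assumption from reading farm.data_handler.data_silo, not something confirmed by this log):

```python
import math

def pool_sizing(num_dicts, available_cpus, max_processes=128):
    # Workers are capped by the machine's CPUs, the configured maximum,
    # and the number of dictionaries to convert.
    num_workers = min(available_cpus, max_processes, num_dicts)
    # Assumed: each worker's share is split into ~5 chunks so the
    # progress bar advances; this reproduces the logged chunksize.
    dicts_per_worker = math.ceil(num_dicts / num_workers)
    chunksize = math.ceil(dicts_per_worker / 5)
    return num_workers, chunksize
```

With the 644 dictionaries and 7 workers reported above, this yields chunksize = 19, matching the "Got ya 7 parallel workers ... (chunksize = 19)" line. In other words, the pool itself is set up normally; the hang happens during conversion, not during pool sizing.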

To Reproduce
Steps to reproduce the behavior:

  1. My experiment is rooted at s3://redhat-osc-physical-landing-647521352890/test_cdp2/
  2. My branch (which contains important parameters in config.py) is cdp-experiments
  3. Run the notebooks in sequence (pdf_text_extraction, pdf_text_curation, train_relevance)
  4. Run train_kpi_extraction and observe the hang

Expected behavior
I expect train_kpi_extraction to run to completion.


Additional context
I did not see this error with previous versions of my annotation file (which initially contained Coca-Cola and PGE data). I later added Bayer AG and Apple, and that's when it locked up.
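Since the hang appeared only after adding the Bayer and Apple annotations, one way to narrow it down might be to run the preprocessing serially with a per-example timeout and record which dictionary never returns. A generic sketch (find_hanging_item and the per-example fn are hypothetical helpers, not the project's API; you would feed it the 644 dicts from kpi_train_split.json and a single-example version of the FARM conversion):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def find_hanging_item(items, fn, timeout=30):
    """Apply fn to each item in turn; return the index of the first item
    whose processing does not finish within `timeout` seconds, or None
    if every item completes."""
    for i, item in enumerate(items):
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn, item)
        try:
            future.result(timeout=timeout)
        except TimeoutError:
            # Don't block on the stuck worker; report the index.
            pool.shutdown(wait=False)
            return i
        pool.shutdown(wait=True)
    return None
```

If the offending index lands on one of the new Bayer/Apple rows, that would point at the annotation content rather than the pipeline code. (A thread that truly never returns can't be killed this way; the sketch only identifies the index before the notebook is restarted.)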

MichaelTiemannOSC commented 2 years ago

Copying @JeremyGohBNP @andraNew for visibility.

MichaelTiemannOSC commented 2 years ago

Same problem, but it locks up at a different point each time I run it:

07/12/2022 14:22:05 - INFO - src.models.qa_farm_trainer -   Loading the /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train.json data and splitting to train and val...
07/12/2022 14:22:06 - INFO - farm.utils -   device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: True
07/12/2022 14:22:06 - INFO - farm.modeling.tokenization -   Loading tokenizer of type 'RobertaTokenizer'
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -   
Loading data into the data silo ... 
              ______
               |o  |   !
   __          |:`_|---'-.
  |__|______.-/ _ \-----.|       
 (o)(o)------'\ _ /     ( )      

07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -   Loading train set from: /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json 
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -   Got ya 7 parallel workers to convert 644 dictionaries to pytorch datasets (chunksize = 19)...
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -    0    0    0    0    0    0    0 
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -   /w\  /|\  /w\  /|\  /w\  /w\  /w\
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -   / \  /'\  / \  /'\  /'\  /'\  / \
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -               
Preprocessing Dataset /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json:   0%|          | 0/644 [00:00<?, ? Dicts/s]07/12/2022 14:22:08 - INFO - farm.data_handler.processor -   *** Show 2 random examples ***
07/12/2022 14:22:08 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 15-8-0
Clear Text: 
    passage_text: allocation level detail we allocate the emissions for requesting companies through a market value approach. the co2 emissions are scope 1 emissions for crop science.
    question_text: Break down your total gross global Scope 1 emissions by business activity.
    passage_id: 0
    answers: []
Tokenized: 
    passage_start_t: 0
    passage_tokens: ['all', 'ocation', 'Ġlevel', 'Ġdetail', 'Ġwe', 'Ġallocate', 'Ġthe', 'Ġemissions', 'Ġfor', 'Ġrequesting', 'Ġcompanies', 'Ġthrough', 'Ġa', 'Ġmarket', 'Ġvalue', 'Ġapproach', '.', 'Ġthe', 'Ġco', '2', 'Ġemissions', 'Ġare', 'Ġscope', 'Ġ1', 'Ġemissions', 'Ġfor', 'Ġcrop', 'Ġscience', '.']
    passage_offsets: [0, 3, 11, 17, 24, 27, 36, 40, 50, 54, 65, 75, 83, 85, 92, 98, 106, 108, 112, 114, 116, 126, 130, 136, 138, 148, 152, 157, 164]
    passage_start_of_word: [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    question_tokens: ['Break', 'Ġdown', 'Ġyour', 'Ġtotal', 'Ġgross', 'Ġglobal', 'ĠScope', 'Ġ1', 'Ġemissions', 'Ġby', 'Ġbusiness', 'Ġactivity', '.']
    question_offsets: [0, 6, 11, 16, 22, 28, 35, 41, 43, 53, 56, 65, 73]
    question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    answers: []
    document_offsets: [0, 3, 11, 17, 24, 27, 36, 40, 50, 54, 65, 75, 83, 85, 92, 98, 106, 108, 112, 114, 116, 126, 130, 136, 138, 148, 152, 157, 164]
Features: 
    input_ids: [0, 39539, 159, 110, 746, 4200, 720, 30108, 112, 5035, 30, 265, 1940, 4, 2, 2, 1250, 15644, 672, 4617, 52, 25915, 5, 5035, 13, 14030, 451, 149, 10, 210, 923, 1548, 4, 5, 1029, 176, 5035, 32, 7401, 112, 5035, 13, 6792, 2866, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    answer_type_ids: [0]
    passage_start_t: 0
    start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    labels: [[ 0  0]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
    id: [15, 8, 0]
    seq_2_start_t: 16
    span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
07/12/2022 14:22:08 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 14-2-0
Clear Text: 
    passage_text: specific ghg emissions emissions intensity for the current and the previous reporting year are described within the sustainability report, which is verified with a limited assurance by deloitte. thus, they are included in the verification process. bayer-sustainability-report-2020.pdf
    question_text: What is the base year end date for scope 1 emissions?
    passage_id: 0
    answers: []
Tokenized: 
    passage_start_t: 0
    passage_tokens: ['specific', 'Ġgh', 'g', 'Ġemissions', 'Ġemissions', 'Ġintensity', 'Ġfor', 'Ġthe', 'Ġcurrent', 'Ġand', 'Ġthe', 'Ġprevious', 'Ġreporting', 'Ġyear', 'Ġare', 'Ġdescribed', 'Ġwithin', 'Ġthe', 'Ġsustainability', 'Ġreport', ',', 'Ġwhich', 'Ġis', 'Ġverified', 'Ġwith', 'Ġa', 'Ġlimited', 'Ġassurance', 'Ġby', 'Ġdel', 'o', 'itte', '.', 'Ġthus', ',', 'Ġthey', 'Ġare', 'Ġincluded', 'Ġin', 'Ġthe', 'Ġverification', 'Ġprocess', '.', 'Ġb', 'ayer', '-', 's', 'ustain', 'ability', '-', 'report', '-', '2020', '.', 'pdf']
    passage_offsets: [0, 9, 11, 13, 23, 33, 43, 47, 51, 59, 63, 67, 76, 86, 91, 95, 105, 112, 116, 131, 137, 139, 145, 148, 157, 162, 164, 172, 182, 185, 188, 189, 193, 195, 199, 201, 206, 210, 219, 222, 226, 239, 246, 248, 249, 253, 254, 255, 261, 268, 269, 275, 276, 280, 281]
    passage_start_of_word: [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    question_tokens: ['What', 'Ġis', 'Ġthe', 'Ġbase', 'Ġyear', 'Ġend', 'Ġdate', 'Ġfor', 'Ġscope', 'Ġ1', 'Ġemissions', '?']
    question_offsets: [0, 5, 8, 12, 17, 22, 26, 31, 35, 41, 43, 52]
    question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    answers: []
    document_offsets: [0, 9, 11, 13, 23, 33, 43, 47, 51, 59, 63, 67, 76, 86, 91, 95, 105, 112, 116, 131, 137, 139, 145, 148, 157, 162, 164, 172, 182, 185, 188, 189, 193, 195, 199, 201, 206, 210, 219, 222, 226, 239, 246, 248, 249, 253, 254, 255, 261, 268, 269, 275, 276, 280, 281]
Features: 
    input_ids: [0, 2264, 16, 5, 1542, 76, 253, 1248, 13, 7401, 112, 5035, 116, 2, 2, 14175, 34648, 571, 5035, 5035, 10603, 13, 5, 595, 8, 5, 986, 2207, 76, 32, 1602, 624, 5, 11128, 266, 6, 61, 16, 13031, 19, 10, 1804, 15492, 30, 2424, 139, 13537, 4, 4634, 6, 51, 32, 1165, 11, 5, 14925, 609, 4, 741, 19777, 12, 29, 26661, 4484, 12, 7415, 12, 24837, 4, 31494, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    answer_type_ids: [0]
    passage_start_t: 0
    start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    labels: [[ 0  0]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
    id: [14, 2, 0]
    seq_2_start_t: 15
    span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
Preprocessing Dataset /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json:  30%|██▉       | 190/644 [00:40<00:45,  9.92 Dicts/s]
Shreyanand commented 2 years ago

@MichaelTiemannOSC I was able to reproduce the same error with the new dataset you tried. We didn't see this error when we ran the notebook with ESG reports. I will look into the annotations, the infer_relevance outputs, the train/validation split code, and the pre-processing code to find the root cause. It could be a processing issue with some annotation item or, less likely, a resource limitation. I'll try to get pod logs as well.
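One way to narrow down a "processing issue with some annotation item" is to bypass the worker pool and run the per-dictionary conversion serially, so a hang or exception surfaces at a specific record instead of inside a silent worker. A minimal sketch (not the project's code; `process_one` stands in for whatever per-dict conversion the processor applies):

```python
def find_first_failure(records, process_one):
    """Run the conversion serially and report which record breaks.

    Returns (index, exception) for the first failing record,
    or None if every record converts cleanly.
    """
    for i, rec in enumerate(records):
        try:
            process_one(rec)
        except Exception as exc:  # intentionally broad: we want any failure
            return i, exc
    return None

# Usage sketch: json.load the kpi_train_split.json the data silo was
# reading, then feed each dictionary to the processor's per-dict
# conversion call one at a time (the exact method name depends on the
# FARM version in use).
```

If the serial run hangs rather than raises, the index printed by a simple progress log before each record still pinpoints the offending dictionary.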

07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -   
Loading data into the data silo ... 
              ______
               |o  |   !
   __          |:`_|---'-.
  |__|______.-/ _ \-----.|       
 (o)(o)------'\ _ /     ( )      

07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -   Loading train set from: /opt/app-root/src/test_cdp2/data/squad/kpi_train_split.json 
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -   Got ya 7 parallel workers to convert 680 dictionaries to pytorch datasets (chunksize = 20)...
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -    0    0    0    0    0    0    0 
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -   /|\  /w\  /w\  /w\  /w\  /w\  /w\
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -   /'\  / \  / \  /'\  / \  /'\  /'\
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -               
Preprocessing Dataset /opt/app-root/src/test_cdp2/data/squad/kpi_train_split.json:   0%|          | 0/680 [00:00<?, ? Dicts/s]07/12/2022 18:54:32 - INFO - farm.data_handler.processor -   *** Show 2 random examples ***
07/12/2022 18:54:32 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 10-8-0
Clear Text: 
    passage_text: total scope 1 emissions increased due to change in output activities include 1 overall increase in fugitive sf6 emissions: 22,090 mt co2-e, 2 reduced process and fugitive emissions from compressor stations: 27,330 mt co2-e, 3 reduced emissions from other scope 1 sources: 350 mt co2-e, 4 increased electricity t&d line losses: 193,927 mt co2-e, and 5 increased facility electricity emissions: 21,131 mt co2-e. total percentage change is calculated as 209,469 mt/4,508,161 mt total scope 1 and scope 2 emissions from 2019 x 100 = 4.6% 
    question_text: What is the base year start date for scope 2 (market-based) emissions?
    passage_id: 0
    answers: []
Tokenized: 
    passage_start_t: 0
    passage_tokens: ['total', 'Ġscope', 'Ġ1', 'Ġemissions', 'Ġincreased', 'Ġdue', 'Ġto', 'Ġchange', 'Ġin', 'Ġoutput', 'Ġactivities', 'Ġinclude', 'Ġ1', 'Ġoverall', 'Ġincrease', 'Ġin', 'Ġfugitive', 'Ġs', 'f', '6', 'Ġemissions', ':', 'Ġ22', ',', '090', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġ2', 'Ġreduced', 'Ġprocess', 'Ġand', 'Ġfugitive', 'Ġemissions', 'Ġfrom', 'Ġcompressor', 'Ġstations', ':', 'Ġ27', ',', '330', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġ3', 'Ġreduced', 'Ġemissions', 'Ġfrom', 'Ġother', 'Ġscope', 'Ġ1', 'Ġsources', ':', 'Ġ350', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġ4', 'Ġincreased', 'Ġelectricity', 'Ġt', '&', 'd', 'Ġline', 'Ġlosses', ':', 'Ġ193', ',', '9', '27', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġand', 'Ġ5', 'Ġincreased', 'Ġfacility', 'Ġelectricity', 'Ġemissions', ':', 'Ġ21', ',', '131', 'Ġmt', 'Ġco', '2', '-', 'e', '.', 'Ġtotal', 'Ġpercentage', 'Ġchange', 'Ġis', 'Ġcalculated', 'Ġas', 'Ġ209', ',', '469', 'Ġmt', '/', '4', ',', '508', ',', '161', 'Ġmt', 'Ġtotal', 'Ġscope', 'Ġ1', 'Ġand', 'Ġscope', 'Ġ2', 'Ġemissions', 'Ġfrom', 'Ġ2019', 'Ġx', 'Ġ100', 'Ġ=', 'Ġ4', '.', '6', '%']
    passage_offsets: [0, 6, 12, 14, 24, 34, 38, 41, 48, 51, 58, 69, 77, 79, 87, 96, 99, 108, 109, 110, 112, 121, 123, 125, 126, 130, 133, 135, 136, 137, 138, 140, 142, 150, 158, 162, 171, 181, 186, 197, 205, 207, 209, 210, 214, 217, 219, 220, 221, 222, 224, 226, 234, 244, 249, 255, 261, 263, 270, 272, 276, 279, 281, 282, 283, 284, 286, 288, 298, 310, 311, 312, 314, 319, 325, 327, 330, 331, 332, 335, 338, 340, 341, 342, 343, 345, 349, 351, 361, 370, 382, 391, 393, 395, 396, 400, 403, 405, 406, 407, 408, 410, 416, 427, 434, 437, 448, 451, 454, 455, 459, 461, 462, 463, 464, 467, 468, 472, 475, 481, 487, 489, 493, 499, 501, 511, 516, 521, 523, 527, 529, 530, 531, 532]
    passage_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
    question_tokens: ['What', 'Ġis', 'Ġthe', 'Ġbase', 'Ġyear', 'Ġstart', 'Ġdate', 'Ġfor', 'Ġscope', 'Ġ2', 'Ġ(', 'market', '-', 'based', ')', 'Ġemissions', '?']
    question_offsets: [0, 5, 8, 12, 17, 22, 28, 33, 37, 43, 45, 46, 52, 53, 58, 60, 69]
    question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    answers: []
    document_offsets: [0, 6, 12, 14, 24, 34, 38, 41, 48, 51, 58, 69, 77, 79, 87, 96, 99, 108, 109, 110, 112, 121, 123, 125, 126, 130, 133, 135, 136, 137, 138, 140, 142, 150, 158, 162, 171, 181, 186, 197, 205, 207, 209, 210, 214, 217, 219, 220, 221, 222, 224, 226, 234, 244, 249, 255, 261, 263, 270, 272, 276, 279, 281, 282, 283, 284, 286, 288, 298, 310, 311, 312, 314, 319, 325, 327, 330, 331, 332, 335, 338, 340, 341, 342, 343, 345, 349, 351, 361, 370, 382, 391, 393, 395, 396, 400, 403, 405, 406, 407, 408, 410, 416, 427, 434, 437, 448, 451, 454, 455, 459, 461, 462, 463, 464, 467, 468, 472, 475, 481, 487, 489, 493, 499, 501, 511, 516, 521, 523, 527, 529, 530, 531, 532]
Features: 
    input_ids: [0, 2264, 16, 5, 1542, 76, 386, 1248, 13, 7401, 132, 36, 2989, 12, 805, 43, 5035, 116, 2, 2, 30033, 7401, 112, 5035, 1130, 528, 7, 464, 11, 4195, 1713, 680, 112, 1374, 712, 11, 27157, 579, 506, 401, 5035, 35, 820, 6, 37767, 41601, 1029, 176, 12, 242, 6, 132, 2906, 609, 8, 27157, 5035, 31, 41698, 4492, 35, 974, 6, 21190, 41601, 1029, 176, 12, 242, 6, 155, 2906, 5035, 31, 97, 7401, 112, 1715, 35, 10088, 41601, 1029, 176, 12, 242, 6, 204, 1130, 4382, 326, 947, 417, 516, 2687, 35, 29021, 6, 466, 2518, 41601, 1029, 176, 12, 242, 6, 8, 195, 1130, 2122, 4382, 5035, 35, 733, 6, 25433, 41601, 1029, 176, 12, 242, 4, 746, 3164, 464, 16, 9658, 25, 28036, 6, 37665, 41601, 73, 306, 6, 36911, 6, 28490, 41601, 746, 7401, 112, 8, 7401, 132, 5035, 31, 954, 3023, 727, 5457, 204, 4, 401, 207, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    answer_type_ids: [0]
    passage_start_t: 0
    start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    labels: [[ 0  0]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
    id: [10, 8, 0]
    seq_2_start_t: 20
    span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
07/12/2022 18:54:32 - INFO - farm.data_handler.processor -   

      .--.        _____                       _      
    .'_\/_'.     / ____|                     | |     
    '. /\ .'    | (___   __ _ _ __ ___  _ __ | | ___ 
      "||"       \___ \ / _` | '_ ` _ \| '_ \| |/ _ \ 
       || /\     ____) | (_| | | | | | | |_) | |  __/
    /\ ||//\)   |_____/ \__,_|_| |_| |_| .__/|_|\___|
   (/\||/                             |_|           
______\||/___________________________________________                     

ID: 0-0-0
Clear Text: 
    passage_text: in 2020, pg&e spent 201 million on its energy efficiency programs. this total includes 28 million for programs administered by regional energy networks/community choice aggregators rens/ccas, whose impacts count toward pg&e’s energy savings goals. these energy efficiency funds are collected from customers via public purpose program charges embedded in gas and electric rates and are therefore revenue neutral. to increase our impact, we also partner with state and local governments, community partners and third-party energy efficiency specialists. 
    question_text: What percentage of your total operational spend in the reporting year was on energy?
    passage_id: 0
    answers: []
Tokenized: 
    passage_start_t: 0
    passage_tokens: ['in', 'Ġ2020', ',', 'Ġpg', '&', 'e', 'Ġspent', 'Ġ201', 'Ġmillion', 'Ġon', 'Ġits', 'Ġenergy', 'Ġefficiency', 'Ġprograms', '.', 'Ġthis', 'Ġtotal', 'Ġincludes', 'Ġ28', 'Ġmillion', 'Ġfor', 'Ġprograms', 'Ġadministered', 'Ġby', 'Ġregional', 'Ġenergy', 'Ġnetworks', '/', 'community', 'Ġchoice', 'Ġaggreg', 'ators', 'Ġre', 'ns', '/', 'cc', 'as', ',', 'Ġwhose', 'Ġimpacts', 'Ġcount', 'Ġtoward', 'Ġpg', '&', 'e', 'âĢ', 'Ļ', 's', 'Ġenergy', 'Ġsavings', 'Ġgoals', '.', 'Ġthese', 'Ġenergy', 'Ġefficiency', 'Ġfunds', 'Ġare', 'Ġcollected', 'Ġfrom', 'Ġcustomers', 'Ġvia', 'Ġpublic', 'Ġpurpose', 'Ġprogram', 'Ġcharges', 'Ġembedded', 'Ġin', 'Ġgas', 'Ġand', 'Ġelectric', 'Ġrates', 'Ġand', 'Ġare', 'Ġtherefore', 'Ġrevenue', 'Ġneutral', '.', 'Ġto', 'Ġincrease', 'Ġour', 'Ġimpact', ',', 'Ġwe', 'Ġalso', 'Ġpartner', 'Ġwith', 'Ġstate', 'Ġand', 'Ġlocal', 'Ġgovernments', ',', 'Ġcommunity', 'Ġpartners', 'Ġand', 'Ġthird', '-', 'party', 'Ġenergy', 'Ġefficiency', 'Ġspecialists', '.']
    passage_offsets: [0, 3, 7, 9, 11, 12, 14, 20, 24, 32, 35, 39, 46, 57, 65, 67, 72, 78, 87, 90, 98, 102, 111, 124, 127, 136, 143, 151, 152, 162, 169, 175, 181, 183, 185, 186, 188, 190, 192, 198, 206, 212, 219, 221, 222, 223, 225, 226, 226, 233, 241, 246, 248, 254, 261, 272, 278, 282, 292, 297, 307, 311, 318, 326, 334, 342, 351, 354, 358, 362, 371, 377, 381, 385, 395, 403, 410, 412, 415, 424, 428, 434, 436, 439, 444, 452, 457, 463, 467, 473, 484, 486, 496, 505, 509, 514, 515, 521, 528, 539, 550]
    passage_start_of_word: [1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
    question_tokens: ['What', 'Ġpercentage', 'Ġof', 'Ġyour', 'Ġtotal', 'Ġoperational', 'Ġspend', 'Ġin', 'Ġthe', 'Ġreporting', 'Ġyear', 'Ġwas', 'Ġon', 'Ġenergy', '?']
    question_offsets: [0, 5, 16, 19, 24, 30, 42, 48, 51, 55, 65, 70, 74, 77, 83]
    question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    answers: []
    document_offsets: [0, 3, 7, 9, 11, 12, 14, 20, 24, 32, 35, 39, 46, 57, 65, 67, 72, 78, 87, 90, 98, 102, 111, 124, 127, 136, 143, 151, 152, 162, 169, 175, 181, 183, 185, 186, 188, 190, 192, 198, 206, 212, 219, 221, 222, 223, 225, 226, 226, 233, 241, 246, 248, 254, 261, 272, 278, 282, 292, 297, 307, 311, 318, 326, 334, 342, 351, 354, 358, 362, 371, 377, 381, 385, 395, 403, 410, 412, 415, 424, 428, 434, 436, 439, 444, 452, 457, 463, 467, 473, 484, 486, 496, 505, 509, 514, 515, 521, 528, 539, 550]
Features: 
    input_ids: [0, 2264, 3164, 9, 110, 746, 5903, 1930, 11, 5, 2207, 76, 21, 15, 1007, 116, 2, 2, 179, 2760, 6, 47194, 947, 242, 1240, 21458, 153, 15, 63, 1007, 5838, 1767, 4, 42, 746, 1171, 971, 153, 13, 1767, 16556, 30, 2174, 1007, 4836, 73, 28746, 2031, 26683, 3629, 769, 6852, 73, 7309, 281, 6, 1060, 7342, 3212, 1706, 47194, 947, 242, 17, 27, 29, 1007, 4522, 1175, 4, 209, 1007, 5838, 1188, 32, 4786, 31, 916, 1241, 285, 3508, 586, 1103, 14224, 11, 1123, 8, 3459, 1162, 8, 32, 3891, 903, 7974, 4, 7, 712, 84, 913, 6, 52, 67, 1784, 19, 194, 8, 400, 3233, 6, 435, 2567, 8, 371, 12, 6493, 1007, 5838, 13923, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    answer_type_ids: [0]
    passage_start_t: 0
    start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    labels: [[ 0  0]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]
 [-1 -1]]
    id: [0, 0, 0]
    seq_2_start_t: 18
    span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
Preprocessing Dataset /opt/app-root/src/test_cdp2/data/squad/kpi_train_split.json:  26%|██▋       | 180/680 [00:28<00:32, 15.53 Dicts/s]
MichaelTiemannOSC commented 2 years ago

Talked with David Beßlich this morning. He suggested (1) deleting the row_id column (the first column), and (2) noted that the CSV reader had protected double-quotes in the paragraph text by wrapping the whole cell text in curly double-quotes. I changed straight double-quotes to single quotes, then curly double-quotes to straight double-quotes, and that got me through training relevance OK. But then KPI extraction threw an error about the missing "company" column, which likely means the row_id column should not have been discarded after all.

I'm pretty sure the curly quotes were creating the condition that was locking up kpi_extraction. Now I just need to fix the remaining issues correctly so I can proceed.
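For reference, the quote substitution described above can be sketched as a small normalization pass (a hypothetical helper, assuming the cells are re-exported through a script rather than edited by hand; `normalize_quotes` is not part of the repo):

```python
def normalize_quotes(text: str) -> str:
    """Mirror the manual fix: straight double quotes become single
    quotes, then curly double quotes become straight double quotes."""
    text = text.replace('"', "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return text
```

Applying this to each annotation cell before the CSV round-trip would keep the reader from having to wrap cells in curly quotes in the first place.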

MichaelTiemannOSC commented 2 years ago

Alas, the fixes I made to the file have not helped the cause.

MichaelTiemannOSC commented 2 years ago

When I interrupt the kernel, it looks like a deadlock situation:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File /usr/lib64/python3.8/multiprocessing/pool.py:851, in IMapIterator.next(self, timeout)
    850 try:
--> 851     item = self._items.popleft()
    852 except IndexError:

IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)
File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:183, in DataSilo._get_dataset(self, filename, dicts)
    182 with tqdm(total=len(dicts), unit=' Dicts', desc=desc) as pbar:
--> 183     for dataset, tensor_names in results:
    184         datasets.append(dataset)

File /usr/lib64/python3.8/multiprocessing/pool.py:856, in IMapIterator.next(self, timeout)
    855     raise StopIteration from None
--> 856 self._cond.wait(timeout)
    857 try:

File /usr/lib64/python3.8/threading.py:302, in Condition.wait(self, timeout)
    301 if timeout is None:
--> 302     waiter.acquire()
    303     gotit = True

KeyboardInterrupt: 

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)
Input In [16], in <cell line: 2>()
      1 # start training
----> 2 farm_trainer.run(metric="f1")

File /opt/app-root/lib64/python3.8/site-packages/src/models/farm_trainer.py:380, in FARMTrainer.run(self, trial, metric)
    378 tokenizer = self.create_tokenizer()
    379 processor = self.create_processor(tokenizer)
--> 380 data_silo, n_batches = self.create_silo(processor)
    382 if self.training_config.run_hyp_tuning:
    383     prediction_head = self.create_head()

File /opt/app-root/lib64/python3.8/site-packages/src/models/farm_trainer.py:162, in FARMTrainer.create_silo(self, processor)
    151 def create_silo(self, processor):
    152     """Create a FARM DataSilo instance.
    153 
    154     It generates and stores PyTorch DataLoader objects for the train, dev and test datasets.
   (...)
    160     :return n_batches (int) number of batches for training
    161     """
--> 162     data_silo = DataSilo(
    163         processor=processor,
    164         batch_size=self.training_config.batch_size,
    165         distributed=self.training_config.distributed,
    166     )
    167     n_batches = len(data_silo.loaders["train"])
    168     return data_silo, n_batches

File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:112, in DataSilo.__init__(self, processor, batch_size, eval_batch_size, distributed, automatic_loading, max_multiprocessing_chunksize, max_processes, caching, cache_path)
    107         loaded_from_cache = True
    109 if not loaded_from_cache and automatic_loading:
    110     # In most cases we want to load all data automatically, but in some cases we rather want to do this
    111     # later or load from dicts instead of file (https://github.com/deepset-ai/FARM/issues/85)
--> 112     self._load_data()

File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:214, in DataSilo._load_data(self, train_dicts, dev_dicts, test_dicts)
    212     train_file = self.processor.data_dir / self.processor.train_filename
    213     logger.info("Loading train set from: {} ".format(train_file))
--> 214     self.data["train"], self.tensor_names = self._get_dataset(train_file)
    215 else:
    216     logger.info("No train set is being loaded")

File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:190, in DataSilo._get_dataset(self, filename, dicts)
    188 datasets = [d for d in datasets if d]
    189 concat_datasets = ConcatDataset(datasets)
--> 190 return concat_datasets, tensor_names

File /usr/lib64/python3.8/contextlib.py:525, in ExitStack.__exit__(self, *exc_details)
    521 try:
    522     # bare "raise exc_details[1]" replaces our carefully
    523     # set-up context
    524     fixed_ctx = exc_details[1].__context__
--> 525     raise exc_details[1]
    526 except BaseException:
    527     exc_details[1].__context__ = fixed_ctx

File /usr/lib64/python3.8/contextlib.py:510, in ExitStack.__exit__(self, *exc_details)
    508 assert is_sync
    509 try:
--> 510     if cb(*exc_details):
    511         suppressed_exc = True
    512         pending_raise = False

File /usr/lib64/python3.8/multiprocessing/pool.py:736, in Pool.__exit__(self, exc_type, exc_val, exc_tb)
    735 def __exit__(self, exc_type, exc_val, exc_tb):
--> 736     self.terminate()

File /usr/lib64/python3.8/multiprocessing/pool.py:654, in Pool.terminate(self)
    652 util.debug('terminating pool')
    653 self._state = TERMINATE
--> 654 self._terminate()

File /usr/lib64/python3.8/multiprocessing/util.py:224, in Finalize.__call__(self, wr, _finalizer_registry, sub_debug, getpid)
    221 else:
    222     sub_debug('finalizer calling %s with args %s and kwargs %s',
    223               self._callback, self._args, self._kwargs)
--> 224     res = self._callback(*self._args, **self._kwargs)
    225 self._weakref = self._callback = self._args = \
    226                 self._kwargs = self._key = None
    227 return res

File /usr/lib64/python3.8/multiprocessing/pool.py:692, in Pool._terminate_pool(cls, taskqueue, inqueue, outqueue, pool, change_notifier, worker_handler, task_handler, result_handler, cache)
    689 task_handler._state = TERMINATE
    691 util.debug('helping task handler/workers to finish')
--> 692 cls._help_stuff_finish(inqueue, task_handler, len(pool))
    694 if (not result_handler.is_alive()) and (len(cache) != 0):
    695     raise AssertionError(
    696         "Cannot have cache with result_hander not alive")

File /usr/lib64/python3.8/multiprocessing/pool.py:672, in Pool._help_stuff_finish(inqueue, task_handler, size)
    668 @staticmethod
    669 def _help_stuff_finish(inqueue, task_handler, size):
    670     # task_handler may be blocked trying to put items on inqueue
    671     util.debug('removing tasks from inqueue until task handler finished')
--> 672     inqueue._rlock.acquire()
    673     while task_handler.is_alive() and inqueue._reader.poll():
    674         inqueue._reader.recv()

KeyboardInterrupt: 
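One possible stopgap: `DataSilo.__init__` accepts `max_processes` and `max_multiprocessing_chunksize` arguments (both visible in its signature in the traceback above), so shrinking the worker pool should reduce the shared-memory footprint at the cost of slower preprocessing. Since FARM may not be installed here, the sketch below mimics the pool-vs-serial choice with stdlib `multiprocessing`; `convert` is a hypothetical stand-in for FARM's dictionary-to-dataset conversion:

```python
import multiprocessing as mp

def convert(chunk):
    # stand-in for FARM's dict -> PyTorch dataset conversion
    return [x * 2 for x in chunk]

def build(dicts, max_processes=1, chunksize=19):
    """Mirror DataSilo._get_dataset: workers consume chunks via imap."""
    chunks = [dicts[i:i + chunksize] for i in range(0, len(dicts), chunksize)]
    if max_processes <= 1:
        # serial path: no pool, no inter-process queues, no /dev/shm pressure
        return [convert(c) for c in chunks]
    with mp.Pool(processes=max_processes) as pool:
        return list(pool.imap(convert, chunks))
```

If the serial path completes where the 7-worker run hangs, that would be further evidence the deadlock is shared-memory related rather than a data problem.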
MichaelTiemannOSC commented 2 years ago

I have updated my experiments to include the latest changes from 7/14. I tried some binary searches to narrow the problem. I have found:

MichaelTiemannOSC commented 2 years ago

Hunting around I found this report: https://github.com/deepset-ai/FARM/issues/119#issuecomment-543307582

Memory pressure is a known cause of this deadlock, and indeed that user reported the same sort of stack trace as above when running with the Docker container's default shm size, which is exactly what we have here:

[1000630000@jupyterhub-nb-michaeltiemannosc ~]$ df /dev/shm/
Filesystem     1K-blocks  Used Available Use% Mounted on
shm                65536     0     65536   0% /dev/shm

The fix there was to use --ipc=host or --shm-size=&lt;size&gt;, but those are docker run options and I don't see how to insert them via the AICoE Dockerfile. Can we bump shm, especially for larger memory configurations?

MichaelTiemannOSC commented 2 years ago

When I run kpi_extraction, I see that almost all shm is immediately used up:

[1000630000@jupyterhub-nb-michaeltiemannosc data_handler]$ df /dev/shm/
Filesystem     1K-blocks  Used Available Use% Mounted on
shm                65536 56684      8852  87% /dev/shm
[1000630000@jupyterhub-nb-michaeltiemannosc data_handler]$ 
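A quick way to watch this from inside the notebook, rather than shelling out to df, is `shutil.disk_usage` (a diagnostic sketch; the 90% warning threshold is an arbitrary assumption):

```python
import shutil

def shm_headroom(path="/dev/shm"):
    """Return (free KiB, used fraction) for a mount, like `df`."""
    usage = shutil.disk_usage(path)
    return usage.free // 1024, usage.used / usage.total

free_kib, used_frac = shm_headroom()
if used_frac > 0.9:
    print(f"warning: /dev/shm is {used_frac:.0%} full, only {free_kib} KiB free")
```

Running this right after kicking off kpi_extraction would confirm how quickly the 64 MiB fills up.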
MichaelTiemannOSC commented 2 years ago

Based on this response: https://github.com/elyra-ai/elyra/issues/2838#issuecomment-1189323410

Can somebody look at adapting the OpenShift pattern to the Kustomizations we use?
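For reference, the usual Kubernetes/OpenShift workaround (the pattern the Elyra comment above points at) is to mount a memory-backed emptyDir over /dev/shm in the notebook pod spec. The container name and the sizeLimit below are assumptions to be adapted into our Kustomize overlays, not values from our deployment:

```yaml
# Pod/deployment spec fragment: replace the default 64Mi /dev/shm
# with a memory-backed emptyDir (sizeLimit is an assumed value).
spec:
  containers:
    - name: jupyterhub-nb   # hypothetical container name
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
        sizeLimit: 2Gi
```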

@HumairAK @rynofinn