MichaelTiemannOSC opened 2 years ago
Copying @JeremyGohBNP @andraNew for visibility.
Same problem, but it locks up at a different point each time I run it:
07/12/2022 14:22:05 - INFO - src.models.qa_farm_trainer - Loading the /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train.json data and splitting to train and val...
07/12/2022 14:22:06 - INFO - farm.utils - device: cuda n_gpu: 1, distributed training: False, automatic mixed precision training: True
07/12/2022 14:22:06 - INFO - farm.modeling.tokenization - Loading tokenizer of type 'RobertaTokenizer'
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -
Loading data into the data silo ...
______
|o | !
__ |:`_|---'-.
|__|______.-/ _ \-----.|
(o)(o)------'\ _ / ( )
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo - Loading train set from: /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo - Got ya 7 parallel workers to convert 644 dictionaries to pytorch datasets (chunksize = 19)...
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo - 0 0 0 0 0 0 0
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo - /w\ /|\ /w\ /|\ /w\ /w\ /w\
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo - / \ /'\ / \ /'\ /'\ /'\ / \
07/12/2022 14:22:06 - INFO - farm.data_handler.data_silo -
Preprocessing Dataset /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json: 0%| | 0/644 [00:00<?, ? Dicts/s]07/12/2022 14:22:08 - INFO - farm.data_handler.processor - *** Show 2 random examples ***
07/12/2022 14:22:08 - INFO - farm.data_handler.processor -
.--. _____ _
.'_\/_'. / ____| | |
'. /\ .' | (___ __ _ _ __ ___ _ __ | | ___
"||" \___ \ / _` | '_ ` _ \| '_ \| |/ _ \
|| /\ ____) | (_| | | | | | | |_) | | __/
/\ ||//\) |_____/ \__,_|_| |_| |_| .__/|_|\___|
(/\||/ |_|
______\||/___________________________________________
ID: 15-8-0
Clear Text:
passage_text: allocation level detail we allocate the emissions for requesting companies through a market value approach. the co2 emissions are scope 1 emissions for crop science.
question_text: Break down your total gross global Scope 1 emissions by business activity.
passage_id: 0
answers: []
Tokenized:
passage_start_t: 0
passage_tokens: ['all', 'ocation', 'Ġlevel', 'Ġdetail', 'Ġwe', 'Ġallocate', 'Ġthe', 'Ġemissions', 'Ġfor', 'Ġrequesting', 'Ġcompanies', 'Ġthrough', 'Ġa', 'Ġmarket', 'Ġvalue', 'Ġapproach', '.', 'Ġthe', 'Ġco', '2', 'Ġemissions', 'Ġare', 'Ġscope', 'Ġ1', 'Ġemissions', 'Ġfor', 'Ġcrop', 'Ġscience', '.']
passage_offsets: [0, 3, 11, 17, 24, 27, 36, 40, 50, 54, 65, 75, 83, 85, 92, 98, 106, 108, 112, 114, 116, 126, 130, 136, 138, 148, 152, 157, 164]
passage_start_of_word: [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
question_tokens: ['Break', 'Ġdown', 'Ġyour', 'Ġtotal', 'Ġgross', 'Ġglobal', 'ĠScope', 'Ġ1', 'Ġemissions', 'Ġby', 'Ġbusiness', 'Ġactivity', '.']
question_offsets: [0, 6, 11, 16, 22, 28, 35, 41, 43, 53, 56, 65, 73]
question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
answers: []
document_offsets: [0, 3, 11, 17, 24, 27, 36, 40, 50, 54, 65, 75, 83, 85, 92, 98, 106, 108, 112, 114, 116, 126, 130, 136, 138, 148, 152, 157, 164]
Features:
input_ids: [0, 39539, 159, 110, 746, 4200, 720, 30108, 112, 5035, 30, 265, 1940, 4, 2, 2, 1250, 15644, 672, 4617, 52, 25915, 5, 5035, 13, 14030, 451, 149, 10, 210, 923, 1548, 4, 5, 1029, 176, 5035, 32, 7401, 112, 5035, 13, 6792, 2866, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0]
answer_type_ids: [0]
passage_start_t: 0
start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
labels: [[ 0 0]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]]
id: [15, 8, 0]
seq_2_start_t: 16
span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
07/12/2022 14:22:08 - INFO - farm.data_handler.processor -
.--. _____ _
.'_\/_'. / ____| | |
'. /\ .' | (___ __ _ _ __ ___ _ __ | | ___
"||" \___ \ / _` | '_ ` _ \| '_ \| |/ _ \
|| /\ ____) | (_| | | | | | | |_) | | __/
/\ ||//\) |_____/ \__,_|_| |_| |_| .__/|_|\___|
(/\||/ |_|
______\||/___________________________________________
ID: 14-2-0
Clear Text:
passage_text: specific ghg emissions emissions intensity for the current and the previous reporting year are described within the sustainability report, which is verified with a limited assurance by deloitte. thus, they are included in the verification process. bayer-sustainability-report-2020.pdf
question_text: What is the base year end date for scope 1 emissions?
passage_id: 0
answers: []
Tokenized:
passage_start_t: 0
passage_tokens: ['specific', 'Ġgh', 'g', 'Ġemissions', 'Ġemissions', 'Ġintensity', 'Ġfor', 'Ġthe', 'Ġcurrent', 'Ġand', 'Ġthe', 'Ġprevious', 'Ġreporting', 'Ġyear', 'Ġare', 'Ġdescribed', 'Ġwithin', 'Ġthe', 'Ġsustainability', 'Ġreport', ',', 'Ġwhich', 'Ġis', 'Ġverified', 'Ġwith', 'Ġa', 'Ġlimited', 'Ġassurance', 'Ġby', 'Ġdel', 'o', 'itte', '.', 'Ġthus', ',', 'Ġthey', 'Ġare', 'Ġincluded', 'Ġin', 'Ġthe', 'Ġverification', 'Ġprocess', '.', 'Ġb', 'ayer', '-', 's', 'ustain', 'ability', '-', 'report', '-', '2020', '.', 'pdf']
passage_offsets: [0, 9, 11, 13, 23, 33, 43, 47, 51, 59, 63, 67, 76, 86, 91, 95, 105, 112, 116, 131, 137, 139, 145, 148, 157, 162, 164, 172, 182, 185, 188, 189, 193, 195, 199, 201, 206, 210, 219, 222, 226, 239, 246, 248, 249, 253, 254, 255, 261, 268, 269, 275, 276, 280, 281]
passage_start_of_word: [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
question_tokens: ['What', 'Ġis', 'Ġthe', 'Ġbase', 'Ġyear', 'Ġend', 'Ġdate', 'Ġfor', 'Ġscope', 'Ġ1', 'Ġemissions', '?']
question_offsets: [0, 5, 8, 12, 17, 22, 26, 31, 35, 41, 43, 52]
question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
answers: []
document_offsets: [0, 9, 11, 13, 23, 33, 43, 47, 51, 59, 63, 67, 76, 86, 91, 95, 105, 112, 116, 131, 137, 139, 145, 148, 157, 162, 164, 172, 182, 185, 188, 189, 193, 195, 199, 201, 206, 210, 219, 222, 226, 239, 246, 248, 249, 253, 254, 255, 261, 268, 269, 275, 276, 280, 281]
Features:
input_ids: [0, 2264, 16, 5, 1542, 76, 253, 1248, 13, 7401, 112, 5035, 116, 2, 2, 14175, 34648, 571, 5035, 5035, 10603, 13, 5, 595, 8, 5, 986, 2207, 76, 32, 1602, 624, 5, 11128, 266, 6, 61, 16, 13031, 19, 10, 1804, 15492, 30, 2424, 139, 13537, 4, 4634, 6, 51, 32, 1165, 11, 5, 14925, 609, 4, 741, 19777, 12, 29, 26661, 4484, 12, 7415, 12, 24837, 4, 31494, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0]
answer_type_ids: [0]
passage_start_t: 0
start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
labels: [[ 0 0]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]]
id: [14, 2, 0]
seq_2_start_t: 15
span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
Preprocessing Dataset /opt/app-root/src/aicoe-osc-demo/data/squad/kpi_train_split.json: 30%|██▉ | 190/644 [00:40<00:45, 9.92 Dicts/s]
@MichaelTiemannOSC I was able to reproduce the same error with the new dataset you tried. We didn't see this error when we ran the notebook with the ESG reports. I will look into the annotations, the infer-relevance outputs, the train/validation split code, and the pre-processing code to find the root cause. It could be a processing issue with some annotation item or, less likely, a resource limitation. I'll try to get pod logs as well.
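As a first pass on the annotation check, a minimal sketch like the following could flag entries whose answer offsets don't line up with the passage text — the kind of malformed item that can stall a preprocessing worker. This assumes the standard SQuAD-style JSON layout suggested by the log output (`data`/`paragraphs`/`qas`/`answers` with `answer_start` character offsets); the actual kpi_train_split.json fields may differ.

```python
import json

def find_bad_entries(squad_path):
    """Scan a SQuAD-format file for answers whose character offsets
    do not line up with the context text (a common cause of
    preprocessing failures)."""
    bad = []
    with open(squad_path) as f:
        data = json.load(f)
    for doc in data.get("data", []):
        for para in doc.get("paragraphs", []):
            context = para.get("context", "")
            for qa in para.get("qas", []):
                for ans in qa.get("answers", []):
                    start = ans.get("answer_start", -1)
                    text = ans.get("text", "")
                    # Offset out of range, or the text at that offset
                    # does not match the recorded answer string.
                    if start < 0 or context[start:start + len(text)] != text:
                        bad.append((qa.get("id"), start, text))
    return bad
```

Running this over kpi_train_split.json before training should at least narrow down whether a specific annotation item is to blame, as opposed to a resource issue.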
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -
Loading data into the data silo ...
______
|o | !
__ |:`_|---'-.
|__|______.-/ _ \-----.|
(o)(o)------'\ _ / ( )
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo - Loading train set from: /opt/app-root/src/test_cdp2/data/squad/kpi_train_split.json
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo - Got ya 7 parallel workers to convert 680 dictionaries to pytorch datasets (chunksize = 20)...
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo - 0 0 0 0 0 0 0
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo - /|\ /w\ /w\ /w\ /w\ /w\ /w\
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo - /'\ / \ / \ /'\ / \ /'\ /'\
07/12/2022 18:54:30 - INFO - farm.data_handler.data_silo -
Preprocessing Dataset /opt/app-root/src/test_cdp2/data/squad/kpi_train_split.json: 0%| | 0/680 [00:00<?, ? Dicts/s]07/12/2022 18:54:32 - INFO - farm.data_handler.processor - *** Show 2 random examples ***
07/12/2022 18:54:32 - INFO - farm.data_handler.processor -
.--. _____ _
.'_\/_'. / ____| | |
'. /\ .' | (___ __ _ _ __ ___ _ __ | | ___
"||" \___ \ / _` | '_ ` _ \| '_ \| |/ _ \
|| /\ ____) | (_| | | | | | | |_) | | __/
/\ ||//\) |_____/ \__,_|_| |_| |_| .__/|_|\___|
(/\||/ |_|
______\||/___________________________________________
ID: 10-8-0
Clear Text:
passage_text: total scope 1 emissions increased due to change in output activities include 1 overall increase in fugitive sf6 emissions: 22,090 mt co2-e, 2 reduced process and fugitive emissions from compressor stations: 27,330 mt co2-e, 3 reduced emissions from other scope 1 sources: 350 mt co2-e, 4 increased electricity t&d line losses: 193,927 mt co2-e, and 5 increased facility electricity emissions: 21,131 mt co2-e. total percentage change is calculated as 209,469 mt/4,508,161 mt total scope 1 and scope 2 emissions from 2019 x 100 = 4.6%
question_text: What is the base year start date for scope 2 (market-based) emissions?
passage_id: 0
answers: []
Tokenized:
passage_start_t: 0
passage_tokens: ['total', 'Ġscope', 'Ġ1', 'Ġemissions', 'Ġincreased', 'Ġdue', 'Ġto', 'Ġchange', 'Ġin', 'Ġoutput', 'Ġactivities', 'Ġinclude', 'Ġ1', 'Ġoverall', 'Ġincrease', 'Ġin', 'Ġfugitive', 'Ġs', 'f', '6', 'Ġemissions', ':', 'Ġ22', ',', '090', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġ2', 'Ġreduced', 'Ġprocess', 'Ġand', 'Ġfugitive', 'Ġemissions', 'Ġfrom', 'Ġcompressor', 'Ġstations', ':', 'Ġ27', ',', '330', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġ3', 'Ġreduced', 'Ġemissions', 'Ġfrom', 'Ġother', 'Ġscope', 'Ġ1', 'Ġsources', ':', 'Ġ350', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġ4', 'Ġincreased', 'Ġelectricity', 'Ġt', '&', 'd', 'Ġline', 'Ġlosses', ':', 'Ġ193', ',', '9', '27', 'Ġmt', 'Ġco', '2', '-', 'e', ',', 'Ġand', 'Ġ5', 'Ġincreased', 'Ġfacility', 'Ġelectricity', 'Ġemissions', ':', 'Ġ21', ',', '131', 'Ġmt', 'Ġco', '2', '-', 'e', '.', 'Ġtotal', 'Ġpercentage', 'Ġchange', 'Ġis', 'Ġcalculated', 'Ġas', 'Ġ209', ',', '469', 'Ġmt', '/', '4', ',', '508', ',', '161', 'Ġmt', 'Ġtotal', 'Ġscope', 'Ġ1', 'Ġand', 'Ġscope', 'Ġ2', 'Ġemissions', 'Ġfrom', 'Ġ2019', 'Ġx', 'Ġ100', 'Ġ=', 'Ġ4', '.', '6', '%']
passage_offsets: [0, 6, 12, 14, 24, 34, 38, 41, 48, 51, 58, 69, 77, 79, 87, 96, 99, 108, 109, 110, 112, 121, 123, 125, 126, 130, 133, 135, 136, 137, 138, 140, 142, 150, 158, 162, 171, 181, 186, 197, 205, 207, 209, 210, 214, 217, 219, 220, 221, 222, 224, 226, 234, 244, 249, 255, 261, 263, 270, 272, 276, 279, 281, 282, 283, 284, 286, 288, 298, 310, 311, 312, 314, 319, 325, 327, 330, 331, 332, 335, 338, 340, 341, 342, 343, 345, 349, 351, 361, 370, 382, 391, 393, 395, 396, 400, 403, 405, 406, 407, 408, 410, 416, 427, 434, 437, 448, 451, 454, 455, 459, 461, 462, 463, 464, 467, 468, 472, 475, 481, 487, 489, 493, 499, 501, 511, 516, 521, 523, 527, 529, 530, 531, 532]
passage_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
question_tokens: ['What', 'Ġis', 'Ġthe', 'Ġbase', 'Ġyear', 'Ġstart', 'Ġdate', 'Ġfor', 'Ġscope', 'Ġ2', 'Ġ(', 'market', '-', 'based', ')', 'Ġemissions', '?']
question_offsets: [0, 5, 8, 12, 17, 22, 28, 33, 37, 43, 45, 46, 52, 53, 58, 60, 69]
question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
answers: []
document_offsets: [0, 6, 12, 14, 24, 34, 38, 41, 48, 51, 58, 69, 77, 79, 87, 96, 99, 108, 109, 110, 112, 121, 123, 125, 126, 130, 133, 135, 136, 137, 138, 140, 142, 150, 158, 162, 171, 181, 186, 197, 205, 207, 209, 210, 214, 217, 219, 220, 221, 222, 224, 226, 234, 244, 249, 255, 261, 263, 270, 272, 276, 279, 281, 282, 283, 284, 286, 288, 298, 310, 311, 312, 314, 319, 325, 327, 330, 331, 332, 335, 338, 340, 341, 342, 343, 345, 349, 351, 361, 370, 382, 391, 393, 395, 396, 400, 403, 405, 406, 407, 408, 410, 416, 427, 434, 437, 448, 451, 454, 455, 459, 461, 462, 463, 464, 467, 468, 472, 475, 481, 487, 489, 493, 499, 501, 511, 516, 521, 523, 527, 529, 530, 531, 532]
Features:
input_ids: [0, 2264, 16, 5, 1542, 76, 386, 1248, 13, 7401, 132, 36, 2989, 12, 805, 43, 5035, 116, 2, 2, 30033, 7401, 112, 5035, 1130, 528, 7, 464, 11, 4195, 1713, 680, 112, 1374, 712, 11, 27157, 579, 506, 401, 5035, 35, 820, 6, 37767, 41601, 1029, 176, 12, 242, 6, 132, 2906, 609, 8, 27157, 5035, 31, 41698, 4492, 35, 974, 6, 21190, 41601, 1029, 176, 12, 242, 6, 155, 2906, 5035, 31, 97, 7401, 112, 1715, 35, 10088, 41601, 1029, 176, 12, 242, 6, 204, 1130, 4382, 326, 947, 417, 516, 2687, 35, 29021, 6, 466, 2518, 41601, 1029, 176, 12, 242, 6, 8, 195, 1130, 2122, 4382, 5035, 35, 733, 6, 25433, 41601, 1029, 176, 12, 242, 4, 746, 3164, 464, 16, 9658, 25, 28036, 6, 37665, 41601, 73, 306, 6, 36911, 6, 28490, 41601, 746, 7401, 112, 8, 7401, 132, 5035, 31, 954, 3023, 727, 5457, 204, 4, 401, 207, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0]
answer_type_ids: [0]
passage_start_t: 0
start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
labels: [[ 0 0]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]]
id: [10, 8, 0]
seq_2_start_t: 20
span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
07/12/2022 18:54:32 - INFO - farm.data_handler.processor -
.--. _____ _
.'_\/_'. / ____| | |
'. /\ .' | (___ __ _ _ __ ___ _ __ | | ___
"||" \___ \ / _` | '_ ` _ \| '_ \| |/ _ \
|| /\ ____) | (_| | | | | | | |_) | | __/
/\ ||//\) |_____/ \__,_|_| |_| |_| .__/|_|\___|
(/\||/ |_|
______\||/___________________________________________
ID: 0-0-0
Clear Text:
passage_text: in 2020, pg&e spent 201 million on its energy efficiency programs. this total includes 28 million for programs administered by regional energy networks/community choice aggregators rens/ccas, whose impacts count toward pg&e’s energy savings goals. these energy efficiency funds are collected from customers via public purpose program charges embedded in gas and electric rates and are therefore revenue neutral. to increase our impact, we also partner with state and local governments, community partners and third-party energy efficiency specialists.
question_text: What percentage of your total operational spend in the reporting year was on energy?
passage_id: 0
answers: []
Tokenized:
passage_start_t: 0
passage_tokens: ['in', 'Ġ2020', ',', 'Ġpg', '&', 'e', 'Ġspent', 'Ġ201', 'Ġmillion', 'Ġon', 'Ġits', 'Ġenergy', 'Ġefficiency', 'Ġprograms', '.', 'Ġthis', 'Ġtotal', 'Ġincludes', 'Ġ28', 'Ġmillion', 'Ġfor', 'Ġprograms', 'Ġadministered', 'Ġby', 'Ġregional', 'Ġenergy', 'Ġnetworks', '/', 'community', 'Ġchoice', 'Ġaggreg', 'ators', 'Ġre', 'ns', '/', 'cc', 'as', ',', 'Ġwhose', 'Ġimpacts', 'Ġcount', 'Ġtoward', 'Ġpg', '&', 'e', 'âĢ', 'Ļ', 's', 'Ġenergy', 'Ġsavings', 'Ġgoals', '.', 'Ġthese', 'Ġenergy', 'Ġefficiency', 'Ġfunds', 'Ġare', 'Ġcollected', 'Ġfrom', 'Ġcustomers', 'Ġvia', 'Ġpublic', 'Ġpurpose', 'Ġprogram', 'Ġcharges', 'Ġembedded', 'Ġin', 'Ġgas', 'Ġand', 'Ġelectric', 'Ġrates', 'Ġand', 'Ġare', 'Ġtherefore', 'Ġrevenue', 'Ġneutral', '.', 'Ġto', 'Ġincrease', 'Ġour', 'Ġimpact', ',', 'Ġwe', 'Ġalso', 'Ġpartner', 'Ġwith', 'Ġstate', 'Ġand', 'Ġlocal', 'Ġgovernments', ',', 'Ġcommunity', 'Ġpartners', 'Ġand', 'Ġthird', '-', 'party', 'Ġenergy', 'Ġefficiency', 'Ġspecialists', '.']
passage_offsets: [0, 3, 7, 9, 11, 12, 14, 20, 24, 32, 35, 39, 46, 57, 65, 67, 72, 78, 87, 90, 98, 102, 111, 124, 127, 136, 143, 151, 152, 162, 169, 175, 181, 183, 185, 186, 188, 190, 192, 198, 206, 212, 219, 221, 222, 223, 225, 226, 226, 233, 241, 246, 248, 254, 261, 272, 278, 282, 292, 297, 307, 311, 318, 326, 334, 342, 351, 354, 358, 362, 371, 377, 381, 385, 395, 403, 410, 412, 415, 424, 428, 434, 436, 439, 444, 452, 457, 463, 467, 473, 484, 486, 496, 505, 509, 514, 515, 521, 528, 539, 550]
passage_start_of_word: [1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
question_tokens: ['What', 'Ġpercentage', 'Ġof', 'Ġyour', 'Ġtotal', 'Ġoperational', 'Ġspend', 'Ġin', 'Ġthe', 'Ġreporting', 'Ġyear', 'Ġwas', 'Ġon', 'Ġenergy', '?']
question_offsets: [0, 5, 16, 19, 24, 30, 42, 48, 51, 55, 65, 70, 74, 77, 83]
question_start_of_word: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
answers: []
document_offsets: [0, 3, 7, 9, 11, 12, 14, 20, 24, 32, 35, 39, 46, 57, 65, 67, 72, 78, 87, 90, 98, 102, 111, 124, 127, 136, 143, 151, 152, 162, 169, 175, 181, 183, 185, 186, 188, 190, 192, 198, 206, 212, 219, 221, 222, 223, 225, 226, 226, 233, 241, 246, 248, 254, 261, 272, 278, 282, 292, 297, 307, 311, 318, 326, 334, 342, 351, 354, 358, 362, 371, 377, 381, 385, 395, 403, 410, 412, 415, 424, 428, 434, 436, 439, 444, 452, 457, 463, 467, 473, 484, 486, 496, 505, 509, 514, 515, 521, 528, 539, 550]
Features:
input_ids: [0, 2264, 3164, 9, 110, 746, 5903, 1930, 11, 5, 2207, 76, 21, 15, 1007, 116, 2, 2, 179, 2760, 6, 47194, 947, 242, 1240, 21458, 153, 15, 63, 1007, 5838, 1767, 4, 42, 746, 1171, 971, 153, 13, 1767, 16556, 30, 2174, 1007, 4836, 73, 28746, 2031, 26683, 3629, 769, 6852, 73, 7309, 281, 6, 1060, 7342, 3212, 1706, 47194, 947, 242, 17, 27, 29, 1007, 4522, 1175, 4, 209, 1007, 5838, 1188, 32, 4786, 31, 916, 1241, 285, 3508, 586, 1103, 14224, 11, 1123, 8, 3459, 1162, 8, 32, 3891, 903, 7974, 4, 7, 712, 84, 913, 6, 52, 67, 1784, 19, 194, 8, 400, 3233, 6, 435, 2567, 8, 371, 12, 6493, 1007, 5838, 13923, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
padding_mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
segment_ids: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0]
answer_type_ids: [0]
passage_start_t: 0
start_of_word: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
labels: [[ 0 0]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]
[-1 -1]]
id: [0, 0, 0]
seq_2_start_t: 18
span_mask: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
_____________________________________________________
Preprocessing Dataset /opt/app-root/src/test_cdp2/data/squad/kpi_train_split.json: 26%|██▋ | 180/680 [00:28<00:32, 15.53 Dicts/s]
Talked with David Beßlich this morning. He suggested (1) deleting the row_id column (the first column), and he noted (2) that the CSV reader was protecting double-quotes inside the paragraph text by wrapping the whole cell in curly double-quotes. I changed straight double-quotes to single-quotes, then curly double-quotes to straight double-quotes, and that got me through training relevance OK. But then KPI extraction threw an error about a missing "company" column, which likely means the row_id column should not have been discarded after all.
I'm pretty sure that the curly quotes were creating a condition that was locking up kpi_extraction. I now just need to fix more things correctly so I can proceed.
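For reference, the quote rewrite described above can be sketched as a small stand-alone function (the actual edits were made to the file directly; this is just an illustration of the two-step transformation):

```python
def fix_quotes(text: str) -> str:
    """Replicate the two-step rewrite: straight double quotes become
    single quotes, then curly double quotes become straight doubles."""
    text = text.replace('"', "'")        # step 1: straight " -> '
    text = text.replace("\u201c", '"')   # step 2: curly left quote  -> "
    text = text.replace("\u201d", '"')   #         curly right quote -> "
    return text
```

Applied in this order, no straight double-quote from the original text survives to confuse the CSV reader, while the curly quotes that wrapped whole cells become ordinary quoting.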
Alas, the fixes I made to the file have not helped the cause.
When I interrupt the kernel, it looks like a deadlock situation:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
File /usr/lib64/python3.8/multiprocessing/pool.py:851, in IMapIterator.next(self, timeout)
850 try:
--> 851 item = self._items.popleft()
852 except IndexError:
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
KeyboardInterrupt Traceback (most recent call last)
File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:183, in DataSilo._get_dataset(self, filename, dicts)
182 with tqdm(total=len(dicts), unit=' Dicts', desc=desc) as pbar:
--> 183 for dataset, tensor_names in results:
184 datasets.append(dataset)
File /usr/lib64/python3.8/multiprocessing/pool.py:856, in IMapIterator.next(self, timeout)
855 raise StopIteration from None
--> 856 self._cond.wait(timeout)
857 try:
File /usr/lib64/python3.8/threading.py:302, in Condition.wait(self, timeout)
301 if timeout is None:
--> 302 waiter.acquire()
303 gotit = True
KeyboardInterrupt:
During handling of the above exception, another exception occurred:
KeyboardInterrupt Traceback (most recent call last)
Input In [16], in <cell line: 2>()
1 # start training
----> 2 farm_trainer.run(metric="f1")
File /opt/app-root/lib64/python3.8/site-packages/src/models/farm_trainer.py:380, in FARMTrainer.run(self, trial, metric)
378 tokenizer = self.create_tokenizer()
379 processor = self.create_processor(tokenizer)
--> 380 data_silo, n_batches = self.create_silo(processor)
382 if self.training_config.run_hyp_tuning:
383 prediction_head = self.create_head()
File /opt/app-root/lib64/python3.8/site-packages/src/models/farm_trainer.py:162, in FARMTrainer.create_silo(self, processor)
151 def create_silo(self, processor):
152 """Create a FARM DataSilo instance.
153
154 It generates and stores PyTorch DataLoader objects for the train, dev and test datasets.
(...)
160 :return n_batches (int) number of batches for training
161 """
--> 162 data_silo = DataSilo(
163 processor=processor,
164 batch_size=self.training_config.batch_size,
165 distributed=self.training_config.distributed,
166 )
167 n_batches = len(data_silo.loaders["train"])
168 return data_silo, n_batches
File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:112, in DataSilo.__init__(self, processor, batch_size, eval_batch_size, distributed, automatic_loading, max_multiprocessing_chunksize, max_processes, caching, cache_path)
107 loaded_from_cache = True
109 if not loaded_from_cache and automatic_loading:
110 # In most cases we want to load all data automatically, but in some cases we rather want to do this
111 # later or load from dicts instead of file (https://github.com/deepset-ai/FARM/issues/85)
--> 112 self._load_data()
File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:214, in DataSilo._load_data(self, train_dicts, dev_dicts, test_dicts)
212 train_file = self.processor.data_dir / self.processor.train_filename
213 logger.info("Loading train set from: {} ".format(train_file))
--> 214 self.data["train"], self.tensor_names = self._get_dataset(train_file)
215 else:
216 logger.info("No train set is being loaded")
File /opt/app-root/lib64/python3.8/site-packages/farm/data_handler/data_silo.py:190, in DataSilo._get_dataset(self, filename, dicts)
188 datasets = [d for d in datasets if d]
189 concat_datasets = ConcatDataset(datasets)
--> 190 return concat_datasets, tensor_names
File /usr/lib64/python3.8/contextlib.py:525, in ExitStack.__exit__(self, *exc_details)
521 try:
522 # bare "raise exc_details[1]" replaces our carefully
523 # set-up context
524 fixed_ctx = exc_details[1].__context__
--> 525 raise exc_details[1]
526 except BaseException:
527 exc_details[1].__context__ = fixed_ctx
File /usr/lib64/python3.8/contextlib.py:510, in ExitStack.__exit__(self, *exc_details)
508 assert is_sync
509 try:
--> 510 if cb(*exc_details):
511 suppressed_exc = True
512 pending_raise = False
File /usr/lib64/python3.8/multiprocessing/pool.py:736, in Pool.__exit__(self, exc_type, exc_val, exc_tb)
735 def __exit__(self, exc_type, exc_val, exc_tb):
--> 736 self.terminate()
File /usr/lib64/python3.8/multiprocessing/pool.py:654, in Pool.terminate(self)
652 util.debug('terminating pool')
653 self._state = TERMINATE
--> 654 self._terminate()
File /usr/lib64/python3.8/multiprocessing/util.py:224, in Finalize.__call__(self, wr, _finalizer_registry, sub_debug, getpid)
221 else:
222 sub_debug('finalizer calling %s with args %s and kwargs %s',
223 self._callback, self._args, self._kwargs)
--> 224 res = self._callback(*self._args, **self._kwargs)
225 self._weakref = self._callback = self._args = \
226 self._kwargs = self._key = None
227 return res
File /usr/lib64/python3.8/multiprocessing/pool.py:692, in Pool._terminate_pool(cls, taskqueue, inqueue, outqueue, pool, change_notifier, worker_handler, task_handler, result_handler, cache)
689 task_handler._state = TERMINATE
691 util.debug('helping task handler/workers to finish')
--> 692 cls._help_stuff_finish(inqueue, task_handler, len(pool))
694 if (not result_handler.is_alive()) and (len(cache) != 0):
695 raise AssertionError(
696 "Cannot have cache with result_hander not alive")
File /usr/lib64/python3.8/multiprocessing/pool.py:672, in Pool._help_stuff_finish(inqueue, task_handler, size)
668 @staticmethod
669 def _help_stuff_finish(inqueue, task_handler, size):
670 # task_handler may be blocked trying to put items on inqueue
671 util.debug('removing tasks from inqueue until task handler finished')
--> 672 inqueue._rlock.acquire()
673 while task_handler.is_alive() and inqueue._reader.poll():
674 inqueue._reader.recv()
KeyboardInterrupt:
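For context on the trace above: the parent process is blocked inside `IMapIterator.next()`, waiting on a condition variable for a result that never arrives (for example, if a worker is killed while /dev/shm is exhausted). The pattern FARM's `_get_dataset` uses is essentially the stdlib one below (a simplified stand-in; `convert` replaces FARM's per-chunk dict-to-dataset conversion):

```python
from multiprocessing import Pool

def convert(chunk):
    # Stand-in for FARM's per-chunk dictionary -> tensor conversion.
    return [x * 2 for x in chunk]

def build_datasets(chunks, processes=2, chunksize=1):
    # pool.imap yields results lazily; each step of the iteration blocks
    # until a worker delivers the next chunk's result. If workers die
    # silently, that wait never returns -- the hang seen above.
    with Pool(processes=processes) as pool:
        return list(pool.imap(convert, chunks, chunksize))

if __name__ == "__main__":
    datasets = build_datasets([[1, 2], [3, 4], [5, 6]])
```

That is why interrupting the kernel surfaces `Condition.wait` / `waiter.acquire()` rather than any error from the workers themselves.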
I have updated my experiments to include the latest changes from 7/14 and tried some binary searches to narrow the problem. Here is what I have found:
Hunting around I found this report: https://github.com/deepset-ai/FARM/issues/119#issuecomment-543307582
Memory pressure can cause this deadlock, and indeed that user reported the same sort of stack trace as above when running with the Docker container's default shm size, which happens to look just like mine:
[1000630000@jupyterhub-nb-michaeltiemannosc ~]$ df /dev/shm/
Filesystem 1K-blocks Used Available Use% Mounted on
shm 65536 0 65536 0% /dev/shm
The fix there was to run with `--ipc=host` or `--shm-size`, but I don't see how to insert such options into the AICoE Dockerfile. Can we bump shm, especially for larger-memory configurations?
When I run kpi_extraction, I see that almost all shm is immediately used up:
[1000630000@jupyterhub-nb-michaeltiemannosc data_handler]$ df /dev/shm/
Filesystem 1K-blocks Used Available Use% Mounted on
shm 65536 56684 8852 87% /dev/shm
[1000630000@jupyterhub-nb-michaeltiemannosc data_handler]$
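To watch shm from inside the notebook rather than shelling out to `df`, a quick stdlib check works (a sketch; the `/dev/shm` path assumes a Linux container):

```python
import os

def fs_usage(path="/dev/shm"):
    """Return (total_bytes, free_bytes) for the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks, st.f_frsize * st.f_bavail
```

Calling this before and during `kpi_extraction` would confirm whether free space drops toward zero right as the preprocessing workers stall.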
Based on this response: https://github.com/elyra-ai/elyra/issues/2838#issuecomment-1189323410
Can somebody look at adapting the OpenShift pattern to the Kustomizations we use?
@HumairAK @rynofinn
Describe the bug
I've been adding annotations to
s3://redhat-osc-physical-landing-647521352890/test_cdp2/pipeline_run/cdp/annotations/20220709 CDP aggregated_annotations_needs_correction.xlsx
and have now managed to lock up the train_kpi_extraction notebook. Here's the output cell where progress stops:

To Reproduce
Steps to reproduce the behavior:

Expected behavior
I expect train_kpi_extraction to complete.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
I did not see this error with previous versions of my annotation file (which initially contained Coca-Cola and PGE data). I later added Bayer AG and Apple, and that's when it locked up.