snap-stanford / GreaseLM

[ICLR 2022 spotlight] GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
MIT License

Cannot reshape array of size 0 into shape (0) #4

Closed. dxlong2000 closed this issue 2 years ago

dxlong2000 commented 2 years ago

Hi Xikun (@XikunZhang),

Thanks for your great work. While preprocessing csqa, I ran into this error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/data/xuanlong/anaconda3/envs/greaselm/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/data/xuanlong/Graph2Text/GreaseLM/preprocess_utils/graph.py", line 337, in concepts_to_adj_matrices_2hop_all_pair__use_LM__Part3
    adj, concepts = concepts2adj(schema_graph)
  File "/data/xuanlong/Graph2Text/GreaseLM/preprocess_utils/graph.py", line 128, in concepts2adj
    adj = coo_matrix(adj.reshape(-1, n_node))
ValueError: cannot reshape array of size 0 into shape (0)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 131, in <module>
    main()
  File "preprocess.py", line 125, in main
    rt_dic['func'](*rt_dic['args'])
  File "/data/xuanlong/Graph2Text/GreaseLM/preprocess_utils/graph.py", line 512, in generate_adj_data_from_grounded_concepts__use_LM
    res3 = list(tqdm(p.imap(concepts_to_adj_matrices_2hop_all_pair__use_LM__Part3, res2), total=len(res2)))
  File "/data/xuanlong/anaconda3/envs/greaselm/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/data/xuanlong/anaconda3/envs/greaselm/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
ValueError: cannot reshape array of size 0 into shape (0)

I tried to work around it by editing the line https://github.com/snap-stanford/GreaseLM/blob/803946bba3273556c1ff2be6ad8b02850fe5972d/preprocess_utils/graph.py#L128 so that a failed reshape (when the array has size 0) is simply ignored:

try:
        adj = coo_matrix(adj.reshape(-1, n_node))
except:
        print("FAIL concepts2adj")

I suspect this edit is incorrect, because running evaluation afterwards fails with this error:

points/csqa/csqa_model.pt
***** hyperparameters *****
dataset: csqa
******************************
wandb: WARNING W&B installed but not logged in.  Run `wandb login` or set the WANDB_API_KEY env variable.
ModelClass <class 'transformers.modeling_roberta.RobertaModel'>
NLP
pid: 74920
screen: 

gpu: 1

torch version: 1.8.0+cu101
torch cuda version: 10.1
cuda is available: True
cuda device count: 1
cudnn version: 7603
wandb id:  1ziiml5l
loading from checkpoint: ./checkpoints/csqa/csqa_model.pt
train_statement_path ./data//csqa/statement/train.statement.jsonl
num_choice 5
Loading sparse adj data...
loading adj matrices: 100%|███████████████████████████████████████████████████████████████████████| 48705/48705 [00:22<00:00, 2158.86it/s]
| ori_adj_len: mu 12.13 sigma 9.67 | adj_len: 13.13 | prune_rate: 0.00 | qc_num: 5.46 | ac_num: 1.54 |
Traceback (most recent call last):
  File "greaselm.py", line 606, in <module>
    main(args)
  File "greaselm.py", line 546, in main
    evaluate(args, has_test_split, devices, kg)
  File "greaselm.py", line 449, in evaluate
    dataset = load_data(args, devices, kg)
  File "greaselm.py", line 50, in load_data
    dataset = data_utils.GreaseLM_DataLoader(args.train_statements, args.train_adj,
  File "/data/xuanlong/Graph2Text/GreaseLM/utils/data_utils.py", line 121, in __init__
    assert all(len(self.train_qids) == len(self.train_adj_data[0]) == x.size(0) for x in [self.train_labels] + self.train_encoder_data + self.train_decoder_data)
AssertionError
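
The bare assert above does not say which component is out of step. As a debugging aid, a check like the following sketch could report the mismatched lengths by name. The helper check_aligned and the numbers passed to it are hypothetical and are not part of GreaseLM; in practice the lengths would come from the qids, adj data, and tensors named in the assertion.

```python
def check_aligned(expected, **lengths):
    """Raise ValueError naming every component whose length differs from `expected`."""
    mismatched = {name: n for name, n in lengths.items() if n != expected}
    if mismatched:
        raise ValueError(f"expected {expected} examples, but got {mismatched}")


# Hypothetical numbers, chosen only to show the shape of the message.
try:
    check_aligned(9741, adj_data=9740, labels=9741)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)  # expected 9741 examples, but got {'adj_data': 9740}
```

A named report like this makes it easier to tell whether the adj data or the statement file is the side that lost or gained examples.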

Could you give me some advice on how to fix the first error?

Thank you & BR,

XikunZhang commented 2 years ago

What is the value of node_ids here?
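
For context on why node_ids matters: if it is empty, n_node is 0, and NumPy cannot infer the -1 dimension in reshape(-1, n_node) for an array with zero elements. The sketch below reproduces that with NumPy alone. It assumes adj is a zero-filled array of shape (n_rel, n_node, n_node) before the reshape, which the traceback does not show; n_rel = 17 and the helper flatten_adj are hypothetical.

```python
import numpy as np


def flatten_adj(adj):
    """Flatten (n_rel, n_node, n_node) into (n_rel * n_node, n_node)."""
    n_rel, n_node, _ = adj.shape
    # Spell out the row count instead of using -1: NumPy cannot infer a -1
    # dimension when the array has zero elements.
    return adj.reshape(n_rel * n_node, n_node)


n_rel = 17  # hypothetical number of relation types
empty = np.zeros((n_rel, 0, 0), dtype=np.uint8)  # what an empty node_ids would produce

try:
    empty.reshape(-1, 0)
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)

print(reshape_error)             # the ValueError message from NumPy
print(flatten_adj(empty).shape)  # (0, 0)
```

An explicit shape avoids the exception, but an example with no grounded concepts may still need separate handling downstream, so printing node_ids for the failing example is the more informative first step.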

dxlong2000 commented 2 years ago

Hmm, for some reason I can run it now. Let me run several tests, and I will reopen this issue if I hit the same error again.

Thanks!