pillargg / pillar_algos

Finds best timestamps to cut at
https://docs.pillar.gg/pillar_algos/
GNU General Public License v3.0

2% of calls failed with KeyError #15

Closed. RusseII closed this issue 3 years ago.

RusseII commented 3 years ago

98% of 200 data sets returned successfully; 2% (13) of them threw one of the following three errors. I don't have the IDs of the failing videos handy, but I can get them if they are absolutely needed.

[ERROR] KeyError: "None of [Index(['created_at', 'updated_at', 'commenter', 'message'], dtype='object')] are in the [columns]"
Traceback (most recent call last):
  File "/var/task/handler.py", line 32, in handler
    algo1_result = algo1.run(all_messages)
  File "/var/task/pillaralgos/algo1.py", line 104, in run
    big_df = d.organize_twitch_chat(data) # fetch appropriate data
  File "/var/task/pillaralgos/helpers/data_handler.py", line 67, in organize_twitch_chat
    df = data[['created_at','updated_at','commenter','message']].add_suffix('_mess')
  File "/var/task/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/var/task/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/var/task/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
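
A minimal reproduction sketch, assuming the failing payload parsed into a DataFrame that has none of the expected columns (for example an empty JSON list). Selecting a list of column labels that are all absent raises this same KeyError in pandas:

```python
import pandas as pd

# Assumption: the failing payload was an empty JSON list, parsed into an empty DataFrame.
data = pd.DataFrame([])

cols = ["created_at", "updated_at", "commenter", "message"]

raised = False
try:
    # Same selection as organize_twitch_chat; every label is missing, so pandas raises.
    data[cols].add_suffix("_mess")
except KeyError as err:
    raised = True
    print("KeyError:", err)  # None of [Index([...], dtype='object')] are in the [columns]
```
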
--
[ERROR] KeyError: 'emoticons'
Traceback (most recent call last):
  File "/var/task/handler.py", line 34, in handler
    algo3 = algo3_0.run(all_messages)
  File "/var/task/pillaralgos/algo3_0.py", line 129, in run
    results, first_stamp = thalamus(big_df, min_, min_words = 5, goal='num_top_user_appears')
  File "/var/task/pillaralgos/algo3_0.py", line 13, in thalamus
    id_words = id_words_counter(big_df)
  File "/var/task/pillaralgos/algo3_0.py", line 84, in id_words_counter
    emoji = temp_df['emoticons'].apply(lambda x: 0 if type(x) == float else len(x))
  File "/var/task/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/var/task/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
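
This KeyError means `temp_df` has no `emoticons` column at all, which would happen if no parsed message carried that field (or there were no messages). A hedged sketch of a tolerant counter follows; `count_emoticons` is a hypothetical helper, not part of pillaralgos, and it assumes every non-missing `emoticons` value is a list:

```python
import pandas as pd

def count_emoticons(temp_df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number of emoticons per row, 0 when absent."""
    if "emoticons" not in temp_df.columns:
        # No message carried an 'emoticons' field, so every count is zero.
        return pd.Series(0, index=temp_df.index, dtype="int64")
    # NaN (a float) marks rows without emoticons, mirroring the original lambda.
    return temp_df["emoticons"].apply(lambda x: len(x) if isinstance(x, list) else 0)

no_emotes = pd.DataFrame({"message": ["hi", "gg"]})
with_emotes = pd.DataFrame(
    {"message": ["hi", "gg"], "emoticons": [[{"_id": "1"}], float("nan")]}
)
print(count_emoticons(no_emotes).tolist())    # [0, 0]
print(count_emoticons(with_emotes).tolist())  # [1, 0]
```
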
--
[ERROR] IndexError: single positional indexer is out-of-bounds
Traceback (most recent call last):
  File "/var/task/handler.py", line 33, in handler
    algo2_result = algo2.run(all_messages)
  File "/var/task/pillaralgos/algo2.py", line 118, in run
    results, first_stamp = thalamus(big_df, min_)
  File "/var/task/pillaralgos/algo2.py", line 40, in thalamus
    hour = chunk.iloc[-1,11] # col 11 is hour.
  File "/var/task/pandas/core/indexing.py", line 889, in __getitem__
    return self._getitem_tuple(key)
  File "/var/task/pandas/core/indexing.py", line 1450, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/var/task/pandas/core/indexing.py", line 723, in _has_valid_tuple
    self._validate_key(k, i)
  File "/var/task/pandas/core/indexing.py", line 1358, in _validate_key
    self._validate_integer(key, axis)
  File "/var/task/pandas/core/indexing.py", line 1444, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
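
`chunk.iloc[-1, 11]` raises this IndexError whenever `chunk` has no rows (or fewer than 12 columns), so an empty input or an empty time window is enough to trigger it. A sketch of a safer lookup, assuming the hour values live in a column named `hour` (the code comment only says column 11 is the hour); `last_hour` is a hypothetical helper:

```python
from typing import Optional

import pandas as pd

def last_hour(chunk: pd.DataFrame) -> Optional[int]:
    """Hypothetical helper: hour of the last message in a chunk, or None if unavailable."""
    if chunk.empty or "hour" not in chunk.columns:
        return None  # caller decides whether to skip this chunk
    # Label-based lookup does not depend on 'hour' staying at position 11.
    return int(chunk["hour"].iloc[-1])

empty_chunk = pd.DataFrame(columns=["message", "hour"])
print(last_hour(empty_chunk))  # None

chunk = pd.DataFrame({"message": ["a", "b"], "hour": [3, 4]})
print(last_hour(chunk))  # 4

# The original positional access fails on an empty 12-column frame.
original_fails = False
try:
    pd.DataFrame(columns=range(12)).iloc[-1, 11]
except IndexError:
    original_fails = True
```

Looking the column up by name also guards against the frame's column order changing upstream.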

pomkos commented 3 years ago

Hi,

Yes, the IDs are pretty much always needed. It looks like these are just empty JSONs? I haven't added a catch for that yet, but I'm working on it for the next update.
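
One way to add such a catch is to validate the chat payload once, before any algorithm runs, and skip unusable inputs. This is a minimal sketch, not the project's actual fix: `validate_chat`, `safe_run`, `EmptyChatError`, and `REQUIRED_COLUMNS` are hypothetical names, and it assumes `all_messages` is a pandas DataFrame like the `data` passed to `organize_twitch_chat`.

```python
import pandas as pd

# Columns organize_twitch_chat selects, taken from the first traceback.
REQUIRED_COLUMNS = ["created_at", "updated_at", "commenter", "message"]

class EmptyChatError(ValueError):
    """Hypothetical error for chat payloads the algorithms cannot process."""

def validate_chat(data: pd.DataFrame) -> pd.DataFrame:
    """Fail early with a clear message instead of a KeyError/IndexError deep in an algorithm."""
    if data.empty:
        raise EmptyChatError("chat payload is empty")
    missing = [c for c in REQUIRED_COLUMNS if c not in data.columns]
    if missing:
        raise EmptyChatError(f"chat payload is missing columns: {missing}")
    return data

def safe_run(run, all_messages: pd.DataFrame):
    """Hypothetical wrapper: returns None for unusable input, else run(all_messages)."""
    try:
        validate_chat(all_messages)
    except EmptyChatError as err:
        print(f"skipping video: {err}")
        return None
    return run(all_messages)

good = pd.DataFrame({c: ["x"] for c in REQUIRED_COLUMNS})
print(safe_run(len, pd.DataFrame([])))  # prints "skipping video: ..." then None
print(safe_run(len, good))  # 1
```

Logging the video ID alongside the skip message would make failures like these easier to trace, since the IDs are needed for debugging.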

RusseII commented 3 years ago

Yes, that's probably the case.