Siiggyy closed this issue.
@Siiggyy I got the same problem here OwO
What exactly is the output you are getting vs. the output you are expecting? Edit: understood it.
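```python
import json
import logging

from awpy import DemoParser

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

demo_parser = DemoParser()
with open(r"D:\CSGO\ML\csgoml\2023-01-20.json", encoding="utf-8") as demo_json:
    demo_data = json.load(demo_json)
demo_parser.json = demo_data

# SteamIDs as they come out of the parsed DataFrame
dataframe = demo_parser.parse_json_to_df()
steam_ids_df = set(dataframe["damages"]["attackerSteamID"].unique())
logging.info(steam_ids_df)

# SteamIDs read directly from the raw JSON, for comparison
steam_ids = set()
if demo_parser.json:
    for r in demo_parser.json["gameRounds"] or []:
        if r["damages"] is not None:
            for d in r["damages"]:
                steam_ids.add(d["attackerSteamID"])
logging.info(steam_ids)
```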
2023-01-20 22:13:35 INFO {76561198083936288, 76561198201946624, 76561198078944032, 76561198133319168, 76561198397742272, 76561199096388144, 76561198120668208, 76561198262004176, 76561198033174672, 76561198193861488, <NA>}
2023-01-20 22:13:35 INFO {76561198201946625, 76561198397742267, 76561199096388144, 76561198033174674, 76561198120668211, 76561198262004179, 76561198193861491, 76561198083936281, 76561198133319162, 76561198078944027, None}
But that seems like more than a rounding error. For example, 76561198133319162 becomes 76561198133319168.
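For reference, every shifted ID in the two log lines above is exactly what a round trip through a 64-bit float produces: these IDs lie between 2**56 and 2**57, where adjacent float64 values are 16 apart, so each one snaps to the nearest multiple of 16. A plain-Python check (not awpy code) over the pairs from the logs:

```python
import math

# (value in the raw JSON, value after parse_json_to_df), copied from the two log lines above
pairs = [
    (76561198083936281, 76561198083936288),
    (76561198201946625, 76561198201946624),
    (76561198078944027, 76561198078944032),
    (76561198133319162, 76561198133319168),
    (76561198397742267, 76561198397742272),
    (76561199096388144, 76561199096388144),
    (76561198120668211, 76561198120668208),
    (76561198262004179, 76561198262004176),
    (76561198033174674, 76561198033174672),
    (76561198193861491, 76561198193861488),
]

# float64 has a 53-bit significand, so not every integer above 2**53 is representable
assert all(2**56 <= raw < 2**57 for raw, _ in pairs)

# In [2**56, 2**57) consecutive float64 values are 2**(56 - 52) = 16 apart
print(math.ulp(float(pairs[0][0])))  # 16.0

for raw, seen in pairs:
    # int -> float -> int rounds each ID to the nearest multiple of 16
    print(raw, "->", int(float(raw)), int(float(raw)) == seen)
```

The one ID that survives unchanged, 76561199096388144, is already a multiple of 16.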
I don't think a manual check is the way to go, since None values can pop up in multiple places. I'll see if pandas has a general way to handle this.
I think I have a fix that solves this problem. Currently, this is what I get:
2023-01-21 07:46:50 INFO Index(['tick', 'seconds', 'clockTime', 'attackerSteamID', 'attackerName',
'attackerTeam', 'attackerSide', 'attackerX', 'attackerY', 'attackerZ',
'attackerViewX', 'attackerViewY', 'attackerStrafe', 'victimSteamID',
'victimName', 'victimTeam', 'victimSide', 'victimX', 'victimY',
'victimZ', 'victimViewX', 'victimViewY', 'weapon', 'weaponClass',
'hpDamage', 'hpDamageTaken', 'armorDamage', 'armorDamageTaken',
'hitGroup', 'isFriendlyFire', 'distance', 'zoomLevel', 'roundNum',
'matchID', 'mapName'],
dtype='object')
2023-01-21 07:46:50 INFO tick int64
seconds float64
clockTime object
attackerSteamID Int64
attackerName object
attackerTeam object
attackerSide object
attackerX float64
attackerY float64
attackerZ float64
attackerViewX float64
attackerViewY float64
attackerStrafe object
victimSteamID Int64
victimName object
victimTeam object
victimSide object
victimX float64
victimY float64
victimZ float64
victimViewX float64
victimViewY float64
weapon object
weaponClass object
hpDamage int64
hpDamageTaken int64
armorDamage int64
armorDamageTaken int64
hitGroup object
isFriendlyFire bool
distance float64
zoomLevel float64
roundNum int64
matchID object
mapName object
dtype: object
2023-01-21 07:46:50 INFO {76561198083936288, 76561198201946624, 76561198078944032, 76561198133319168, 76561198397742272, 76561199096388144, 76561198120668208, 76561198262004176, 76561198033174672, 76561198193861488, <NA>}
2023-01-21 07:46:50 INFO {76561198201946625, 76561198397742267, 76561199096388144, 76561198033174674, 76561198120668211, 76561198262004179, 76561198193861491, 76561198083936281, 76561198133319162, 76561198078944027, None}
But if I change https://github.com/pnxenopoulos/awpy/blob/ccd9c34366bda0424bf04d3e73a12f22059333c5/awpy/parser/demoparser.py#L600 to `return pd.DataFrame(damages, dtype=object)`, I get:
2023-01-21 07:48:32 INFO tick object
seconds object
clockTime object
attackerSteamID Int64
attackerName object
attackerTeam object
attackerSide object
attackerX object
attackerY object
attackerZ object
attackerViewX object
attackerViewY object
attackerStrafe object
victimSteamID Int64
victimName object
victimTeam object
victimSide object
victimX object
victimY object
victimZ object
victimViewX object
victimViewY object
weapon object
weaponClass object
hpDamage object
hpDamageTaken object
armorDamage object
armorDamageTaken object
hitGroup object
isFriendlyFire object
distance object
zoomLevel object
roundNum object
matchID object
mapName object
dtype: object
2023-01-21 07:48:32 INFO {76561198201946625, 76561198397742267, 76561199096388144, 76561198033174674, 76561198120668211, 76561198262004179, 76561198193861491, 76561198083936281, 76561198133319162, 76561198078944027, <NA>}
2023-01-21 07:48:32 INFO {76561198201946625, 76561198397742267, 76561199096388144, 76561198033174674, 76561198120668211, 76561198262004179, 76561198193861491, 76561198083936281, 76561198133319162, 76561198078944027, None}
So now it doesn't do any weird conversions. However, every column except the two SteamID columns now has dtype object, so anyone relying on the numeric dtypes would get thrown off. @pnxenopoulos, I'm not sure how to go about this.
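One possible middle ground, not something proposed in the thread: build the frame with `dtype=object` as above, cast only the SteamID columns to the nullable Int64 dtype while their values are still exact Python ints (the log above shows that this step keeps the correct IDs), and let `infer_objects()` restore int64/float64/bool for the remaining columns. A minimal sketch with made-up rows; `build_damages_df` and the column subset are hypothetical, not awpy's code:

```python
import pandas as pd

# Hypothetical rows shaped like awpy damage events (only a few columns shown)
damages = [
    {"attackerSteamID": 76561198133319162, "victimSteamID": 76561198201946625,
     "hpDamage": 27, "distance": 12.5, "isFriendlyFire": False},
    {"attackerSteamID": None, "victimSteamID": 76561198397742267,  # world/C4 damage
     "hpDamage": 100, "distance": 0.0, "isFriendlyFire": False},
]

STEAM_ID_COLUMNS = ["attackerSteamID", "victimSteamID"]


def build_damages_df(rows):
    """Hypothetical helper: build the frame without losing SteamID precision."""
    # dtype=object keeps every value as the original Python object, so there is no float cast
    df = pd.DataFrame(rows, dtype=object)
    # Cast the ID columns while the values are still exact Python ints (None becomes <NA>)
    df = df.astype({col: "Int64" for col in STEAM_ID_COLUMNS})
    # Re-infer int64/float64/bool for the remaining object columns
    return df.infer_objects()


df = build_damages_df(damages)
print(df.dtypes)
print(df["attackerSteamID"].tolist())
```

The trade-off is an extra pass over the data, and any column that legitimately mixes types stays object.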
I dug a bit deeper, and it seems this is a known issue with pandas. It does use the correct nullable integer type, but a cast to float happens behind the scenes, and that causes this issue.
See https://github.com/pandas-dev/pandas/issues/26259 and https://github.com/pandas-dev/pandas/issues/32134 for examples.
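A minimal way to reproduce the float cast, assuming the damages frame is built from a list of event dicts (as `pd.DataFrame(damages, ...)` in the linked demoparser.py suggests): with the default constructor, a column holding ints and None is inferred as float64, and casting it to the nullable Int64 dtype afterwards keeps the already-rounded value. The single-column rows are made up for illustration:

```python
import pandas as pd

rows = [{"attackerSteamID": 76561198133319162}, {"attackerSteamID": None}]

# Default inference: ints mixed with None become float64 with NaN
df = pd.DataFrame(rows)
print(df["attackerSteamID"].dtype)  # float64

# Casting to Int64 afterwards cannot recover digits the float64 column already lost
ids = df["attackerSteamID"].astype("Int64")
print(ids.iloc[0])  # 76561198133319168
```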
It seems the issues were fixed literally two days ago in a PR: https://github.com/pandas-dev/pandas/pull/50757.
Sadly, version 1.5.3 was released exactly one day before, so this will probably be fixed in the next pandas release, which we should then upgrade to.
So we should probably decide on the best workaround until then.
A new pandas release will probably come anywhere from 1 to 3 months from now. To address this issue, we could enforce a SteamID of 0, like @Siiggyy does. I believe I might have done this before. Another option is to actually assign a SteamID through some logic. For example, world damage could go to the attacker and C4 damage to the bomb planter (I'm not really sure I like this, though, and I don't know what causes world damage).
How about, for awpy 1.2.3, we change the Go code to return an attacker SteamID of 0 for world/C4 damage?
I wouldn't attribute the damage to the attacker or the bomb planter, since that would probably mess up a lot of statistics. My guess would be that world damage is something like falling off a building, maybe fall damage.
I also wouldn't try to assign these DMG events to someone.
I also think it is fine to have no attacker SteamID when the damage does not come from a player. I think bots get SteamID 0, so that change would make world damage harder to tell apart from bot damage.
I feel we should manually set it to 0 for now, until there is a new pandas version that includes a fix. At that point, I think we should switch back to the current syntax.
@Siiggyy would just have to be aware that he can't rely on the SteamID always being an int then.
You could also use another placeholder if bots get SteamID 0. And it would be fine if it's not an int, as long as the ID is correct in the end; then I could still convert it afterwards.
I think 0 should be fine as a temporary placeholder, even with the collision with bots, although -1 (does that work?) or 1 would also be fine and maybe better. I was just referring to xeno's idea of changing it in awpy in general.
I think the ideal state would be the current one without pandas bugging out.
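On the -1 question: a negative sentinel fits in an int64 column, and as long as no None is left in the column, pandas has no reason to fall back to float64. A minimal sketch of filling the placeholder in the parsed JSON before the DataFrame is built, covering both damages and kills; `fill_missing_attackers` and `PLACEHOLDER_STEAM_ID` are hypothetical names, and where this would hook into awpy is an assumption:

```python
import pandas as pd

# Hypothetical sentinel; the thread discusses 0, 1, -1 or 99 (0 may collide with bots)
PLACEHOLDER_STEAM_ID = -1


def fill_missing_attackers(game_rounds, placeholder=PLACEHOLDER_STEAM_ID):
    """Hypothetical helper: replace a None attackerSteamID in damages and kills, in place."""
    for r in game_rounds or []:
        for key in ("damages", "kills"):
            for event in r.get(key) or []:
                if event.get("attackerSteamID") is None:
                    event["attackerSteamID"] = placeholder
    return game_rounds


# Minimal stand-in for demo_parser.json["gameRounds"]
game_rounds = [
    {
        "damages": [{"attackerSteamID": 76561198133319162}, {"attackerSteamID": None}],
        "kills": [{"attackerSteamID": None}],
    }
]
fill_missing_attackers(game_rounds)

damages_df = pd.DataFrame([d for r in game_rounds for d in r["damages"]])
print(damages_df["attackerSteamID"].dtype)  # int64: no None left, so no float cast
print(damages_df["attackerSteamID"].tolist())  # [76561198133319162, -1]
```

Downstream code would then filter on the sentinel instead of checking for missing values.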
A similar bug happens with the kills DataFrame. My current guess is that it happens when a player disconnects while alive: that counts as a death (suicide and teamkill), but the attacker SteamID is NaN, so we get the same conversion error again.
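Temporary fix:
```python
# Placeholder for kills without an attacker (possibly a disconnect counted as a death)
for r in demo_parser.json["gameRounds"] or []:
    for k in r["kills"] or []:
        if k["attackerSteamID"] is None:
            k["attackerSteamID"] = 99
```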
Pandas 2.0 is out: https://pypi.org/project/pandas/ https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html
Could you check whether this now works without issues?
If so, we can just update the requirements.
We might need a small change to use Arrow types there:
Missing values
Many pandas users must have experienced the data type changing from integer to float implicitly. That's because pandas automatically converts the data type to float when missing values are introduced during a calculation or included in the original data:
```python
In [1]: pd.Series([1, 2, 3, None])
Out[1]:
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64
```
Missing values have always been a pain in the ass because there are different types for missing values. np.nan is for floating-point numbers, None and np.nan are for object types, and pd.NaT is for date-related types. In pandas 1.0, pd.NA was introduced to avoid type conversion, but it needs to be specified manually by the user. Pandas has always wanted to improve in this area but has struggled to do so.
The introduction of Arrow can solve this problem perfectly:
```python
In [1]: df2 = pd.DataFrame({'a':[1,2,3, None]}, dtype='int64[pyarrow]')

In [2]: df2.dtypes
Out[2]:
a    int64[pyarrow]
dtype: object

In [3]: df2
Out[3]:
      a
0     1
1     2
2     3
3  <NA>
```
From here: https://www.reddit.com/r/Python/comments/12b7w3y/everything_you_need_to_know_about_pandas_200/
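If pyarrow is not installed, the nullable Int64 dtype mentioned in the quote (introduced in pandas 1.0, with pd.NA as its missing value) gives a similar result, as long as it is requested explicitly. A small sketch using the same series as the quote; this is not taken from the article:

```python
import pandas as pd

# Requesting the nullable dtype up front keeps the values as integers and uses <NA> for None
s = pd.Series([1, 2, 3, None], dtype="Int64")
print(s.dtype)  # Int64
print(s.isna().tolist())  # [False, False, False, True]
```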
If you create a DataFrame with parse_json_to_df, df['damages'] sometimes has a rounding error in attackerSteamID when there is any damage done by the world or C4, since attackerSteamID for those events is None. Creating the DataFrame from that data then messes up the SteamIDs.
The issue happens on line 594 of demoparser.py. A fix could be to give world damage a custom attackerSteamID: https://github.com/pnxenopoulos/awpy/blob/ccd9c34366bda0424bf04d3e73a12f22059333c5/awpy/parser/demoparser.py#L594
I currently have it implemented so that world or C4 damage gets attackerSteamID 0.
Added my JSON for testing purposes. JSON.zip