Relaxed logging of data

groupsky commented 7 years ago

I was thinking about the current approach: logging only active hunts if they are initiated right now, in order to minimize bad data. That still leaves some edge cases where bad data could be entered. One example is mice that reset state: there is no information about what the state was before the catch. The current approach is to hardcode these edge cases and reset the state to what it was before the catch.

Hypothetically, if instead of dropping hunts where some information is uncertain we saved them and used some analysis to interpolate the missing or uncertain information with some degree of confidence, that would allow data to build up faster.

As far as I saw, each journal entry has a log id that is unique and sequential per user, which can help determine whether entries are missing and, if possible, populate them. At a minimum this will provide information about loot drops, and in some cases even about population distribution. It's even possible to populate these entries when opening other users' profiles. Combined with the trap setup observed between several hunts, it should be possible to determine with great confidence the cheese and charms used if they were not changed, i.e. the number will decrease with the number of hunts.
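To illustrate the gap detection described above, here is a minimal sketch in Python. It assumes entry ids are integers that increase by exactly 1 per journal entry for a single user. The function name and the sample ids are hypothetical and not taken from the extension.

```python
def find_missing_entry_ids(seen_ids):
    """Return the ids absent between the smallest and largest id seen.

    Assumes ids are integers that increase by exactly 1 per journal
    entry for one user (an assumption, not confirmed behaviour).
    """
    if not seen_ids:
        return []
    present = set(seen_ids)
    return [i for i in range(min(present), max(present) + 1) if i not in present]


# Hypothetical entry ids already logged for one user.
logged = [1041, 1042, 1045, 1046, 1049]
print(find_missing_entry_ids(logged))  # [1043, 1044, 1047, 1048]
```

The ids reported as missing are only candidates for backfilling. Whether the corresponding journal entries can still be fetched is a separate question.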

Overall, reconstructing complete information from past hunts will take a lot of effort to implement, but even incomplete data may still be useful.

logicalup commented 7 years ago

First of all, thanks for the idea. I do have some observations:

What are we trying to solve here? Slow collection of data? I don't think it's that slow anymore. This depends on user opinion. While I agree that faster is better, in my humble opinion it is not a great need at this time unless you want a little more precision.

This would take big changes, as I built my collection around the things required for a hunt: trap, base, cheese. With an API possibly coming out, I'm not sure this is worth the effort.
Incomplete data also presents a security risk, as it is harder to verify if the hunt was spoofed or not.
Determining trap, base, charm, and cheese is impossible as far as I know, as the user might have changed them at any time without any journal entry. Observing the trap setup between two edge entries would assume that the user didn't change it in the meantime, which is a huge assumption. I'd rather have fewer entries that are true than many more that are possibly false.
Also right now my limit is not the collection methods, but the server and what I collect.

groupsky commented 6 years ago

> What are we trying to solve here? Slow collection of data? I don't think it's that slow anymore. This depends on user opinion. While I agree that faster is better, in my humble opinion it is not a great need at this time unless you want a little more precision.

I am currently implementing loot drops in the tsitu tools, and the lack of data made me think about how that could be improved. You are right that this just needs time, and my suggestion is neither a quick nor an easy feature.

> This would take big changes, as I built my collection around the things required for a hunt: trap, base, cheese. With an API possibly coming out, I'm not sure this is worth the effort.

Agreed, most likely the data will be accumulated faster than such a change would be implemented.

> Incomplete data also presents a security risk, as it is harder to verify if the hunt was spoofed or not.

I think the risk of someone deliberately pushing fake hunts is pretty low, but the argument is right: with less data it's even harder to spot such actions.

> Determining trap, base, charm, and cheese is impossible as far as I know, as the user might have changed them at any time without any journal entry. Observing the trap setup between two edge entries would assume that the user didn't change it in the meantime, which is a huge assumption. I'd rather have fewer entries that are true than many more that are possibly false.

My thoughts were about approximating these values with some relative confidence, but it is probably not worth the effort.
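As a rough sketch of what "relative confidence" could mean here, the snippet below labels cheese and charm between two observed setups. It assumes each hunt consumes exactly one unit of each item, which is a simplification. The snapshot shape, the labels, and the item names are all hypothetical.

```python
def setup_confidence(before, after, hunts_between):
    """Label how plausible it is that cheese and charm were unchanged.

    `before` and `after` are hypothetical snapshots shaped like
    {"cheese": (name, quantity), "charm": (name, quantity)}.
    Assumes one unit of each item is consumed per hunt (a simplification).
    """
    labels = {}
    for item in ("cheese", "charm"):
        name_before, qty_before = before[item]
        name_after, qty_after = after[item]
        if name_before != name_after:
            labels[item] = "changed"
        elif qty_before - qty_after == hunts_between:
            labels[item] = "likely unchanged"
        else:
            labels[item] = "uncertain"
    return labels


# Illustrative values only.
before = {"cheese": ("example cheese", 120), "charm": ("example charm", 40)}
after = {"cheese": ("example cheese", 117), "charm": ("example charm", 39)}
print(setup_confidence(before, after, hunts_between=3))
# {'cheese': 'likely unchanged', 'charm': 'uncertain'}
```

A "likely unchanged" label is still only an inference. As noted in the quoted objection, the setup could have been changed and restored between the two observations without leaving a journal entry.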

> Also right now my limit is not the collection methods, but the server and what I collect.

I'll try to give a hand here and there, mostly on the extension side and frontend, as PHP is not my favorite language.

tehhowch commented 5 years ago

With the latest changes, we have 3 points in the timeline: page user, prehunt user, and posthunt user.

Page user is likely the most incorrect, since it is not affected by changes to MH made by other tabs, browsers, the mobile app, trap checks, or friend activities. Prehunt user is almost always correct, but there is a small chance of setup deviation if the user is malicious. Posthunt user is subject to the standard issues of possibly unknown cheese and charm, and possibly wrong quest, stage, or location information.

I think before we try to add per-journal interpolation into the page and prehunt user object differences, we need to shore up the validation between prehunt and posthunt user to check for malicious actions. Then comes trying to reconstruct the page user based on the prehunt snapshot and all the journals that came back in the hunt sequence. If we can reliably do that, we could enable additional hunt logging on an area-by-area basis.
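The prehunt/posthunt validation could start from a simple field comparison like the sketch below. The field names and snapshot values are hypothetical and are not the actual user object schema. The assumption that these fields stay constant across a single hunt is also only illustrative.

```python
def changed_fields(prehunt, posthunt, fields):
    """Return {field: (pre_value, post_value)} for each listed field that differs."""
    return {
        f: (prehunt.get(f), posthunt.get(f))
        for f in fields
        if prehunt.get(f) != posthunt.get(f)
    }


# Hypothetical fields assumed to stay the same across one hunt.
STABLE_FIELDS = ("weapon", "base", "location")

pre = {"weapon": "trap A", "base": "base A", "location": "area 1", "cheese_qty": 50}
post = {"weapon": "trap A", "base": "base B", "location": "area 1", "cheese_qty": 49}

print(changed_fields(pre, post, STABLE_FIELDS))  # {'base': ('base A', 'base B')}
```

A reported difference is not proof of a malicious action, since the posthunt user may legitimately carry different quest, stage, or location information. It only marks the hunt as needing a closer look before it is logged.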

logicalup commented 5 years ago

Agreed, I do not want to use page user, as it can easily be modified by a malicious hunter, can be out of date, etc. I'd rather use the server responses (prehunt and posthunt user), which could also be spoofed by a malicious hunter, but not as easily, and which can't be out of date.

m-h-c-t / mh-helper-extension

Relaxed logging of data #52