Open ThaDafinser opened 9 years ago
:+1: :+1: I don't think the number of bytes is a big deal nowadays, and if it is, we can still disable it, clear it, set up log deletion, etc.
The only thing I remember from doing this some years ago in my own small log/analysis table is that the $_SERVER
variable can get really huge when you include everything.
So it should be limited to the parts that are currently useful.
Maybe we could start by reducing the scope to storing the raw user agent value in the visit, assuming the user agent is the most useful field.
We could not store the whole of $_SERVER, as we need to make sure privacy is respected and that fields such as the IP address are properly sanitised.
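A minimal sketch of that reduction, written in Python purely for illustration (Piwik itself is PHP). The allowlist, the function names, and the anonymisation prefix lengths are all assumptions, not existing Piwik behaviour:

```python
import ipaddress
import json

# Hypothetical allowlist: which request fields are worth keeping is an assumption.
ALLOWED_FIELDS = ("HTTP_USER_AGENT", "HTTP_ACCEPT_LANGUAGE", "REMOTE_ADDR")


def anonymize_ip(ip: str) -> str:
    """Zero the host part of an address (assumed: keep /24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def reduce_server_vars(server: dict) -> str:
    """Keep only allowlisted fields, sanitise the IP, and serialise the rest."""
    kept = {key: server[key] for key in ALLOWED_FIELDS if key in server}
    if "REMOTE_ADDR" in kept:
        kept["REMOTE_ADDR"] = anonymize_ip(kept["REMOTE_ADDR"])
    return json.dumps(kept, sort_keys=True)
```

Anything not on the allowlist (cookies, authorization headers, server paths) is dropped before it is ever written, which keeps both the size and the privacy exposure bounded.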
+1 for keeping raw data!
In the same direction as keeping raw data, there are some other very interesting topics:
Possibility to give visits a type like "standard", "deleted", "bot" etc. https://github.com/piwik/piwik/issues/9205
Do not delete bots but make them filterable afterwards (simple switch include or ignore them) https://github.com/piwik/piwik/issues/9067
Centralized list to store visits to ignore: bots, deleted visits, spam etc. https://github.com/piwik/piwik/issues/9184
(...and storage is becoming cheaper and faster every day, while the visitor count (data production) on websites tracked with Piwik is not growing at the same speed)
I added a really simple plugin to add a column with the serialized HTTP headers https://github.com/ThaDafinser/Piwik-KeepVisitorHttpRawData/blob/master/Columns/KeepVisitorHttpRawData.php#L36-L51
_NOTE_ It does not currently respect the privacy settings, nor is there a job yet to reparse the headers after an update.
It's just here for now, to get a feeling for the memory needed.
The `log_visit` table currently holds a lot of processed visitor data. All of this data has one thing in common: it is extracted from the `$_SERVER` (or similar) request data, and then the original data is lost.

I think it should be possible to keep the raw data, so that the processed data can be refilled later.

Why? Let's take for example the `device`. Its value gets filled by the wonderful device-detector, which gets better and better. But after a device-detector upgrade I'm not able to fill the missing values, because the original value is lost.

I think of a very simple solution:

`log_visit` table (LONGTEXT)

Drawback: The size per entry takes a few (k)bytes...
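
To make the idea concrete, here is a rough sketch of such a reprocessing job, in Python with an in-memory SQLite table purely for illustration. The `raw_request` column name, the simplified `device` column, and the `detect_device` function are hypothetical stand-ins (the real device-detector is a PHP library and far more capable):

```python
import json
import sqlite3


def detect_device(user_agent: str) -> str:
    """Hypothetical stand-in for device-detector; only a toy heuristic."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobile" in ua or "iphone" in ua:
        return "smartphone"
    return "desktop"


def reprocess_devices(conn: sqlite3.Connection) -> int:
    """Recompute the device column from the stored raw data; returns rows changed."""
    changed = 0
    rows = conn.execute(
        "SELECT idvisit, device, raw_request FROM log_visit"
    ).fetchall()
    for idvisit, device, raw in rows:
        if raw is None:
            continue  # visit recorded before raw data was kept
        new_device = detect_device(json.loads(raw).get("HTTP_USER_AGENT", ""))
        if new_device != device:
            conn.execute(
                "UPDATE log_visit SET device = ? WHERE idvisit = ?",
                (new_device, idvisit),
            )
            changed += 1
    return changed


# Assumed schema: a TEXT column stands in for the proposed LONGTEXT column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, device TEXT, raw_request TEXT)"
)
raw = json.dumps({"HTTP_USER_AGENT": "Mozilla/5.0 (iPhone; CPU iPhone OS 9_0 like Mac OS X)"})
conn.execute("INSERT INTO log_visit (device, raw_request) VALUES (?, ?)", (None, raw))
print(reprocess_devices(conn))  # number of visits whose device value was refilled
```

Running the job a second time changes nothing, so it could safely be triggered after every detector upgrade.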
Thoughts?