Closed (iamakulov closed this issue 5 years ago)
For reference, I found two more places where cropping happens:
In the case of JsonEventDeserializer.java (the first snippet), the maxStringLength property is configurable: you can change it by adding something like
collection.max-string-length = 999999
to Rakam’s config.properties. However, strings are still cropped when they are saved to the database (see the code snippet in the previous message), so the config doesn’t help.
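To illustrate why raising the config value alone doesn't help, here is a minimal sketch of two cropping stages applied in sequence. The class and method names (CroppingSketch, deserialize, store) are hypothetical and are not Rakam's actual code; the fixed limit of 100 at the storage stage is an assumption based on the behavior reported in this issue.

```java
// Hypothetical sketch: names and structure are illustrative, not Rakam's actual code.
final class CroppingSketch {
    // Assumed fixed limit at the storage stage (100, as reported in this issue).
    static final int STORE_LIMIT = 100;

    // Crop a string to at most `limit` characters.
    static String crop(String value, int limit) {
        return value.length() > limit ? value.substring(0, limit) : value;
    }

    // Stage 1: the deserializer, whose limit comes from collection.max-string-length.
    static String deserialize(String raw, int maxStringLength) {
        return crop(raw, maxStringLength);
    }

    // Stage 2: the event store, whose limit does not depend on the config value.
    static String store(String value) {
        return crop(value, STORE_LIMIT);
    }

    public static void main(String[] args) {
        String input = "x".repeat(250);
        // Even with a very large configured limit, the second stage still crops.
        String saved = store(deserialize(input, 999999));
        System.out.println(saved.length()); // prints 100
    }
}
```

Under these assumptions the effective limit is the smaller of the two, so only changing both stages (or removing the second crop) lets longer strings through.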
@iamakulov we crop strings that have more characters than expected because the data collected by Rakam is used for analytical purposes. Our customers usually don't store big string blobs; instead, they break down the string values and send them as new attributes, such as User-Agent values.
As you already figured out, the value is configurable in config.properties, and we intentionally made it configurable on the server side. The idea is that the data is collected from users, so it's not reliable in that sense. Therefore, we try to sanitize the user input as much as possible in order to provide a reliable system.
Got it, thanks!
"and send them as new attributes such as User-Agent values."
BTW, just in case it’s relevant: this is what I was doing initially, but at some point I hit the limit of 200 custom fields per collection, so I had to start encoding data into larger strings.
For anyone affected: in the end, I solved the cropping issue by removing the .substring() branches in JsonEventDeserializer.java and PostgresqlEventStore.java in my own fork. Now Rakam saves strings of arbitrary length.
If you use rakam-cookbook for deploying Rakam (this is the case with Rakam’s AWS CloudFormation template), you can switch it to use your fork by searching for buremba/rakam in the source code and replacing the GitHub links you find with links to your own repo.
Recently, we found out that Rakam silently crops long event properties to 100 characters. So, for example, if you do something like this:
and JSON.stringify(someHtmlNode.dataset) turns out to be longer than 100 characters, only the first 100 characters are saved to the DB.
We were saving relatively large JSON-encoded objects as one of the event fields, and, because of this cropping, we lost an important part of the data.
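To make the data loss concrete, here is a minimal sketch of what a 100-character crop does to a JSON-encoded object. Everything in it is hypothetical: the TruncationSketch class, the toy toJson encoder (flat string maps only, no escaping), and the sample keys are made up for illustration, and the 100-character limit is taken from the behavior described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: simulates a 100-character crop on a JSON-encoded object.
final class TruncationSketch {
    static final int LIMIT = 100; // limit reported in this issue

    // Keep only the first LIMIT characters, as the reported behavior does.
    static String crop(String value) {
        return value.length() > LIMIT ? value.substring(0, LIMIT) : value;
    }

    // Toy JSON encoder for flat string maps (no escaping; illustration only).
    static String toJson(Map<String, String> map) {
        return map.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Made-up sample data standing in for something like an element's dataset.
    static Map<String, String> sampleDataset() {
        Map<String, String> dataset = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) {
            dataset.put("key" + i, "value" + i);
        }
        return dataset;
    }

    public static void main(String[] args) {
        String json = toJson(sampleDataset());
        String saved = crop(json);
        System.out.println("original length: " + json.length());
        System.out.println("saved length:    " + saved.length());
        // The cropped value loses its tail, including the closing brace,
        // so it can no longer be parsed as JSON.
        System.out.println("still ends with '}': " + saved.endsWith("}"));
    }
}
```

Besides dropping the trailing keys, the crop cuts the value mid-token, so the stored string is not even valid JSON anymore.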
Looks like this is the code that does this:
https://github.com/rakam-io/rakam/blob/656c168d78aeb3058388df22ee88f21e391eadfd/rakam-postgresql/src/main/java/org/rakam/postgresql/analysis/PostgresqlEventStore.java#L295-L297
TEXT type, which means DB storage is not an issue