Open tank-bohr opened 1 month ago
@tank-bohr that's a reasonable concern. Let's use some other format for serialization. Definitely not JSON or YAML since they are very verbose.
@tank-bohr JSON is very verbose, if you ask me. Our files will be much larger than they are now. Maybe it's not a big deal, since we are planning to ZIP them, but still...
One more candidate: Parquet.
Ruby library: https://github.com/apache/arrow/tree/main/ruby/red-parquet
@tank-bohr maybe we can invent our own format? The structure of the data that we save/load is very primitive. Shouldn't be hard to serialize/deserialize, without any C-level code.
@yegor256 I'm not a big fan of inventing new formats. The format is usually closely related to speed (see #6). Parquet, for example, is optimized for data access and search.
I have a crazy idea. What if we migrate to the SQLite format? It would solve two problems in one shot:
@tank-bohr it's not so crazy :) We can use SQLite, but how will we convert our query language to SQL? What will the table design be? How will we organize indices?
@yegor256 I mean using SQLite as storage, not changing the queries. Then the tables could be:

| id | key | value | metadata |
|---|---|---|---|
| 84b34a87-42c2-444f-837d-7bc0ad58dab0 | repo | factbase | |
| 84b34a87-42c2-444f-837d-7bc0ad58dab0 | owner | yegor256 | |
| 84b34a87-42c2-444f-837d-7bc0ad58dab0 | type | issue-was-created | |
| ... | ... | ... | ... |

| id | key | value | metadata |
|---|---|---|---|
| 84b34a87-42c2-444f-837d-7bc0ad58dab0 | number | 42 | |
| ... | ... | ... | ... |
For each supported datatype, we have to have a separate table (or a separate column: value_int, value_string, ...).

Indices could be:
- id, to be able to fetch a complete map
- key + value, for requests like (eq repo factbase)
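
The table-per-datatype layout and the two indices described above can be sketched as plain DDL generated from Ruby. This is only an illustration: `fact_strings` is the one table name that appears in this thread, while `fact_integers`, `fact_floats`, the index names, and the column types are assumptions. The statements are only built as strings here; running them would need a SQLite driver, which is left out.

```ruby
# Hypothetical mapping from table name to the SQLite type of its value column.
# Only fact_strings comes from the discussion; the other names are assumptions.
TABLES = {
  'fact_strings' => 'TEXT',
  'fact_integers' => 'INTEGER',
  'fact_floats' => 'REAL'
}.freeze

# Builds one CREATE TABLE and two CREATE INDEX statements per datatype:
# an index on id (fetch a complete map) and one on key + value (eq lookups).
def schema_statements
  TABLES.flat_map do |table, type|
    [
      "CREATE TABLE IF NOT EXISTS #{table} " \
      "(id TEXT NOT NULL, key TEXT NOT NULL, value #{type}, metadata TEXT)",
      "CREATE INDEX IF NOT EXISTS #{table}_id ON #{table} (id)",
      "CREATE INDEX IF NOT EXISTS #{table}_key_value ON #{table} (key, value)"
    ]
  end
end

puts schema_statements
```

The composite index on (key, value) is what would let a lookup such as (eq repo factbase) avoid a full table scan.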
Then (eq repo factbase) will be transformed under the hood into the following SQL query:

    SELECT id
    FROM fact_strings
    WHERE key = 'repo'
    AND value = 'factbase'
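
A minimal sketch of how such a translation might look for a single `eq` term, assuming the table is picked from the Ruby class of the value. The helper name `eq_to_sql` and the tables other than `fact_strings` are hypothetical, and the sketch returns bind parameters instead of inlining the values, which is a choice made here and not something stated in the thread.

```ruby
# Hypothetical table per Ruby value class; only fact_strings is from the thread.
TABLE_FOR = {
  String => 'fact_strings',
  Integer => 'fact_integers',
  Float => 'fact_floats'
}.freeze

# Translates one (eq key value) term into SQL text plus its bind parameters.
def eq_to_sql(key, value)
  table = TABLE_FOR.fetch(value.class) do
    raise ArgumentError, "Unsupported value type: #{value.class}"
  end
  ["SELECT id FROM #{table} WHERE key = ? AND value = ?", [key.to_s, value]]
end

sql, params = eq_to_sql(:repo, 'factbase')
puts sql
p params
```

Keeping the values as parameters means the key and value never have to be escaped by hand when the query is eventually handed to a SQLite driver.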
@tank-bohr I would go with this structure:
fact | property | string | integer | float |
---|---|---|---|---|
10 | foo | NULL | 42 | NULL |
10 | bar | "hello" | NULL | NULL |
10 | bar | "test" | NULL | NULL |
11 | xyz | NULL | NULL | 3.14 |
This would represent the following factbase:
[
{ foo: 42, bar: ["hello", "test"] },
{ xyz: 3.14 }
]
WDYT?
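
To make the mapping between the single-table layout and the factbase above concrete, here is a rough sketch that folds such rows back into an array of hashes. The rows are hard-coded in the order of the columns in the table (fact, property, string, integer, float), and the helper name `rows_to_facts` is hypothetical. It assumes exactly one typed column is non-NULL per row.

```ruby
# Rows in the proposed column order: [fact, property, string, integer, float].
ROWS = [
  [10, 'foo', nil, 42, nil],
  [10, 'bar', 'hello', nil, nil],
  [10, 'bar', 'test', nil, nil],
  [11, 'xyz', nil, nil, 3.14]
].freeze

# Groups rows by fact; a property that occurs more than once becomes an array.
def rows_to_facts(rows)
  facts = Hash.new { |hash, fact| hash[fact] = {} }
  rows.each do |fact, property, *typed|
    value = typed.compact.first # the single non-NULL typed column
    key = property.to_sym
    if facts[fact].key?(key)
      facts[fact][key] = Array(facts[fact][key]) << value
    else
      facts[fact][key] = value
    end
  end
  facts.values
end

p rows_to_facts(ROWS)
```

The repeated `bar` rows for fact 10 are what produce the two-element array, so multi-valued properties need no extra column in this layout.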
I have two concerns:
But it could work for sure.
See security recommendations for details.
It's used in baza. As far as I understand the code, the data was received from an HTTP request, which makes the data untrusted.
https://github.com/yegor256/factbase/blob/51d357c5d3c7411191a43c0184e0f43adba53e78/lib/factbase.rb#L218