Closed fquellec closed 2 years ago
In the current Twitter database we have the twitterAds and twitterCriteria tables. I suggest we follow the SQL convention of using singular, CamelCased names for tables: TwitterAd and TwitterCriterion.
Valentin: Ok. I'll do that for Twitter and show you on Hestia-dev for confirmation just to make sure I'm on the right track
Andreas: ok I haven't looked at it closely but I see some complications in the twitter pipeline, there's an if statement, and this line:
Math.abs(d3.timeSecond.count(new Date(ad.time), new Date(time))) < 120
It would be nice if you find a simple way to make this configurable, but I suspect that things like these would remain in javascript. Maybe the name of the javascript function could be put in the manifest? I'm just thinking out loud, don't know the best solution.
Valentin: yeah it's a 4 line if statement. I thought these should be in JS since it is part of the code to feed the data into the tables
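One way to read Andreas's suggestion of putting the name of a JavaScript function in the manifest is sketched below. Everything here is an assumption made for illustration: the `matchers` registry, the `withinSeconds` name, the `engagementMatcher` manifest key and `resolveMatcher` do not exist in the project. The sketch also uses plain millisecond arithmetic instead of `d3.timeSecond.count`, so results may differ slightly near the 120-second boundary.

```javascript
// Hypothetical registry: maps names used in the manifest to JavaScript functions.
const matchers = {
  // Returns a predicate that is true when two timestamps are less than
  // `maxSeconds` apart (millisecond difference, not d3.timeSecond.count).
  withinSeconds: ({ maxSeconds }) => (a, b) =>
    Math.abs(new Date(a) - new Date(b)) / 1000 < maxSeconds
}

// Hypothetical manifest fragment: only the function name and its parameters
// are configured; the logic itself stays in JavaScript.
const manifestFragment = {
  engagementMatcher: { function: 'withinSeconds', params: { maxSeconds: 120 } }
}

// Looks up the function named in the manifest and binds its parameters.
function resolveMatcher({ function: name, params }) {
  const factory = matchers[name]
  if (!factory) throw new Error(`Unknown matcher: ${name}`)
  return factory(params)
}

const isSameEvent = resolveMatcher(manifestFragment.engagementMatcher)
```

With this split, a non-coder could change the 120-second window in the JSON, while adding a new kind of matching rule would still need a programmer.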
Andreas: And the targetingCriteria table is created from nested data in the impressions. It's not trivial
Valentin: so if I understand, you also want this part to be configured instead of hardcoded?
Andreas: If it makes sense, make everything configurable.
Valentin:
And the targetingCriteria table is created from nested data in the impressions. It's not trivial
you mean the twitterCriteria table?
Andreas: yes. It's possible that the twitter example is too complicated to make entirely configurable in json. The goal is to have something that we can use for other experiences. I guess there will be a point where we say that things get so complicated we need to do everything in javascript. It would be nice to be able to create a first version of an experience just with json, so it can be done by someone who is not a coder. Then if it gets too complicated, we involve a programmer.
Valentin:
The goal is to have something that we can use for other experiences. I guess there will be a point where we say that things get so complicated we need to do everything in javascript
You mean going back to doing everything in javascript? Then what's the point of this work? Or you just mean doing part of it or only for some experiences in javascript?
Andreas: I hope we're not going to reach this point for every experience.
Valentin: question: does the current implementation support primary/foreign keys?
Andreas: I don't know. Florian wrote a layer above sqlite, I don't know if he accepts foreign keys. If he does, I don't know if it's used somewhere. Ok, looking at sql.js, we don't support foreign keys.
Valentin: Hmm ok is there a way to configure autoincremented ids?
Andreas: we do use joins in sql queries for twitter. I guess they're fast enough without foreign keys
is there a way to configure autoincremented ids?
Doesn't look like there is
Valentin: Ok. The data feeding of the Twitter database is complicated. It is not straightforward because there are some relations like, we try to link engagements with impressions
Andreas: I wonder if this could be simplified by making two tables with a foreign key instead of that
Valentin: so perhaps, as a first step, I could just focus on table definitions and find out how to pass the db as an argument to the databaseBuilder instead of creating the db in the databaseBuilder
I wonder if this could be simplified by making two tables with a foreign key instead of that
yes, it could be, but that is not supported in sql.js
Andreas: We can change sql.js, it's pretty barebones right now. I wonder if we even need it as a layer above sqlite. I would have tried to just create functions that generate sql instead of hiding it, but I didn't have the opportunity to ask Florian why he ended up doing things this way
so perhaps, as a first step, I could just focus on table definitions and find out how to pass the db as an argument to the databaseBuilder
Sounds like a good first step, yes. Try to keep the configuration as short as possible.
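The first step discussed above (short table definitions, functions that generate SQL, and a db passed in as an argument) could look roughly like the sketch below. All names are hypothetical: the configuration keys, `columnSql`, `createTableSql`, `buildDatabase` and the `targetingType` column are invented for illustration, and the only thing assumed about `db` is that it has a `run(sql)` method. The SQL itself is standard SQLite: `INTEGER PRIMARY KEY AUTOINCREMENT` gives autoincremented ids, and foreign key clauses are only enforced after `PRAGMA foreign_keys = ON`.

```javascript
// Hypothetical table configuration; the column options are illustrative.
const tables = [
  {
    name: 'TwitterAd',
    columns: [
      { name: 'id', type: 'INTEGER', primaryKey: true, autoIncrement: true },
      { name: 'tweetId', type: 'TEXT' },
      { name: 'advertiserName', type: 'TEXT' }
    ]
  },
  {
    name: 'TwitterCriterion',
    columns: [
      { name: 'id', type: 'INTEGER', primaryKey: true, autoIncrement: true },
      { name: 'adId', type: 'INTEGER', references: 'TwitterAd(id)' },
      { name: 'targetingType', type: 'TEXT' } // invented column name
    ]
  }
]

// Builds the SQL fragment for one column definition.
function columnSql({ name, type, primaryKey, autoIncrement, references }) {
  const parts = [name, type]
  if (primaryKey) parts.push('PRIMARY KEY')
  if (autoIncrement) parts.push('AUTOINCREMENT')
  if (references) parts.push(`REFERENCES ${references}`)
  return parts.join(' ')
}

// Builds a CREATE TABLE statement from one table definition.
function createTableSql({ name, columns }) {
  return `CREATE TABLE ${name} (${columns.map(columnSql).join(', ')})`
}

// The database is passed in rather than created here. `db` only needs a
// run(sql) method, which is an assumption about the wrapper's interface.
function buildDatabase(db, tableDefinitions) {
  db.run('PRAGMA foreign_keys = ON') // SQLite does not enforce foreign keys by default
  tableDefinitions.forEach(t => db.run(createTableSql(t)))
  return db
}
```

Because the builder receives the db, it can be exercised with a stub object that records the statements, without loading SQLite at all.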
In the following segment, we process all the ads and set engagement to 1 if the engagement matches the ad and it happens within 120 seconds of the impression. I have a couple of questions: Why are we not counting the engagements? Why do we need to process all the ads, would it not be enough to find the matching ad with Array.find() (since I expect only at most 1 ad will match)?
engagements.forEach(v => {
  const tweetId = v.promotedTweetInfo?.tweetId ?? null
  const advertiserName = v.advertiserInfo?.advertiserName ?? null
  const displayLocation = v.displayLocation ?? null
  const time = v.impressionTime ?? null
  adsItems.forEach(ad => {
    if (
      ad.tweetId === tweetId &&
      ad.advertiserName === advertiserName &&
      ad.displayLocation === displayLocation &&
      Math.abs(d3.timeSecond.count(new Date(ad.time), new Date(time))) < 120
    ) {
      ad.engagement = 1
    }
  })
})
Looking at the code, I think you are right about both, we would like to count the engagements and use Array.find(). What concerns me here is also the "trick" we use to match engagements and impressions, I'll see if there is a better way...
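A minimal sketch of the two changes agreed on above (counting engagements and using Array.find()) might look like this. The function name `countEngagements` and the `maxSeconds` parameter are hypothetical, and the time check uses plain millisecond arithmetic instead of `d3.timeSecond.count` so that the snippet runs without d3, which may behave slightly differently right at the boundary.

```javascript
// Hypothetical rewrite of the matching loop: counts engagements per ad
// and stops at the first matching ad instead of scanning all of them.
function countEngagements(adsItems, engagements, maxSeconds = 120) {
  for (const v of engagements) {
    const tweetId = v.promotedTweetInfo?.tweetId ?? null
    const advertiserName = v.advertiserInfo?.advertiserName ?? null
    const displayLocation = v.displayLocation ?? null
    const time = v.impressionTime ?? null

    // Array.find() returns the first ad satisfying every condition, or undefined.
    const match = adsItems.find(
      ad =>
        ad.tweetId === tweetId &&
        ad.advertiserName === advertiserName &&
        ad.displayLocation === displayLocation &&
        Math.abs(new Date(ad.time) - new Date(time)) / 1000 < maxSeconds
    )

    // Increment rather than overwrite, so repeated engagements are counted.
    // Ads without any matching engagement are left untouched.
    if (match) match.engagement = (match.engagement ?? 0) + 1
  }
  return adsItems
}
```

This still relies on the same matching "trick" (same tweet, advertiser and location within a time window); it only changes how the matches are recorded.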
What concerns me here is also the "trick" we use to match engagements and impressions, I'll see if there is a better way...
@andreaskundig told me sql.js doesn't support foreign keys, so storing engagements in a separate table doesn't seem to be a viable option unless we use a different engine. One thing I see is that we could potentially exploit jsonpath-plus by concatenating the two files (impressions and engagements) into a single JSON document that we can manipulate with JSONPath. We could then possibly connect engagements and impressions with a complex JSONPath expression. In that case, though, we would probably not be able to do the 120-second check.
@andreaskundig told me sql.js doesn't support foreign keys
I'm not sure that's correct; see, for instance, https://github.com/sql-js/sql.js/issues/221. It makes sense, since sql.js is a JavaScript build of SQLite, which does support foreign keys.
We have changed the way we create an experience from a manifest several times: first with SPARQL, then with custom pipelines, and now with SQLite. These changes have left a lot of logic outside the manifest. We need a way to standardize this and give the power back to a single JSON configuration file, in order to facilitate the creation of experiences.
A standard tool we use in the project to navigate through JSON files is the "Accessor". Accessors allow us to easily specify parts of data points in a way that many people can understand. They are defined as follows:
So this issue aims to use this type of data structure to create a SQL database in the manifest, which we could then query to generate graphs. Then once done, convert all the experiences to use this new version of the configuration (this could be in another issue). Here is an example of what each manifest should look like: