systemphil / symposia

Symposia is a philosophy course platform🏺
https://symposia.sphil.xyz
Apache License 2.0
6 stars 1 forks source link

Fear of database fields or application logic? #29

Closed Firgrep closed 11 months ago

Firgrep commented 11 months ago

This is a simple question but I think it reflects a broader issue. When we want to identify certain data, we can create a logic around existing fields or add new fields to existing models. I am wondering what is generally the best long-term approach to this. Let me use an example from our app.

As part of the MDX editor, I had added a header that displays the current type of file being edited - given one could either be editing the lesson content or lesson transcript (and now we also have course details into the mix) - it is very handy for the user to know where one is. So if the user is editing the lesson content, the heading would read "Editing content of ", or in the case of lesson transcript, "Editing transcript of <title>". Here's how we currently retrieve the type:</p> <pre><code class="language-ts">/** * Utility function to extract the markdown material and type flag of the data model. * @return An array where 0 index is the markdown, and 1 the type. */ export function filterMarkdownAndType(material: DBGetMdxContentByModelIdReturnType) { let incomingMarkdown: string; let incomingType: string; if ("transcript" in material) { incomingMarkdown = material.transcript; incomingType = "transcript"; } else if ("content" in material) { incomingMarkdown = material.content; incomingType = "content"; } else { incomingType = "nothing"; incomingMarkdown = "No content available."; } return [incomingMarkdown, incomingType]; }</code></pre> <p>The function expects a data model and basically returns an array of only the mdx and type based on whether the field of the model is "content" or "transcript".</p> <p>Currently, course details is not supported, which prompted this discussion in my head, and what if we add further models down the road that also need the editor?</p> <p>So instead of having to tailor the application logic, we create a new field on the model called "type" that is checked in the editor component. No need to additional utility functions. </p> <h3>Further changes</h3> <p>Incidentally, since we now have three "mdx models" which are nearly the same, I'm thinking of changing the name of the field containing the mdx to just <code>mdx</code> on all three and have the field "type" added to each to differentiate them. </p> <p>So, just to be clear, here is the current schema:</p> <pre><code class="language-prisma">model LessonContent { id String @id @default(cuid()) lesson Lesson @relation(fields: [lessonId], references: [id]) lessonId String @unique content Bytes createdAt DateTime @default(now()) updatedAt DateTime @updatedAt } model LessonTranscript { id String @id @default(cuid()) lesson Lesson @relation(fields: [lessonId], references: [id]) lessonId String @unique transcript Bytes createdAt DateTime @default(now()) updatedAt DateTime @updatedAt } model CourseDetails { id String @id @default(cuid()) course Course @relation(fields: [courseId], references: [id]) courseId String @unique content Bytes createdAt DateTime @default(now()) updatedAt DateTime @updatedAt }</code></pre> <p>And I propse:</p> <pre><code class="language-prisma">model LessonContent { id String @id @default(cuid()) lesson Lesson @relation(fields: [lessonId], references: [id]) lessonId String @unique mdx Bytes type String //example "content" createdAt DateTime @default(now()) updatedAt DateTime @updatedAt } model LessonTranscript { id String @id @default(cuid()) lesson Lesson @relation(fields: [lessonId], references: [id]) lessonId String @unique mdx Bytes type String // example "transcript" createdAt DateTime @default(now()) updatedAt DateTime @updatedAt } model CourseDetails { id String @id @default(cuid()) course Course @relation(fields: [courseId], references: [id]) courseId String @unique mdx Bytes type String // example "details" createdAt DateTime @default(now()) updatedAt DateTime @updatedAt }</code></pre> <p>What do people think? Better to change the db or change the application? I'm leaning towards changing the db here, but I'd appreciate broader reflections on this kind of issue/pattern.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/ej-choi"><img src="https://avatars.githubusercontent.com/u/54039640?v=4" />ej-choi</a> commented <strong> 11 months ago</strong> </div> <div class="markdown-body"> <p>Is there a way for prisma to return the table name/type? The thing about a <code>type</code> property which is a <code>string</code> is that it would introduce a runtime vulnerability, as some shoddy line of code, or a database insert, could conceivably introduce a table entry with a typo in that property, so that when it is fetched the front-end has no idea what to do with it.</p> <p>To try to solve this runtime vulnerability, we would have to introduce a boilerplate check to every instance of that property being set. But this is only suboptimal in this context as it doesn't actually give compile-time safety, since it'd just be string comparisons which can only be evaluated at runtime.</p> <p>The full compile-time solution is to make <code>type</code> an Enum property in the model itself, but then we would have a triple repetition of information, in table name, <code>type</code>, and the enum enumerations of the possible types, which seems like too much redundancy. I think keeping the information content of the DB as low as possible is ideal.</p> <p>If <code>LessonContent</code>, <code>LessonTranscript</code> and <code>CourseDetail</code> could all be abstracted into an <code>MdxResource</code> or something, then a <code>type</code> property in this super-model would make more sense, as it introduces some new information. But, in the case of needing to have three different models (and it makes sense to have three different models as they do serve different roles), I think it's a redundancy that would introduce more complications at the resource creation level.</p> <p>But this is predicated on prisma providing some type checking capability.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/ej-choi"><img src="https://avatars.githubusercontent.com/u/54039640?v=4" />ej-choi</a> commented <strong> 11 months ago</strong> </div> <div class="markdown-body"> <p>Just to add to the above; in general, I fear database fields above application logic.</p> <p>An error in the database is unknown until a user that needs that data retrieves it, and it won't be available to a Dev using a development environment especially at later stages of development when there will probably be a distinction between customer data, production data and test data.</p> <p>An error in the application logic, however, can be spotted by a developer at any time in the development life cycle and can be fixed with a new commit for the dev-branch and new release for the customer.</p> <p>Code, to me, should be a malleable thing as long as use-cases are fulfilled (and their abuses, known) [I guess this is what TDD is, even though we aren't following that per se], and data as static and non-volatile as possible. And database fields which add redundancy is one of the ways in which maintaining this principle becomes more difficult.</p> <p>Of course, there is no strict distinction between errors in application logic and erroneous data, the former leads to the latter too. But that statement just seems like a negative reformulation of what coding is in general. In the movement between (logic) -> (data), the movement should be restricted enough to filter out the predictable errors, and one way of working towards that aim is to limit the number of modifiable fields and the possible variance between data sets/rows.</p> <p>I was thinking through this while writing, so I hope I haven't been too repetitive or trite in my points.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/Firgrep"><img src="https://avatars.githubusercontent.com/u/118931755?v=4" />Firgrep</a> commented <strong> 11 months ago</strong> </div> <div class="markdown-body"> <p>Thanks @ej-choi ! I didn't think of looking if Prisma had a model-checking, but making some quick searches now it seems that, apart from type-checking, this isn't possible. Type-checking also isn't ideal insofar as we might have two models with identical fields: if, for example, <code>LessonContent</code> and <code>LessonTranscript</code>, all have the same fields (yet their relation would be different, but that will be controlled on the db side), this may be picked up by TS but I suspect it will be enough to discriminate in runtime(?). </p> <p>This would work for non-identical fields.</p> <pre><code class="language-ts">if (typeof material === typeof prisma.lessonContent) { // do stuff }</code></pre> <p>But I am thinking of making the fields of <code>LessonContent</code> and <code>LessonTranscript</code> identical as that would bypass the need to check them insofar as retrieval for the MDX compiler is concerned. But to identify the models (for UI for example), I think we should add a new discriminatory field on each model.</p> <p>Your concern about runtime vulnerabilities and modifiable data is fair, which is why I think this would be controlled closest to the db with an <code>enum</code> defined in the schema (like what we have done with user <code>Role</code>), such that whenever a new entry is made of <code>LessonContent</code>, <code>LessonTranscript</code> or <code>CourseDetails</code>, a discriminatory field is defined as well at creation. There would be <em>no need to alter this field afterwards</em> and it would only be for reading purposes, so we don't need to worry about corrupt/unknown/tampered data here, since it should never be changed (same way the ID of a model is never changed after its creation). This should satisfy your concerns on runtime vulnerabilities and modifiable data? Otherwise please correct me if I missed something. </p> <p>We could conceivably bypass adding new fields by creating additional query to the db, but that seems expensive to me. If the question is adding one more field vs adding one (or more) trips to the db, I'd take adding one more field (provided we're not talking several MBs and more). </p> <p>So I'm leaning towards adding a new field on those "MDX models" that is an enum defined in the schema and also necessarily defined at entry creation, which could actually just be <code>model <ModelName></code>, and changing the <code>content</code>, <code>transcript</code> fields into <code>mdx</code>. What do you think?</p> <p>(To avoid confusing "type" with TypeScript types, we should avoid using the word "type" for anything but TS related stuff.)</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/Firgrep"><img src="https://avatars.githubusercontent.com/u/118931755?v=4" />Firgrep</a> commented <strong> 11 months ago</strong> </div> <div class="markdown-body"> <p>Also, while I agree with you on that code should be malleable--which I think you outlined very well--I don't think we need to be shy about using the db to our needs. When you say "keeping the information content of the DB as low as possible is ideal", I would say that it depends on the type of information more than the volume, namely, persistent information (the db is basically the memory of the app). However, totally agree with limiting the number of modifiable fields and the possible variance between data sets/rows. Less possible worlds, the better. </p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/ej-choi"><img src="https://avatars.githubusercontent.com/u/54039640?v=4" />ej-choi</a> commented <strong> 11 months ago</strong> </div> <div class="markdown-body"> <p>It's a shame prisma doesn't have model-checking, and I agree that type-checking by itself isn't very useful in this context. The enum method seems like the best solution. And yes, I agree that at this development stage the DB schema itself isn't set in stone so we shouldn't be afraid to change the data fields.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/Firgrep"><img src="https://avatars.githubusercontent.com/u/118931755?v=4" />Firgrep</a> commented <strong> 11 months ago</strong> </div> <div class="markdown-body"> <p>Yes, but one thing Prisma does nicely provide is to allow for default values (I had forgotten about this even though we're already using it for the User model!). Setting default values means we don't need any specific application logic and we can leave the entire thing to the db side. I was typing this out yesterday (not yet pushed to db) and I think we'll run with it: </p> <pre><code>enum MdxCategory { CONTENT TRANSCRIPT DETAILS } model CourseDetails { id String @id @default(cuid()) course Course @relation(fields: [courseId], references: [id]) courseId String @unique mdxCategory MdxCategory @default(DETAILS) mdx Bytes createdAt DateTime @default(now()) updatedAt DateTime @updatedAt } model LessonContent { id String @id @default(cuid()) lesson Lesson @relation(fields: [lessonId], references: [id]) lessonId String @unique mdxCategory MdxCategory @default(CONTENT) mdx Bytes createdAt DateTime @default(now()) updatedAt DateTime @updatedAt } model LessonTranscript { id String @id @default(cuid()) lesson Lesson @relation(fields: [lessonId], references: [id]) lessonId String @unique mdxCategory MdxCategory @default(TRANSCRIPT) mdx Bytes createdAt DateTime @default(now()) updatedAt DateTime @updatedAt }</code></pre> </div> </div> <div class="page-bar-simple"> </div> <div class="footer"> <ul class="body"> <li>© <script> document.write(new Date().getFullYear()) </script> Githubissues.</li> <li>Githubissues is a development platform for aggregating issues.</li> </ul> </div> <script src="https://cdn.jsdelivr.net/npm/jquery@3.5.1/dist/jquery.min.js"></script> <script src="/githubissues/assets/js.js"></script> <script src="/githubissues/assets/markdown.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/highlight.min.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/languages/go.min.js"></script> <script> hljs.highlightAll(); </script> </body> </html>