softwareconstruction240 / autograder

Autograder for BYU's CS 240 Chess project
https://cs240.click
2 stars 2 forks source link

Backend: Identify Common Issues — SQL Name Case Sensitivity #363

Open frozenfrank opened 1 month ago

frozenfrank commented 1 month ago

This is a sub-issue of #266 that describes and defines one issue that can be solved independent of the others.

Overview

Detection Strategy

Scan the source code with regexes to identify the table & column names and verify used properly in other places.

If for any reason, the information cannot be extracted or compared with confidence, we'll skip this check and not provide results.

We may be able to extract some, but not all of the table or column names. After extracting at least some table names, we could then search all files for any instances of the word, and warn if any of them match with case insensitivity, but are not exactly the same case sensitively.

Implementation Details

Use grep to scan their files for the create table statements for the names of their tables. Then scan their files for places they are using the names of their tables. Compare these against each other respecting case and warn if they don't exactly match.

Identifying extracting both pieces of information from the codebase can be tricky since there are diverse ways to write code to generate the SQL statements. The basic forms of SQL are known, and many students will likely use them, but if student's employ techniques to improve the quality of their code, it could become arbitrarily difficulty to match all of them. Here are some things students may do that would interfere with our pattern matching:

Extracting table names

Table names could be extracted from basic SQL create statements having the following forms, where the capturing group captures the name of the table:

/(?:CREATE TABLE|INSERT INTO|REPLACE|UPDATE|DELETE FROM)\s+[`'\"]?(\w+)[`'\"]?/ig

Extracting column names

Extracting column names is a lot more difficult than the table names because they can appear in a lot more places and in a lot more ways. Therefore, we may want to split out the tests for column names into a separate issue than tests for table names.

Here are a few key facts,

pawlh commented 1 month ago

Don't mind me, just doing some autograder lurking.

This problem made me think of an article I read: https://alexravikovich.medium.com/quarantine-journey-writing-mysql-proxy-in-go-for-self-learning-part-1-tcp-proxy-39810479b7e9

This solution starts from scratch, but maybe some existings solutions are out there for Java

This is definitely an over-engineered solution, but if you proxied MySQL requests you wouldn't need to grep files at all. Instead, you could just intercept and inspect ALL SQL requests. You could save the table names from CREATE queries, then any subsequent FROM <tablename> would be expected to match a name from your set.

This would be able to handle any level of abstraction that students might include in their projects.

Over-engineered... but really really cool 😎

frozenfrank commented 1 month ago

@pawlh Great idea! After looking around, it seems that MySql supports this kind of feature officially with a proxy command. I found their docs here, but I haven't read them yet 😆

Additionally, it looks like it's possible to write MySQL plugins.

Also ChatGPT indicates that it would be pretty straightforward to implement our own JDBC driver that students would use in place of the default one. This would allow us to spy directly on the statements, and we could configure the starter code to use it automatically going forward. (Conversation.). This seems like the easiest path forward so far.

frozenfrank commented 2 weeks ago

@19mdavenport Did you see the MySQL proxy docs referenced here?

What do you think about the custom JDBC driver approach to this problem?

19mdavenport commented 2 weeks ago

I did. I initially wanted to work on an approach that wouldn't require code external to the autograder, but that may not be completely feasible. As far as custom driver is concerned, I looked into it and I have conflicting thoughts about it