davesgonechina closed this 2 years ago
Are you interested in the tool itself or the output of the GitHub Action?
Good question! Probably a GitHub Action, so I can dump to a bucket or another repo and read the CSV as an external table or dbt seed.
A GitHub Action mostly makes sense when you want to automate a process, though. To me it sounds like the people interested in a CSV would load it into a database to investigate or analyze the data further, right?
Correct, basically a feedback loop where I run duplicate-code-detection-tool on a repo full of SQL files to produce CSV output that becomes a table, which the people writing all that SQL can then query in SQL to see how DRY or not their SQL codebase is as a whole.
I see. Then sounds like a feature request for a new argument to the python script to produce a CSV file. I'll take a look. 👍
I am looking into this and am not sure what the CSV file should look like.
Considering the output is like this:
What would be the "columns" of the CSV file? Or would you expect one CSV file to be generated for every source code file?
I can imagine two options:
1. CSV output for every source code file
2. CSV with three columns: [Source code file] [Source code file to check against] [Similarity]. (:warning: There will be a lot of duplicated information this way)

I am not sure which one would be more usable and convenient though. :thinking:
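To make the three-column option concrete, here is a minimal Python sketch of how such a file could be written. It assumes the similarity scores are available as a nested mapping of file to file to score; that structure, the function name `write_similarity_csv`, and the sample file names are hypothetical and not taken from the tool's code.

```python
import csv
import io


def write_similarity_csv(similarities, out):
    """Write one row per (file, other file) pair to the file-like object `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(
        ["Source code file", "Source code file to check against", "Similarity"]
    )
    for source_file, others in similarities.items():
        for other_file, score in others.items():
            writer.writerow([source_file, other_file, score])


# Hypothetical scores, only to illustrate the shape of the output.
example = {
    "a.sql": {"b.sql": 42.0},
    "b.sql": {"a.sql": 42.0},
}

buffer = io.StringIO()
write_similarity_csv(example, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Every pair shows up twice, once from each file's point of view, which is the duplicated information the warning refers to.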
That's a good question - I can see cases for both. In my use case, I guess what I ultimately need is a deduplicated #2, but I could easily DISTINCT dupes away since we would make it available as a SQL table.
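For anyone who would rather drop the mirrored rows before loading, a rough Python sketch follows. It assumes the similarity of A to B equals that of B to A, which the thread does not confirm for every case, and it keeps whichever direction appears first. The function name `dedupe_pairs` and the sample rows are hypothetical.

```python
def dedupe_pairs(rows):
    """Keep the first row seen for each unordered pair of files.

    Assumes similarity is symmetric; if it is not, the second
    direction's score is silently discarded.
    """
    seen = set()
    unique = []
    for file_a, file_b, similarity in rows:
        # Sort the two names so (a, b) and (b, a) map to the same key.
        key = tuple(sorted((file_a, file_b)))
        if key in seen:
            continue
        seen.add(key)
        unique.append((file_a, file_b, similarity))
    return unique


# Hypothetical rows in the three-column layout.
rows = [
    ("a.sql", "b.sql", 42.0),
    ("b.sql", "a.sql", 42.0),
    ("a.sql", "c.sql", 3.5),
]
deduped = dedupe_pairs(rows)
print(deduped)
```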
I can go with that for now :) :+1:
@davesgonechina what do you think of #16?
When running the tool for the smartcar_shield project, I get an output.csv file that looks like this:
File A,File B,Similarity
src/car/smart/SmartCar.cpp,src/Smartcar.h,5.74
src/car/smart/SmartCar.cpp,src/sensors/distance/ultrasound/ping/SR04.cpp,0.39
src/car/smart/SmartCar.cpp,src/motor/digital/servo/ServoMotor.cpp,0.05
src/car/smart/SmartCar.cpp,src/car/distance/DistanceCar.cpp,17.91
src/car/smart/SmartCar.cpp,src/sensors/distance/infrared/analog/InfraredAnalogSensor.cpp,0.94
src/car/smart/SmartCar.cpp,src/sensors/heading/gyroscope/GY50.cpp,0.47
src/car/smart/SmartCar.cpp,src/car/heading/HeadingCar.cpp,57.91
src/car/smart/SmartCar.cpp,src/car/simple/SimpleCar.cpp,15.69
src/car/smart/SmartCar.cpp,src/runtime/arduino_runtime/ArduinoRuntime.cpp,0.08
src/car/smart/SmartCar.cpp,src/sensors/odometer/interrupt/DirectionalOdometer.cpp,1.15
src/car/smart/SmartCar.cpp,src/sensors/distance/infrared/analog/sharp/GP2Y0A02.cpp,1.08
src/car/smart/SmartCar.cpp,src/sensors/odometer/interrupt/DirectionlessOdometer.cpp,0.75
src/car/smart/SmartCar.cpp,src/control/differential/DifferentialControl.cpp,1.27
src/car/smart/SmartCar.cpp,src/sensors/distance/infrared/analog/sharp/GP2D120.cpp,1.07
src/car/smart/SmartCar.cpp,src/sensors/distance/ultrasound/i2c/SRF08.cpp,0.21
src/car/smart/SmartCar.cpp,src/sensors/distance/infrared/analog/sharp/GP2Y0A21.cpp,1.03
src/car/smart/SmartCar.cpp,src/control/ackerman/AckermanControl.cpp,0.18
src/car/smart/SmartCar.cpp,src/motor/analog/pwm/BrushedMotor.cpp,0.74
src/Smartcar.h,src/car/smart/SmartCar.cpp,5.74
src/Smartcar.h,src/sensors/distance/ultrasound/ping/SR04.cpp,4.58
src/Smartcar.h,src/motor/digital/servo/ServoMotor.cpp,2.2
src/Smartcar.h,src/car/distance/DistanceCar.cpp,13.05
src/Smartcar.h,src/sensors/distance/infrared/analog/InfraredAnalogSensor.cpp,1.32
src/Smartcar.h,src/sensors/heading/gyroscope/GY50.cpp,4.92
The command I ran was: python3 projects/duplicate-code-detection-tool/duplicate_code_detection.py -d src/ --project-root-dir projects/smartcar_shield --csv-output output.csv
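As a rough illustration of the "query it in SQL" use case, the sketch below loads rows in this layout into an in-memory SQLite table using only the Python standard library. The table name `similarity`, the column names `file_a`/`file_b`, and the helper `load_similarity_csv` are assumptions made for the example; the three sample rows are copied from the output above.

```python
import csv
import io
import sqlite3


def load_similarity_csv(csv_file, connection):
    """Load a 'File A,File B,Similarity' CSV into a SQLite table."""
    reader = csv.DictReader(csv_file)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS similarity "
        "(file_a TEXT, file_b TEXT, similarity REAL)"
    )
    connection.executemany(
        "INSERT INTO similarity VALUES (?, ?, ?)",
        ((r["File A"], r["File B"], float(r["Similarity"])) for r in reader),
    )
    connection.commit()


# A few rows from the output above; a real run would open output.csv instead.
sample = io.StringIO(
    "File A,File B,Similarity\n"
    "src/car/smart/SmartCar.cpp,src/Smartcar.h,5.74\n"
    "src/car/smart/SmartCar.cpp,src/car/heading/HeadingCar.cpp,57.91\n"
    "src/Smartcar.h,src/car/smart/SmartCar.cpp,5.74\n"
)

connection = sqlite3.connect(":memory:")
load_similarity_csv(sample, connection)

# The most similar pair comes first.
top = connection.execute(
    "SELECT file_a, file_b, similarity FROM similarity "
    "ORDER BY similarity DESC LIMIT 1"
).fetchall()
print(top)
```

With a real file, `open("output.csv", newline="")` would replace the `io.StringIO` sample.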
Not bad! Thanks!
Thinking about being able to compare SQL used by data analysts and present similarity results in a familiar form e.g. a table.