lac-dcc / chimera

A tool for synthesizing Verilog programs
GNU General Public License v3.0

duplicate files with alternate casing #1

Closed seanjensengrey closed 1 week ago

seanjensengrey commented 1 week ago

There are some duplicate files with alternate casing that collide on case-insensitive file systems (the default on OSX).

It would be nice if OSX folks and Linux folks saw the same dataset.

warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'database/Caravel_user_project_verilog_dv_io_ports_io_ports_tb.v'
  'database/caravel_user_project_verilog_dv_io_ports_io_ports_tb.v'
  'database/Caravel_user_project_verilog_dv_la_test1_la_test1_tb.v'
  'database/caravel_user_project_verilog_dv_la_test1_la_test1_tb.v'
  'database/Caravel_user_project_verilog_dv_la_test2_la_test2_tb.v'
  'database/caravel_user_project_verilog_dv_la_test2_la_test2_tb.v'
  'database/Caravel_user_project_verilog_dv_mprj_stimulus_mprj_stimulus_tb.v'
  'database/caravel_user_project_verilog_dv_mprj_stimulus_mprj_stimulus_tb.v'
  'database/Caravel_user_project_verilog_dv_wb_port_wb_port_tb.v'
  'database/caravel_user_project_verilog_dv_wb_port_wb_port_tb.v'
  'database/Caravel_user_project_verilog_gl_user_proj_example.v'
  'database/caravel_user_project_verilog_gl_user_proj_example.v'
  'database/Caravel_user_project_verilog_gl_user_project_wrapper.v'
  'database/caravel_user_project_verilog_gl_user_project_wrapper.v'
  'database/Caravel_user_project_verilog_rtl_defines.v'
  'database/caravel_user_project_verilog_rtl_defines.v'
  'database/Caravel_user_project_verilog_rtl_user_defines.v'
  'database/caravel_user_project_verilog_rtl_user_defines.v'
  'database/CHISEL3-PROJECTS_Chapter_00_BootCamp_BC_module_example_PassThroughGenerator.v'
  'database/CHISEL3-PROJECTS_Chapter_00_BootCamp_BC_module_example_PassthroughGenerator.v'
  'database/FlattenRTL_tests_regression_rocket_Rocket.v'
  'database/FlattenRTL_tests_regression_rocket_rocket.v'
  'database/invalid_programs/Caravel_user_project_verilog_rtl_uprj_netlists.v'
  'database/invalid_programs/caravel_user_project_verilog_rtl_uprj_netlists.v'
  'database/invalid_programs/Caravel_user_project_verilog_rtl_user_proj_example.v'
  'database/invalid_programs/caravel_user_project_verilog_rtl_user_proj_example.v'
  'database/invalid_programs/Verilog_codes_AddersandSubtractors_Full_adder_full_adder.v'
  'database/invalid_programs/Verilog_codes_AddersandSubtractors_full_adder_full_adder.v'
  'database/RISC-V_ALU.v'
  'database/RISC-V_alu.v'
  'database/tinytapeout-06-staging_projects_tt_um_ALU_tt_um_ALU.v'
  'database/tinytapeout-06-staging_projects_tt_um_alu_tt_um_alu.v'
  'database/tinytapeout-06_projects_tt_um_ALU_tt_um_ALU.v'
  'database/tinytapeout-06_projects_tt_um_alu_tt_um_alu.v'
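
For anyone who wants to reproduce the list above on a case-sensitive system, here is a minimal sketch that groups paths differing only in letter case. The function name `find_case_collisions` is hypothetical and not part of this repository, and `str.casefold()` is only an approximation of the folding rules a real case-insensitive file system applies.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that are identical once letter case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() approximates case-insensitive comparison (an assumption,
        # not the exact rule used by any particular file system).
        groups[path.casefold()].append(path)
    # Keep only groups with more than one member, i.e. actual collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    sample = [
        "database/RISC-V_ALU.v",
        "database/RISC-V_alu.v",
        "database/FlattenRTL_tests_regression_rocket_Rocket.v",
    ]
    for group in find_case_collisions(sample):
        print(group)
```

The input could be the output of `git ls-files` split into lines, since the index keeps every path even when the working tree cannot.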

rafasumi commented 1 week ago

That's a good point! I hadn't thought of that when I first wrote the script to mine programs from GitHub.

I think that adding a short hash to the end of the file name should be enough to avoid these collisions. Another option is adding the timestamp of when the program was mined, but that could make the name too long.
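
The short-hash idea could look roughly like the sketch below. It is only an illustration, not the mining script's actual code: the helper name `disambiguated_name`, the choice of SHA-1, and the 8-character digest length are all assumptions. The hash is computed over the original, case-sensitive name, so two names that differ only in casing get different suffixes.

```python
import hashlib
from pathlib import PurePosixPath


def disambiguated_name(file_name: str, digest_len: int = 8) -> str:
    """Insert a short hash of the case-sensitive name before the extension."""
    # Hypothetical choice: SHA-1 truncated to digest_len hex characters.
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()[:digest_len]
    path = PurePosixPath(file_name)
    # hexdigest() is always lowercase, so the suffix itself cannot
    # reintroduce a case-only difference.
    return f"{path.stem}_{digest}{path.suffix}"


if __name__ == "__main__":
    for name in ("RISC-V_ALU.v", "RISC-V_alu.v"):
        print(name, "->", disambiguated_name(name))
```

Because the suffix depends only on the name, re-mining the same program yields the same file name, which a timestamp would not.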

I won't have the time to work on it right now, but I'll try to get to it soon. If anyone's interested, the mining script would be a good starting point.