tech-conferences / conference-data

Conference data for www.confs.tech
https://confs.tech
MIT License

Duplicate Conferences - Test should have failed #7011

JuanPabloDiaz closed this 1 month ago

JuanPabloDiaz commented 1 month ago

The test didn't fail, and it didn't flag multiple conferences as duplicates:

Was it because the URL and the name are not exactly the same in some cases?

JuanPabloDiaz commented 1 month ago

I filled out the form with a duplicate conference and the test did fail. It did indeed flag it with:

Error: [name] Found almost identical conference...

cgrail commented 1 month ago

Hi @JuanPabloDiaz

I've created a PR that uses string similarity to recognize duplicates. It looks promising: https://github.com/tech-conferences/conference-data/pull/7074

Best regards, Christian
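
For readers unfamiliar with string similarity, here is a minimal sketch of one common measure, the Sørensen-Dice coefficient over character bigrams. It is only an illustration and not necessarily what the PR uses. The function names `normalize`, `bigramCounts` and `diceSimilarity` are hypothetical, and lower-casing plus whitespace removal is an assumed normalization.

```javascript
// Assumed normalization: ignore case and whitespace differences.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, "");
}

// Count every pair of adjacent characters in an already normalized string.
function bigramCounts(s) {
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const pair = s.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Sørensen-Dice similarity: 1 means identical after normalization,
// 0 means the two strings share no bigram.
function diceSimilarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  if (x === y) return 1;
  if (x.length < 2 || y.length < 2) return 0;
  const left = bigramCounts(x);
  let shared = 0;
  for (const [pair, n] of bigramCounts(y)) {
    shared += Math.min(n, left.get(pair) || 0);
  }
  return (2 * shared) / (x.length - 1 + (y.length - 1));
}

// Made-up names, for illustration only.
console.log(diceSimilarity("ExampleConf Berlin 2024", "ExampleConf 2024"));
```

A score close to 1 suggests the two names probably refer to the same conference. Where to put the cut-off is a tuning decision that depends on the data.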

cgrail commented 1 month ago

OK, I found a solution that works for your test cases:
https://github.com/tech-conferences/conference-data/actions/runs/10147310567/job/28057389678
https://github.com/tech-conferences/conference-data/actions/runs/10147207581/job/28057047257

I'm using string comparison on the URL and on the URL path and conference name. I only got it working with these checks in combination: https://github.com/tech-conferences/conference-data/blob/896ca1c1ed0f8f6f479fd004e26f57598be402f6/scripts/utils/mergedConferencesReader.js#L40-L46
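
As a rough illustration of combining checks, the sketch below flags two entries only when both their URLs and their names are similar. This is one possible reading of "in combination" and is not the code at the link above. The helper names (`editSimilarity`, `urlKey`, `isLikelyDuplicate`), the `{ name, url }` entry shape, the edit-distance measure and the `0.8` threshold are all assumptions made for the example.

```javascript
// Normalized edit-distance similarity: 1 = identical, 0 = nothing in common.
function editSimilarity(a, b) {
  if (a === b) return 1;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return 1 - prev[b.length] / Math.max(a.length, b.length);
}

// Reduce a URL to host + path, dropping the scheme, a leading "www."
// and trailing slashes (assumed normalization).
function urlKey(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, "");
  const path = url.pathname.toLowerCase().replace(/\/+$/, "");
  return host + path;
}

// Hypothetical rule: a duplicate needs BOTH a similar URL and a similar name.
function isLikelyDuplicate(a, b, threshold = 0.8) {
  const similarUrl = editSimilarity(urlKey(a.url), urlKey(b.url)) >= threshold;
  const similarName =
    editSimilarity(a.name.toLowerCase(), b.name.toLowerCase()) >= threshold;
  return similarUrl && similarName;
}

// Made-up entries, for illustration only.
const first = { name: "ExampleConf 2024", url: "https://www.exampleconf.dev/2024/" };
const second = { name: "Example Conf 2024", url: "https://exampleconf.dev/2024" };
console.log(isLikelyDuplicate(first, second));
```

Requiring both signals is meant to avoid flagging different events that share a website, or unrelated events with similar names. Whether that matches the real data is something only the actual test runs can confirm.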

cgrail commented 1 month ago

I've now also fixed the EuroStar case: https://github.com/tech-conferences/conference-data/actions/runs/10148014597/job/28059762426?pr=7007