Closed MBeliou closed 3 years ago
Sounds good to me.
To make sure spelling mistakes do not cause two posts for the same project to end up with two different IDs, the crawler will also compute a string distance on the title and dev. If the distance is small enough, it will assume the post belongs to an existing project.
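A rough sketch of what that matching step might look like, using the standard library's `difflib` as a stand-in for whichever string distance the crawler ends up using. The names `similarity` and `match_project` and the `0.85` threshold are assumptions made for illustration, not part of the crawler.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings match exactly (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_project(
    title: str,
    dev: str,
    known: List[Tuple[str, str]],
    threshold: float = 0.85,  # assumed cut-off; would need tuning on real posts
) -> Optional[Tuple[str, str]]:
    """Return the closest known (title, dev) pair, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for known_title, known_dev in known:
        # Both fields must be close, so score a candidate by its weaker field.
        score = min(similarity(title, known_title), similarity(dev, known_dev))
        if score > best_score:
            best, best_score = (known_title, known_dev), score
    return best if best_score >= threshold else None


known_projects = [("My Flutter Shill Script", "flutterShill"), ("wdg.one website", "Anon")]
# A post with a typo in the title still resolves to the existing project.
print(match_project("My Fluter Shill Script", "fluttershill", known_projects))
```

A post that matches nothing yields `None`, which the crawler could treat as a new project.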
What about multiple devs working on the same project? Probably not common, but also not that rare (see: this project). If you identify by dev/project, every dev on the same project is going to be seen as a new project, which isn't right.
Seems like project name could be enough as a unique identifier? There's not going to be that many projects, and it's not a big ask to give it a unique-ish name.
Seems like project name could be enough as a unique identifier?
Yeah, I agree. In the beginning I was concerned that spelling mistakes might cause posts to not match the project, but with some string/fuzzy distance it should not be an issue. I will use the title as the unique ID to match posts to a project.
Creating a URL-safe identifier should remove most ambiguity (lower-case, hyphenated, etc.). I wouldn't worry about Hamming distance or anything fancy yet; it's easy to add later if it presents as a real problem.
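For illustration, a URL-safe identifier along those lines could be produced with a few lines of standard-library Python. The helper name `slugify` and the choice to keep dots (which are legal in URLs) are assumptions, not something settled in this thread.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn a free-form title into a lower-case, hyphenated, URL-safe string."""
    # Fold accented characters to plain ASCII where possible and drop the rest.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Lower-case, then collapse every run of other characters into one hyphen.
    # Dots are kept on purpose (assumption) so domain-like titles stay readable.
    slug = re.sub(r"[^a-z0-9.]+", "-", ascii_text.lower())
    return slug.strip("-")


print(slugify("My Flutter Shill Script"))
print(slugify("  my  flutter_shill   SCRIPT! "))
```

Both calls produce the same slug, so differences in case, spacing, and punctuation no longer create separate IDs. Real spelling mistakes would still need a distance check.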
Part of the template, copy-pasted from the wdg.one website:
:: my-project-title :: dev:: anon
I believe we could make a unique identifier for every project based on a combination of the title and the dev, like so: my-project-title@anon. We could also easily normalize the title and dev name themselves while parsing; doing so would most likely let us be more lax with users. A few examples:
:: My Flutter Shill Script :: dev:: flutterShill turns into -> my-flutter-shill-script@fluttershill
:: wdg.one website :: dev:: Anon turns into -> wdg.one-website@anon
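The parsing and normalization described here might be sketched as follows. The regular expression, the helper names `normalize` and `project_id`, and the tolerance for extra whitespace are assumptions based only on the template line quoted in this thread, so the real template may need a stricter or looser pattern.

```python
import re
from typing import Optional

# Assumed header shape, taken from the template line ":: title :: dev:: name".
HEADER = re.compile(
    r"^\s*::\s*(?P<title>.+?)\s*::\s*dev\s*::\s*(?P<dev>.+?)\s*$",
    re.IGNORECASE,
)


def normalize(part: str) -> str:
    """Lower-case a field and replace runs of whitespace with a single hyphen."""
    return re.sub(r"\s+", "-", part.strip().lower())


def project_id(line: str) -> Optional[str]:
    """Build a 'title@dev' identifier from a header line, or None if it does not match."""
    match = HEADER.match(line)
    if match is None:
        return None
    return f"{normalize(match['title'])}@{normalize(match['dev'])}"


print(project_id(":: My Flutter Shill Script :: dev:: flutterShill"))
print(project_id(":: wdg.one website :: dev:: Anon"))
```

Dropping the `@dev` suffix from the returned string would give a title-only identifier, if that is the scheme the project goes with.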